Merge branch 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- parisc now uses the generic dma_noncoherent_ops implementation
(Christoph Hellwig)
- further memory barrier and spinlock improvements (John David Anglin)
- prepare removal of current_text_addr() functions (Nick Desaulniers)
- improve kernel stack unwinding on parisc (me)
- drop ENOTSUP which was defined on parisc only (me)
* 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix and improve kernel stack unwinding
parisc: Remove unnecessary barriers from spinlock.h
parisc: Remove ordered stores from syscall.S
parisc: prefer _THIS_IP_ and _RET_IP_ statement expressions
parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature
parisc: Drop architecture-specific ENOTSUP define
parisc: use generic dma_noncoherent_ops
parisc: always use flush_kernel_dcache_range for DMA cache maintainance
parisc: merge pcx_dma_ops and pcxl_dma_ops
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
index 6c06e10..f5120a0 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
@@ -380,31 +380,26 @@
as follows:
<pre>
- 1 unsigned long gpnum;
- 2 unsigned long completed;
+ 1 unsigned long gp_seq;
</pre>
<p>RCU grace periods are numbered, and
-the <tt>->gpnum</tt> field contains the number of the grace
-period that started most recently.
-The <tt>->completed</tt> field contains the number of the
-grace period that completed most recently.
-If the two fields are equal, the RCU grace period that most recently
-started has already completed, and therefore the corresponding
-flavor of RCU is idle.
-If <tt>->gpnum</tt> is one greater than <tt>->completed</tt>,
-then <tt>->gpnum</tt> gives the number of the current RCU
-grace period, which has not yet completed.
-Any other combination of values indicates that something is broken.
-These two fields are protected by the root <tt>rcu_node</tt>'s
+the <tt>->gp_seq</tt> field contains the current grace-period
+sequence number.
+The bottom two bits are the state of the current grace period,
+which can be zero for not yet started or one for in progress.
+In other words, if the bottom two bits of <tt>->gp_seq</tt> are
+zero, the corresponding flavor of RCU is idle.
+Any other value in the bottom two bits indicates that something is broken.
+This field is protected by the root <tt>rcu_node</tt> structure's
<tt>->lock</tt> field.
-</p><p>There are <tt>->gpnum</tt> and <tt>->completed</tt> fields
+</p><p>There are <tt>->gp_seq</tt> fields
in the <tt>rcu_node</tt> and <tt>rcu_data</tt> structures
as well.
The fields in the <tt>rcu_state</tt> structure represent the
-most current values, and those of the other structures are compared
-in order to detect the start of a new grace period in a distributed
+most current value, and those of the other structures are compared
+in order to detect the beginnings and ends of grace periods in a distributed
fashion.
The values flow from <tt>rcu_state</tt> to <tt>rcu_node</tt>
(down the tree from the root to the leaves) to <tt>rcu_data</tt>.
@@ -512,27 +507,47 @@
as follows:
<pre>
- 1 unsigned long gpnum;
- 2 unsigned long completed;
+ 1 unsigned long gp_seq;
+ 2 unsigned long gp_seq_needed;
</pre>
-<p>These fields are the counterparts of the fields of the same name in
-the <tt>rcu_state</tt> structure.
-They each may lag up to one behind their <tt>rcu_state</tt>
-counterparts.
-If a given <tt>rcu_node</tt> structure's <tt>->gpnum</tt> and
-<tt>->complete</tt> fields are equal, then this <tt>rcu_node</tt>
+<p>The <tt>rcu_node</tt> structures' <tt>->gp_seq</tt> fields are
+the counterparts of the field of the same name in the <tt>rcu_state</tt>
+structure.
+They each may lag up to one step behind their <tt>rcu_state</tt>
+counterpart.
+If the bottom two bits of a given <tt>rcu_node</tt> structure's
+<tt>->gp_seq</tt> field is zero, then this <tt>rcu_node</tt>
structure believes that RCU is idle.
-Otherwise, as with the <tt>rcu_state</tt> structure,
-the <tt>->gpnum</tt> field will be one greater than the
-<tt>->complete</tt> fields, with <tt>->gpnum</tt>
-indicating which grace period this <tt>rcu_node</tt> believes
-is still being waited for.
+</p><p>The <tt>>gp_seq</tt> field of each <tt>rcu_node</tt>
+structure is updated at the beginning and the end
+of each grace period.
-</p><p>The <tt>>gpnum</tt> field of each <tt>rcu_node</tt>
-structure is updated at the beginning
-of each grace period, and the <tt>->completed</tt> fields are
-updated at the end of each grace period.
+<p>The <tt>->gp_seq_needed</tt> fields record the
+furthest-in-the-future grace period request seen by the corresponding
+<tt>rcu_node</tt> structure. The request is considered fulfilled when
+the value of the <tt>->gp_seq</tt> field equals or exceeds that of
+the <tt>->gp_seq_needed</tt> field.
+
+<table>
+<tr><th> </th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+ Suppose that this <tt>rcu_node</tt> structure doesn't see
+ a request for a very long time.
+ Won't wrapping of the <tt>->gp_seq</tt> field cause
+ problems?
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+ No, because if the <tt>->gp_seq_needed</tt> field lags behind the
+ <tt>->gp_seq</tt> field, the <tt>->gp_seq_needed</tt> field
+ will be updated at the end of the grace period.
+ Modulo-arithmetic comparisons therefore will always get the
+ correct answer, even with wrapping.
+</font></td></tr>
+<tr><td> </td></tr>
+</table>
<h5>Quiescent-State Tracking</h5>
@@ -626,9 +641,8 @@
</ol>
<p><font color="ffffff">So the locking is absolutely required in
- order to coordinate
- clearing of the bits with the grace-period numbers in
- <tt>->gpnum</tt> and <tt>->completed</tt>.
+ order to coordinate clearing of the bits with updating of the
+ grace-period sequence number in <tt>->gp_seq</tt>.
</font></td></tr>
<tr><td> </td></tr>
</table>
@@ -1038,15 +1052,15 @@
as follows:
<pre>
- 1 unsigned long completed;
- 2 unsigned long gpnum;
+ 1 unsigned long gp_seq;
+ 2 unsigned long gp_seq_needed;
3 bool cpu_no_qs;
4 bool core_needs_qs;
5 bool gpwrap;
6 unsigned long rcu_qs_ctr_snap;
</pre>
-<p>The <tt>completed</tt> and <tt>gpnum</tt>
+<p>The <tt>->gp_seq</tt> and <tt>->gp_seq_needed</tt>
fields are the counterparts of the fields of the same name
in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures.
They may each lag up to one behind their <tt>rcu_node</tt>
@@ -1054,15 +1068,9 @@
<tt>CONFIG_NO_HZ_FULL</tt> kernels can lag
arbitrarily far behind for CPUs in dyntick-idle mode (but these counters
will catch up upon exit from dyntick-idle mode).
-If a given <tt>rcu_data</tt> structure's <tt>->gpnum</tt> and
-<tt>->complete</tt> fields are equal, then this <tt>rcu_data</tt>
+If the lower two bits of a given <tt>rcu_data</tt> structure's
+<tt>->gp_seq</tt> are zero, then this <tt>rcu_data</tt>
structure believes that RCU is idle.
-Otherwise, as with the <tt>rcu_state</tt> and <tt>rcu_node</tt>
-structure,
-the <tt>->gpnum</tt> field will be one greater than the
-<tt>->complete</tt> fields, with <tt>->gpnum</tt>
-indicating which grace period this <tt>rcu_data</tt> believes
-is still being waited for.
<table>
<tr><th> </th></tr>
@@ -1070,13 +1078,13 @@
<tr><td>
All this replication of the grace period numbers can only cause
massive confusion.
- Why not just keep a global pair of counters and be done with it???
+ Why not just keep a global sequence number and be done with it???
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
- Because if there was only a single global pair of grace-period
+ Because if there was only a single global sequence
numbers, there would need to be a single global lock to allow
- safely accessing and updating them.
+ safely accessing and updating it.
And if we are not going to have a single global lock, we need
to carefully manage the numbers on a per-node basis.
Recall from the answer to a previous Quick Quiz that the consequences
@@ -1091,8 +1099,8 @@
while the <tt>->core_needs_qs</tt> flag indicates that the
RCU core needs a quiescent state from the corresponding CPU.
The <tt>->gpwrap</tt> field indicates that the corresponding
-CPU has remained idle for so long that the <tt>completed</tt>
-and <tt>gpnum</tt> counters are in danger of overflow, which
+CPU has remained idle for so long that the
+<tt>gp_seq</tt> counter is in danger of overflow, which
will cause the CPU to disregard the values of its counters on
its next exit from idle.
Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect
@@ -1130,10 +1138,10 @@
whenever it notices that another RCU grace period has completed.
The CPU detects the completion of an RCU grace period by noticing
that the value of its <tt>rcu_data</tt> structure's
-<tt>->completed</tt> field differs from that of its leaf
+<tt>->gp_seq</tt> field differs from that of its leaf
<tt>rcu_node</tt> structure.
Recall that each <tt>rcu_node</tt> structure's
-<tt>->completed</tt> field is updated at the end of each
+<tt>->gp_seq</tt> field is updated at the beginnings and ends of each
grace period.
<p>
diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
index 8651b0b..a346ce0 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
@@ -357,7 +357,7 @@
grace-period initialization.
<p>The first ordering-related grace-period initialization action is to
-increment the <tt>rcu_state</tt> structure's <tt>->gpnum</tt>
+advance the <tt>rcu_state</tt> structure's <tt>->gp_seq</tt>
grace-period-number counter, as shown below:
</p><p><img src="TreeRCU-gp-init-1.svg" alt="TreeRCU-gp-init-1.svg" width="75%">
@@ -388,7 +388,7 @@
<p>The final <tt>rcu_gp_init()</tt> pass through the <tt>rcu_node</tt>
tree traverses breadth-first, setting each <tt>rcu_node</tt> structure's
-<tt>->gpnum</tt> field to the newly incremented value from the
+<tt>->gp_seq</tt> field to the newly advanced value from the
<tt>rcu_state</tt> structure, as shown in the following diagram.
</p><p><img src="TreeRCU-gp-init-3.svg" alt="TreeRCU-gp-init-1.svg" width="75%">
@@ -398,9 +398,9 @@
to notice that a new grace period has started, as described in the next
section.
But because the grace-period kthread started the grace period at the
-root (with the increment of the <tt>rcu_state</tt> structure's
-<tt>->gpnum</tt> field) before setting each leaf <tt>rcu_node</tt>
-structure's <tt>->gpnum</tt> field, each CPU's observation of
+root (with the advancing of the <tt>rcu_state</tt> structure's
+<tt>->gp_seq</tt> field) before setting each leaf <tt>rcu_node</tt>
+structure's <tt>->gp_seq</tt> field, each CPU's observation of
the start of the grace period will happen after the actual start
of the grace period.
@@ -466,7 +466,7 @@
<tr><td>
But a RCU read-side critical section might have started
after the beginning of the grace period
- (the <tt>->gpnum++</tt> from earlier), so why should
+ (the advancing of <tt>->gp_seq</tt> from earlier), so why should
the grace period wait on such a critical section?
</td></tr>
<tr><th align="left">Answer:</th></tr>
@@ -609,10 +609,8 @@
<h4><a name="Grace-Period Cleanup">Grace-Period Cleanup</a></h4>
<p>Grace-period cleanup first scans the <tt>rcu_node</tt> tree
-breadth-first setting all the <tt>->completed</tt> fields equal
-to the number of the newly completed grace period, then it sets
-the <tt>rcu_state</tt> structure's <tt>->completed</tt> field,
-again to the number of the newly completed grace period.
+breadth-first advancing all the <tt>->gp_seq</tt> fields, then it
+advances the <tt>rcu_state</tt> structure's <tt>->gp_seq</tt> field.
The ordering effects are shown below:
</p><p><img src="TreeRCU-gp-cleanup.svg" alt="TreeRCU-gp-cleanup.svg" width="75%">
@@ -634,7 +632,7 @@
CPU has reported its quiescent state, but it may be some
milliseconds before RCU becomes aware of this.
The latest reasonable candidate is once the <tt>rcu_state</tt>
- structure's <tt>->completed</tt> field has been updated,
+ structure's <tt>->gp_seq</tt> field has been updated,
but it is quite possible that some CPUs have already completed
phase two of their updates by that time.
In short, if you are going to work with RCU, you need to
@@ -647,7 +645,7 @@
<h4><a name="Callback Invocation">Callback Invocation</a></h4>
<p>Once a given CPU's leaf <tt>rcu_node</tt> structure's
-<tt>->completed</tt> field has been updated, that CPU can begin
+<tt>->gp_seq</tt> field has been updated, that CPU can begin
invoking its RCU callbacks that were waiting for this grace period
to end.
These callbacks are identified by <tt>rcu_advance_cbs()</tt>,
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svg
index 754f426..bf84fba 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svg
@@ -384,11 +384,11 @@
inkscape:window-height="1144"
id="namedview208"
showgrid="true"
- inkscape:zoom="0.70710678"
- inkscape:cx="617.89017"
- inkscape:cy="542.52419"
- inkscape:window-x="86"
- inkscape:window-y="28"
+ inkscape:zoom="0.78716603"
+ inkscape:cx="513.06403"
+ inkscape:cy="623.1214"
+ inkscape:window-x="102"
+ inkscape:window-y="38"
inkscape:window-maximized="0"
inkscape:current-layer="g3188-3"
fit-margin-top="5"
@@ -417,13 +417,15 @@
id="g3188">
<text
xml:space="preserve"
- x="3199.1516"
+ x="3145.9592"
y="13255.592"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->completed = ->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3143">rcu_seq_end(&rnp->gp_seq)</tspan></text>
<g
id="g3107"
transform="translate(947.90548,11584.029)">
@@ -502,13 +504,15 @@
</g>
<text
xml:space="preserve"
- x="5324.5371"
- y="15414.598"
+ x="5264.4731"
+ y="15428.84"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-753"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ id="text202-36-7"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-5">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
style="fill:none;stroke-width:0.025in"
@@ -547,15 +551,6 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-6-0">Leaf</tspan></text>
- <text
- xml:space="preserve"
- x="7479.5796"
- y="17699.943"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-9"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
@@ -566,15 +561,6 @@
style="fill:none;stroke-width:0.025in"
transform="translate(-737.93887,7732.6672)"
id="g3188-3">
- <text
- xml:space="preserve"
- x="3225.7478"
- y="13175.802"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-60"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">rsp->completed =</text>
<g
id="g3107-62"
transform="translate(947.90548,11584.029)">
@@ -607,15 +593,6 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-7">Root</tspan></text>
- <text
- xml:space="preserve"
- x="3225.7478"
- y="13390.038"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-60-3"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"> rnp->completed</text>
<flowRoot
xml:space="preserve"
id="flowRoot3356"
@@ -627,7 +604,18 @@
height="63.63961"
x="332.34018"
y="681.87292" /></flowRegion><flowPara
- id="flowPara3362" /></flowRoot> </g>
+ id="flowPara3362" /></flowRoot> <text
+ xml:space="preserve"
+ x="3156.6121"
+ y="13317.754"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-6"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-0">rcu_seq_end(&rsp->gp_seq)</tspan></text>
+ </g>
<g
style="fill:none;stroke-width:0.025in"
transform="translate(-858.40227,7769.0342)"
@@ -859,6 +847,17 @@
id="path3414-8-3-6-6"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ x="7418.769"
+ y="17646.104"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-70"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-93">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
transform="translate(-1642.5377,-11611.245)"
@@ -887,13 +886,15 @@
</g>
<text
xml:space="preserve"
- x="5327.3057"
+ x="5274.1133"
y="15428.84"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-36"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
transform="translate(-151.71746,-11647.612)"
@@ -972,13 +973,15 @@
id="tspan3104-6-5-6-0-92">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7486.4907"
- y="17670.119"
+ x="7408.5918"
+ y="17619.504"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ id="text202-36-2"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-9">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
transform="translate(-6817.1997,-11647.612)"
@@ -1019,13 +1022,15 @@
id="tspan3104-6-5-6-0-1">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7474.1382"
- y="17688.926"
+ x="7416.8003"
+ y="17619.504"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-5"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ id="text202-36-3"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-56">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<path
style="fill:none;stroke:#000000;stroke-width:13.29812908px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow2Lend)"
@@ -1059,15 +1064,6 @@
id="path3414-8-3-6"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
- <text
- xml:space="preserve"
- x="7318.9653"
- y="6031.6353"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-2"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
<g
style="fill:none;stroke-width:0.025in"
id="g4504-3-9"
@@ -1123,4 +1119,15 @@
id="path3134-9-0-3-5"
d="m 6875.6003,15833.906 1595.7755,0"
style="fill:none;stroke:#969696;stroke-width:53.19251633;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-end:url(#Arrow1Send-36)" />
+ <text
+ xml:space="preserve"
+ x="7275.2612"
+ y="5971.8916"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-1"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-2">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</svg>
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svg
index 0161262..8c20755 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svg
@@ -272,13 +272,13 @@
inkscape:window-height="1144"
id="namedview208"
showgrid="true"
- inkscape:zoom="0.70710678"
- inkscape:cx="617.89019"
- inkscape:cy="636.57143"
- inkscape:window-x="697"
+ inkscape:zoom="2.6330492"
+ inkscape:cx="524.82797"
+ inkscape:cy="519.31194"
+ inkscape:window-x="79"
inkscape:window-y="28"
inkscape:window-maximized="0"
- inkscape:current-layer="svg2"
+ inkscape:current-layer="g3188"
fit-margin-top="5"
fit-margin-right="5"
fit-margin-left="5"
@@ -305,13 +305,15 @@
id="g3188">
<text
xml:space="preserve"
- x="3305.5364"
+ x="3119.363"
y="13255.592"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">rsp->gpnum++</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3071">rcu_seq_start(rsp->gp_seq)</tspan></text>
<g
id="g3107"
transform="translate(947.90548,11584.029)">
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svg
index de6ecc5..d24d7d5 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svg
@@ -19,7 +19,7 @@
id="svg2"
version="1.1"
inkscape:version="0.48.4 r9939"
- sodipodi:docname="TreeRCU-gp-init-2.svg">
+ sodipodi:docname="TreeRCU-gp-init-3.svg">
<metadata
id="metadata212">
<rdf:RDF>
@@ -257,18 +257,22 @@
inkscape:window-width="1087"
inkscape:window-height="1144"
id="namedview208"
- showgrid="false"
- inkscape:zoom="0.70710678"
+ showgrid="true"
+ inkscape:zoom="0.68224756"
inkscape:cx="617.89019"
inkscape:cy="625.84293"
- inkscape:window-x="697"
+ inkscape:window-x="54"
inkscape:window-y="28"
inkscape:window-maximized="0"
- inkscape:current-layer="svg2"
+ inkscape:current-layer="g3153"
fit-margin-top="5"
fit-margin-right="5"
fit-margin-left="5"
- fit-margin-bottom="5" />
+ fit-margin-bottom="5">
+ <inkscape:grid
+ type="xygrid"
+ id="grid3090" />
+ </sodipodi:namedview>
<path
sodipodi:nodetypes="cccccccccccccccccccccccc"
inkscape:connector-curvature="0"
@@ -281,13 +285,13 @@
id="g3188">
<text
xml:space="preserve"
- x="3305.5364"
+ x="3145.9592"
y="13255.592"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->gp_seq = rsp->gp_seq</text>
<g
id="g3107"
transform="translate(947.90548,11584.029)">
@@ -366,13 +370,13 @@
</g>
<text
xml:space="preserve"
- x="5392.3345"
- y="15407.104"
+ x="5253.6904"
+ y="15407.032"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
style="fill:none;stroke-width:0.025in"
@@ -413,13 +417,13 @@
id="tspan3104-6-5-6-0">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7536.4883"
- y="17640.934"
+ x="7415.4365"
+ y="17670.572"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-9"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
transform="translate(-1642.5375,-11610.962)"
@@ -448,13 +452,13 @@
</g>
<text
xml:space="preserve"
- x="5378.4146"
- y="15436.927"
+ x="5258.0688"
+ y="15412.313"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-3"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
transform="translate(-151.71726,-11647.329)"
@@ -533,13 +537,13 @@
id="tspan3104-6-5-6-0-92">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7520.1294"
- y="17673.639"
+ x="7405.2607"
+ y="17670.572"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-35"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
transform="translate(-6817.1998,-11647.329)"
@@ -580,13 +584,13 @@
id="tspan3104-6-5-6-0-1">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7521.4663"
- y="17666.062"
+ x="7413.4688"
+ y="17670.566"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-75"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<path
style="fill:none;stroke:#000000;stroke-width:13.29812908px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow2Lend)"
@@ -622,11 +626,11 @@
sodipodi:nodetypes="cc" />
<text
xml:space="preserve"
- x="7370.856"
- y="5997.5972"
+ x="7271.9297"
+ y="6023.2412"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-62"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</svg>
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index b13b7b0..acd73c7 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -1070,13 +1070,13 @@
inkscape:window-height="1144"
id="namedview208"
showgrid="true"
- inkscape:zoom="0.6004608"
- inkscape:cx="826.65969"
- inkscape:cy="483.3047"
- inkscape:window-x="66"
- inkscape:window-y="28"
+ inkscape:zoom="0.81932583"
+ inkscape:cx="840.45848"
+ inkscape:cy="5052.4242"
+ inkscape:window-x="787"
+ inkscape:window-y="24"
inkscape:window-maximized="0"
- inkscape:current-layer="svg2"
+ inkscape:current-layer="g4"
fit-margin-top="5"
fit-margin-right="5"
fit-margin-left="5"
@@ -1543,15 +1543,6 @@
style="fill:none;stroke-width:0.025in"
transform="translate(1749.0282,658.72243)"
id="g3188">
- <text
- xml:space="preserve"
- x="3305.5364"
- y="13255.592"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-5"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">rsp->gpnum++</text>
<g
id="g3107-62"
transform="translate(947.90548,11584.029)">
@@ -1584,6 +1575,17 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-7">Root</tspan></text>
+ <text
+ xml:space="preserve"
+ x="3137.9988"
+ y="13271.316"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-626"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3071">rcu_seq_start(rsp->gp_seq)</tspan></text>
</g>
<rect
ry="0"
@@ -2318,15 +2320,6 @@
style="fill:none;stroke-width:0.025in"
transform="translate(1739.0986,17188.625)"
id="g3188-6">
- <text
- xml:space="preserve"
- x="3305.5364"
- y="13255.592"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-1"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->gpnum = rsp->gpnum</text>
<g
id="g3107-5"
transform="translate(947.90548,11584.029)">
@@ -2359,6 +2352,15 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-1">Root</tspan></text>
+ <text
+ xml:space="preserve"
+ x="3147.9268"
+ y="13240.524"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-1"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
style="fill:none;stroke-width:0.025in"
@@ -2387,13 +2389,13 @@
</g>
<text
xml:space="preserve"
- x="5392.3345"
- y="15407.104"
+ x="5263.1094"
+ y="15411.646"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-6-7"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ id="text202-92"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
style="fill:none;stroke-width:0.025in"
@@ -2434,13 +2436,13 @@
id="tspan3104-6-5-6-0-94">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7536.4883"
- y="17640.934"
+ x="7417.4053"
+ y="17655.502"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-9"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ id="text202-759"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
transform="translate(-2353.8462,17224.992)"
@@ -2469,13 +2471,13 @@
</g>
<text
xml:space="preserve"
- x="5378.4146"
- y="15436.927"
+ x="5246.1548"
+ y="15411.648"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-3"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ id="text202-87"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
transform="translate(-863.02613,17188.625)"
@@ -2554,13 +2556,13 @@
id="tspan3104-6-5-6-0-92-6">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7520.1294"
- y="17673.639"
+ x="7433.8257"
+ y="17682.098"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-35"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ id="text202-2"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<g
transform="translate(-7528.5085,17188.625)"
@@ -2601,13 +2603,13 @@
id="tspan3104-6-5-6-0-1-8">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7521.4663"
- y="17666.062"
+ x="7415.4404"
+ y="17682.098"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-75-1"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
+ id="text202-0"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
<path
style="fill:none;stroke:#000000;stroke-width:13.29812813px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow2Lend)"
@@ -2641,15 +2643,6 @@
id="path3414-8-3-6-4"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
- <text
- xml:space="preserve"
- x="6659.5469"
- y="34833.551"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-62"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gpnum = rsp->gpnum</text>
<path
sodipodi:nodetypes="ccc"
inkscape:connector-curvature="0"
@@ -3844,7 +3837,7 @@
font-weight="bold"
font-size="192"
id="text202-6-6-5"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rdp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rdp->gp_seq</text>
<text
xml:space="preserve"
x="5035.4155"
@@ -4284,15 +4277,6 @@
style="fill:none;stroke-width:0.025in"
transform="translate(1874.038,53203.538)"
id="g3188-7">
- <text
- xml:space="preserve"
- x="3199.1516"
- y="13255.592"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-82"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->completed = ->gpnum</text>
<g
id="g3107-53"
transform="translate(947.90548,11584.029)">
@@ -4325,6 +4309,17 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-19">Root</tspan></text>
+ <text
+ xml:space="preserve"
+ x="3175.896"
+ y="13240.11"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-3"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<rect
ry="0"
@@ -4371,13 +4366,15 @@
</g>
<text
xml:space="preserve"
- x="5324.5371"
- y="15414.598"
+ x="5264.4829"
+ y="15411.231"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-753"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ id="text202-36-7"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-5">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
style="fill:none;stroke-width:0.025in"
@@ -4412,30 +4409,12 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-6-0-4">Leaf</tspan></text>
- <text
- xml:space="preserve"
- x="10084.225"
- y="70903.312"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-9-0"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
<path
sodipodi:nodetypes="ccc"
inkscape:connector-curvature="0"
id="path3134-9-0-3-9"
d="m 6315.6122,72629.054 -20.9533,8108.684 1648.968,0"
style="fill:none;stroke:#969696;stroke-width:53.19251251;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-end:url(#Arrow1Send)" />
- <text
- xml:space="preserve"
- x="5092.4683"
- y="74111.672"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-60"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rsp->completed =</text>
<g
style="fill:none;stroke-width:0.025in"
id="g3107-62-6"
@@ -4469,15 +4448,6 @@
sodipodi:linespacing="125%"><tspan
style="font-size:159.57754517px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;writing-mode:lr-tb;text-anchor:start;font-family:Liberation Sans;-inkscape-font-specification:Liberation Sans"
id="tspan3104-6-5-7-7">Root</tspan></text>
- <text
- xml:space="preserve"
- x="5092.4683"
- y="74325.906"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-60-3"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"> rnp->completed</text>
<g
style="fill:none;stroke-width:0.025in"
transform="translate(1746.2528,60972.572)"
@@ -4736,13 +4706,15 @@
</g>
<text
xml:space="preserve"
- x="5327.3057"
- y="15428.84"
+ x="5274.1216"
+ y="15411.231"
font-style="normal"
font-weight="bold"
font-size="192"
id="text202-36"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-6">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
transform="translate(-728.08545,53203.538)"
@@ -4821,13 +4793,15 @@
id="tspan3104-6-5-6-0-92-5">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7486.4907"
- y="17670.119"
+ x="7435.1987"
+ y="17708.281"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-6-2"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ id="text202-36-9"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-1">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<g
transform="translate(-7393.5687,53203.538)"
@@ -4868,13 +4842,15 @@
id="tspan3104-6-5-6-0-1-5">Leaf</tspan></text>
<text
xml:space="preserve"
- x="7474.1382"
- y="17688.926"
+ x="7416.8125"
+ y="17708.281"
font-style="normal"
font-weight="bold"
font-size="192"
- id="text202-5-1"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
+ id="text202-36-35"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-62">rcu_seq_end(&rnp->gp_seq)</tspan></text>
</g>
<path
style="fill:none;stroke:#000000;stroke-width:13.29812813px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow2Lend)"
@@ -4908,15 +4884,6 @@
id="path3414-8-3-6-67"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
- <text
- xml:space="preserve"
- x="6742.6001"
- y="70882.617"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-2"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->completed = ->gpnum</text>
<g
style="fill:none;stroke-width:0.025in"
id="g4504-3-9-6"
@@ -5131,5 +5098,47 @@
font-size="192"
id="text202-7-9-6-6-7"
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_do_batch()</text>
+ <text
+ xml:space="preserve"
+ x="6698.9019"
+ y="70885.211"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-2"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-7">rcu_seq_end(&rnp->gp_seq)</tspan></text>
+ <text
+ xml:space="preserve"
+ x="10023.457"
+ y="70885.234"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-0"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-9">rcu_seq_end(&rnp->gp_seq)</tspan></text>
+ <text
+ xml:space="preserve"
+ x="5023.3389"
+ y="74209.773"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-36-36"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"><tspan
+ style="font-size:172.87567139px"
+ id="tspan3166-0">rcu_seq_end(&rsp->gp_seq)</tspan></text>
+ <text
+ xml:space="preserve"
+ x="6562.5884"
+ y="34870.727"
+ font-style="normal"
+ font-weight="bold"
+ font-size="192"
+ id="text202-3"
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">->gp_seq = rsp->gp_seq</text>
</g>
</svg>
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg
index de3992f..149bec2 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg
@@ -300,13 +300,13 @@
inkscape:window-height="1144"
id="namedview208"
showgrid="true"
- inkscape:zoom="0.70710678"
- inkscape:cx="616.47598"
- inkscape:cy="595.41964"
- inkscape:window-x="813"
+ inkscape:zoom="0.96484375"
+ inkscape:cx="507.0191"
+ inkscape:cy="885.62207"
+ inkscape:window-x="47"
inkscape:window-y="28"
inkscape:window-maximized="0"
- inkscape:current-layer="g4405"
+ inkscape:current-layer="g3115"
fit-margin-top="5"
fit-margin-right="5"
fit-margin-left="5"
@@ -710,7 +710,7 @@
font-weight="bold"
font-size="192"
id="text202-6-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rdp->gpnum</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rdp->gp_seq</text>
<text
xml:space="preserve"
x="5035.4155"
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 4259f95..f99cf11 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -172,7 +172,7 @@
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
- (detected by 32, t=2603 jiffies, g=7073, c=7072, q=625)
+ (detected by 32, t=2603 jiffies, g=7075, q=625)
This message indicates that CPU 32 detected that CPUs 2 and 16 were both
causing stalls, and that the stall was affecting RCU-sched. This message
@@ -215,11 +215,10 @@
period.
The "detected by" line indicates which CPU detected the stall (in this
-case, CPU 32), how many jiffies have elapsed since the start of the
-grace period (in this case 2603), the number of the last grace period
-to start and to complete (7073 and 7072, respectively), and an estimate
-of the total number of RCU callbacks queued across all CPUs (625 in
-this case).
+case, CPU 32), how many jiffies have elapsed since the start of the grace
+period (in this case 2603), the grace-period sequence number (7075), and
+an estimate of the total number of RCU callbacks queued across all CPUs
+(625 in this case).
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
for each CPU:
@@ -266,15 +265,16 @@
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed:
- kthread starved for 23807 jiffies! g7073 c7072 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
+ kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
Starving the grace-period kthreads of CPU time can of course result
in RCU CPU stall warnings even when all CPUs and tasks have passed
-through the required quiescent states. The "g" and "c" numbers flag the
-number of the last grace period started and completed, respectively,
-the "f" precedes the ->gp_flags command to the grace-period kthread,
-the "RCU_GP_WAIT_FQS" indicates that the kthread is waiting for a short
-timeout, and the "state" precedes value of the task_struct ->state field.
+through the required quiescent states. The "g" number shows the current
+grace-period sequence number, the "f" precedes the ->gp_flags command
+to the grace-period kthread, the "RCU_GP_WAIT_FQS" indicates that the
+kthread is waiting for a short timeout, the "state" precedes value of the
+task_struct ->state field, and the "cpu" indicates that the grace-period
+kthread last ran on CPU 5.
Multiple Warnings From One Stall
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 65eb856..c2a7fac 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -588,6 +588,7 @@
void synchronize_rcu(void)
{
write_lock(&rcu_gp_mutex);
+ smp_mb__after_spinlock();
write_unlock(&rcu_gp_mutex);
}
@@ -609,12 +610,15 @@
The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
and release a global reader-writer lock. The synchronize_rcu()
-primitive write-acquires this same lock, then immediately releases
-it. This means that once synchronize_rcu() exits, all RCU read-side
-critical sections that were in progress before synchronize_rcu() was
-called are guaranteed to have completed -- there is no way that
-synchronize_rcu() would have been able to write-acquire the lock
-otherwise.
+primitive write-acquires this same lock, then releases it. This means
+that once synchronize_rcu() exits, all RCU read-side critical sections
+that were in progress before synchronize_rcu() was called are guaranteed
+to have completed -- there is no way that synchronize_rcu() would have
+been able to write-acquire the lock otherwise. The smp_mb__after_spinlock()
+promotes synchronize_rcu() to a full memory barrier in compliance with
+the "Memory-Barrier Guarantees" listed in:
+
+ Documentation/RCU/Design/Requirements/Requirements.html.
It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired. Note also that rcu_read_lock() is immune
@@ -816,11 +820,13 @@
list_next_rcu
list_for_each_entry_rcu
list_for_each_entry_continue_rcu
+ list_for_each_entry_from_rcu
hlist_first_rcu
hlist_next_rcu
hlist_pprev_rcu
hlist_for_each_entry_rcu
hlist_for_each_entry_rcu_bh
+ hlist_for_each_entry_from_rcu
hlist_for_each_entry_continue_rcu
hlist_for_each_entry_continue_rcu_bh
hlist_nulls_first_rcu
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 533ff5c..5cde1ff 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2835,8 +2835,6 @@
nosync [HW,M68K] Disables sync negotiation for all devices.
- notsc [BUGS=X86-32] Disable Time Stamp Counter
-
nowatchdog [KNL] Disable both lockup detectors, i.e.
soft-lockup and NMI watchdog (hard-lockup).
@@ -3632,8 +3630,8 @@
Set time (s) after boot for CPU-hotplug testing.
rcutorture.onoff_interval= [KNL]
- Set time (s) between CPU-hotplug operations, or
- zero to disable CPU-hotplug testing.
+ Set time (jiffies) between CPU-hotplug operations,
+ or zero to disable CPU-hotplug testing.
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
diff --git a/Documentation/core-api/atomic_ops.rst b/Documentation/core-api/atomic_ops.rst
index 2e7165f..7245834 100644
--- a/Documentation/core-api/atomic_ops.rst
+++ b/Documentation/core-api/atomic_ops.rst
@@ -29,7 +29,7 @@
local_t.
The first operations to implement for atomic_t's are the initializers and
-plain reads. ::
+plain writes. ::
#define ATOMIC_INIT(i) { (i) }
#define atomic_set(v, i) ((v)->counter = (i))
diff --git a/Documentation/devicetree/bindings/interrupt-controller/ingenic,intc.txt b/Documentation/devicetree/bindings/interrupt-controller/ingenic,intc.txt
index 5f89fb6..f97fd8a 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/ingenic,intc.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/ingenic,intc.txt
@@ -4,6 +4,7 @@
- compatible : should be "ingenic,<socname>-intc". Valid strings are:
ingenic,jz4740-intc
+ ingenic,jz4725b-intc
ingenic,jz4770-intc
ingenic,jz4775-intc
ingenic,jz4780-intc
diff --git a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt
index 20f121d..697ca2f2 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt
@@ -7,6 +7,7 @@
- "renesas,irqc-r8a73a4" (R-Mobile APE6)
- "renesas,irqc-r8a7743" (RZ/G1M)
- "renesas,irqc-r8a7745" (RZ/G1E)
+ - "renesas,irqc-r8a77470" (RZ/G1C)
- "renesas,irqc-r8a7790" (R-Car H2)
- "renesas,irqc-r8a7791" (R-Car M2-W)
- "renesas,irqc-r8a7792" (R-Car V2H)
@@ -16,6 +17,7 @@
- "renesas,intc-ex-r8a7796" (R-Car M3-W)
- "renesas,intc-ex-r8a77965" (R-Car M3-N)
- "renesas,intc-ex-r8a77970" (R-Car V3M)
+ - "renesas,intc-ex-r8a77980" (R-Car V3H)
- "renesas,intc-ex-r8a77995" (R-Car D3)
- #interrupt-cells: has to be <2>: an interrupt index and flags, as defined in
interrupts.txt in this directory
diff --git a/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt b/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
index b1fe7e9..18d4d01 100644
--- a/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
+++ b/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
@@ -1,19 +1,25 @@
-Mediatek MT6577, MT6572 and MT6589 Timers
----------------------------------------
+Mediatek Timers
+---------------
+
+Mediatek SoCs have two different timers on different platforms,
+- GPT (General Purpose Timer)
+- SYST (System Timer)
+
+The proper timer will be selected automatically by driver.
Required properties:
- compatible should contain:
- * "mediatek,mt2701-timer" for MT2701 compatible timers
- * "mediatek,mt6580-timer" for MT6580 compatible timers
- * "mediatek,mt6589-timer" for MT6589 compatible timers
- * "mediatek,mt7623-timer" for MT7623 compatible timers
- * "mediatek,mt8127-timer" for MT8127 compatible timers
- * "mediatek,mt8135-timer" for MT8135 compatible timers
- * "mediatek,mt8173-timer" for MT8173 compatible timers
- * "mediatek,mt6577-timer" for MT6577 and all above compatible timers
-- reg: Should contain location and length for timers register.
-- clocks: Clocks driving the timer hardware. This list should include two
- clocks. The order is system clock and as second clock the RTC clock.
+ * "mediatek,mt2701-timer" for MT2701 compatible timers (GPT)
+ * "mediatek,mt6580-timer" for MT6580 compatible timers (GPT)
+ * "mediatek,mt6589-timer" for MT6589 compatible timers (GPT)
+ * "mediatek,mt7623-timer" for MT7623 compatible timers (GPT)
+ * "mediatek,mt8127-timer" for MT8127 compatible timers (GPT)
+ * "mediatek,mt8135-timer" for MT8135 compatible timers (GPT)
+ * "mediatek,mt8173-timer" for MT8173 compatible timers (GPT)
+ * "mediatek,mt6577-timer" for MT6577 and all above compatible timers (GPT)
+ * "mediatek,mt6765-timer" for MT6765 compatible timers (SYST)
+- reg: Should contain location and length for timer register.
+- clocks: Should contain system clock.
Examples:
@@ -21,5 +27,5 @@
compatible = "mediatek,mt6577-timer";
reg = <0x10008000 0x80>;
interrupts = <GIC_SPI 113 IRQ_TYPE_LEVEL_LOW>;
- clocks = <&system_clk>, <&rtc_clk>;
+ clocks = <&system_clk>;
};
diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index dbdf629..c7858dd 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -5,10 +5,10 @@
#
# Architecture requirements
#
-# * arm64
+# * arm/arm64
#
-# Rely on eret context synchronization when returning from IPI handler, and
-# when returning to user-space.
+# Rely on implicit context synchronization as a result of exception return
+# when returning from IPI handler, and when returning to user-space.
#
# * x86
#
@@ -31,7 +31,7 @@
-----------------------
| alpha: | TODO |
| arc: | TODO |
- | arm: | TODO |
+ | arm: | ok |
| arm64: | ok |
| c6x: | TODO |
| h8300: | TODO |
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index cb3b0de..10f4499 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -80,6 +80,26 @@
"post_handler," if any, that is associated with the kprobe.
Execution then continues with the instruction following the probepoint.
+Changing Execution Path
+-----------------------
+
+Since kprobes can probe into a running kernel code, it can change the
+register set, including instruction pointer. This operation requires
+maximum care, such as keeping the stack frame, recovering the execution
+path etc. Since it operates on a running kernel and needs deep knowledge
+of computer architecture and concurrent computing, you can easily shoot
+your foot.
+
+If you change the instruction pointer (and set up other related
+registers) in pre_handler, you must return !0 so that kprobes stops
+single stepping and just returns to the given address.
+This also means post_handler should not be called anymore.
+
+Note that this operation may be harder on some architectures which use
+TOC (Table of Contents) for function call, since you have to setup a new
+TOC for your function in your module, and recover the old one after
+returning from it.
+
Return Probes
-------------
@@ -262,7 +282,7 @@
tweak the kernel's execution path, you need to suppress optimization,
using one of the following techniques:
-- Specify an empty function for the kprobe's post_handler or break_handler.
+- Specify an empty function for the kprobe's post_handler.
or
@@ -474,7 +494,7 @@
the bad probe, are safely unregistered before the register_*probes
function returns.
-- kps/rps/jps: an array of pointers to ``*probe`` data structures
+- kps/rps: an array of pointers to ``*probe`` data structures
- num: the number of the array entries.
.. note::
@@ -566,12 +586,11 @@
Kprobes does not use mutexes or allocate memory except during
registration and unregistration.
-Probe handlers are run with preemption disabled. Depending on the
-architecture and optimization state, handlers may also run with
-interrupts disabled (e.g., kretprobe handlers and optimized kprobe
-handlers run without interrupt disabled on x86/x86-64). In any case,
-your handler should not yield the CPU (e.g., by attempting to acquire
-a semaphore).
+Probe handlers are run with preemption disabled or interrupt disabled,
+which depends on the architecture and optimization state. (e.g.,
+kretprobe handlers and optimized kprobe handlers run without interrupt
+disabled on x86/x86-64). In any case, your handler should not yield
+the CPU (e.g., by attempting to acquire a semaphore, or waiting I/O).
Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index a02d6bb..0d8d7ef 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2179,32 +2179,41 @@
event_indicated = 1;
wake_up_process(event_daemon);
-A write memory barrier is implied by wake_up() and co. if and only if they
-wake something up. The barrier occurs before the task state is cleared, and so
-sits between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+A general memory barrier is executed by wake_up() if it wakes something up.
+If it doesn't wake anything up then a memory barrier may or may not be
+executed; you must not rely on it. The barrier occurs before the task state
+is accessed, in particular, it sits between the STORE to indicate the event
+and the STORE to set TASK_RUNNING:
- CPU 1 CPU 2
+ CPU 1 (Sleeper) CPU 2 (Waker)
=============================== ===============================
set_current_state(); STORE event_indicated
smp_store_mb(); wake_up();
- STORE current->state <write barrier>
- <general barrier> STORE current->state
- LOAD event_indicated
+ STORE current->state ...
+ <general barrier> <general barrier>
+ LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
+ STORE task->state
-To repeat, this write memory barrier is present if and only if something
-is actually awakened. To see this, consider the following sequence of
-events, where X and Y are both initially zero:
+where "task" is the thread being woken up and it equals CPU 1's "current".
+
+To repeat, a general memory barrier is guaranteed to be executed by wake_up()
+if something is actually awakened, but otherwise there is no such guarantee.
+To see this, consider the following sequence of events, where X and Y are both
+initially zero:
CPU 1 CPU 2
=============================== ===============================
- X = 1; STORE event_indicated
+ X = 1; Y = 1;
smp_mb(); wake_up();
- Y = 1; wait_event(wq, Y == 1);
- wake_up(); load from Y sees 1, no memory barrier
- load from X might see 0
+ LOAD Y LOAD X
-In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
-to see 1.
+If a wakeup does occur, one (at least) of the two loads must see 1. If, on
+the other hand, a wakeup does not occur, both loads might see 0.
+
+wake_up_process() always executes a general memory barrier. The barrier again
+occurs before the task state is accessed. In particular, if the wake_up() in
+the previous snippet were replaced by a call to wake_up_process() then one of
+the two loads would be guaranteed to see 1.
The available waker functions include:
@@ -2224,6 +2233,8 @@
wake_up_poll();
wake_up_process();
+In terms of memory ordering, these functions all provide the same guarantees of
+a wake_up() (or stronger).
[!] Note that the memory barriers implied by the sleeper and the waker do _not_
order multiple stores before the wake-up with respect to loads of those stored
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index 921739d..7f01fb1 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -1891,22 +1891,22 @@
/* 소유권을 수정 */
desc->status = DEVICE_OWN;
- /* MMIO 를 통해 디바이스에 공지를 하기 전에 메모리를 동기화 */
- wmb();
-
/* 업데이트된 디스크립터의 디바이스에 공지 */
writel(DESC_NOTIFY, doorbell);
}
dma_rmb() 는 디스크립터로부터 데이터를 읽어오기 전에 디바이스가 소유권을
- 내놓았음을 보장하게 하고, dma_wmb() 는 디바이스가 자신이 소유권을 다시
- 가졌음을 보기 전에 디스크립터에 데이터가 쓰였음을 보장합니다. wmb() 는
- 캐시 일관성이 없는 (cache incoherent) MMIO 영역에 쓰기를 시도하기 전에
- 캐시 일관성이 있는 메모리 (cache coherent memory) 쓰기가 완료되었음을
- 보장해주기 위해 필요합니다.
+ 내려놓았을 것을 보장하고, dma_wmb() 는 디바이스가 자신이 소유권을 다시
+ 가졌음을 보기 전에 디스크립터에 데이터가 쓰였을 것을 보장합니다. 참고로,
+ writel() 을 사용하면 캐시 일관성이 있는 메모리 (cache coherent memory)
+ 쓰기가 MMIO 영역에의 쓰기 전에 완료되었을 것을 보장하므로 writel() 앞에
+ wmb() 를 실행할 필요가 없음을 알아두시기 바랍니다. writel() 보다 비용이
+ 저렴한 writel_relaxed() 는 이런 보장을 제공하지 않으므로 여기선 사용되지
+ 않아야 합니다.
- consistent memory 에 대한 자세한 내용을 위해선 Documentation/DMA-API.txt
- 문서를 참고하세요.
+ writel_relaxed() 와 같은 완화된 I/O 접근자들에 대한 자세한 내용을 위해서는
+ "커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을
+ 위해선 Documentation/DMA-API.txt 문서를 참고하세요.
MMIO 쓰기 배리어
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d10944e..cb8db4f 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4391,6 +4391,22 @@
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
+7.14 KVM_CAP_S390_HPAGE_1M
+
+Architectures: s390
+Parameters: none
+Returns: 0 on success, -EINVAL if hpage module parameter was not set
+ or cmma is enabled
+
+With this capability the KVM support for memory backing with 1m pages
+through hugetlbfs can be enabled for a VM. After the capability is
+enabled, cmma can't be enabled anymore and pfmfi and the storage key
+interpretation are disabled. If cmma has already been enabled or the
+hpage module parameter is not set to 1, -EINVAL is returned.
+
+While it is generally possible to create a huge page backed VM without
+this capability, the VM will not be able to run.
+
8. Other capabilities.
----------------------
diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index a16aa21..f662d3c 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -29,7 +29,11 @@
L2 and L3 CDP are controlled seperately.
RDT features are orthogonal. A particular system may support only
-monitoring, only control, or both monitoring and control.
+monitoring, only control, or both monitoring and control. Cache
+pseudo-locking is a unique way of using cache control to "pin" or
+"lock" data in the cache. Details can be found in
+"Cache Pseudo-Locking".
+
The mount succeeds if either of allocation or monitoring is present, but
only those files and directories supported by the system will be created.
@@ -65,6 +69,29 @@
some platforms support devices that have their
own settings for cache use which can over-ride
these bits.
+"bit_usage": Annotated capacity bitmasks showing how all
+ instances of the resource are used. The legend is:
+ "0" - Corresponding region is unused. When the system's
+ resources have been allocated and a "0" is found
+ in "bit_usage" it is a sign that resources are
+ wasted.
+ "H" - Corresponding region is used by hardware only
+ but available for software use. If a resource
+ has bits set in "shareable_bits" but not all
+ of these bits appear in the resource groups'
+ schematas then the bits appearing in
+ "shareable_bits" but no resource group will
+ be marked as "H".
+ "X" - Corresponding region is available for sharing and
+ used by hardware and software. These are the
+ bits that appear in "shareable_bits" as
+ well as a resource group's allocation.
+ "S" - Corresponding region is used by software
+ and available for sharing.
+ "E" - Corresponding region is used exclusively by
+ one resource group. No sharing allowed.
+ "P" - Corresponding region is pseudo-locked. No
+ sharing allowed.
Memory bandwitdh(MB) subdirectory contains the following files
with respect to allocation:
@@ -151,6 +178,9 @@
CPUs to/from this group. As with the tasks file a hierarchy is
maintained where MON groups may only include CPUs owned by the
parent CTRL_MON group.
+ When the resouce group is in pseudo-locked mode this file will
+ only be readable, reflecting the CPUs associated with the
+ pseudo-locked region.
"cpus_list":
@@ -163,6 +193,21 @@
A list of all the resources available to this group.
Each resource has its own line and format - see below for details.
+"size":
+ Mirrors the display of the "schemata" file to display the size in
+ bytes of each allocation instead of the bits representing the
+ allocation.
+
+"mode":
+ The "mode" of the resource group dictates the sharing of its
+ allocations. A "shareable" resource group allows sharing of its
+ allocations while an "exclusive" resource group does not. A
+ cache pseudo-locked region is created by first writing
+ "pseudo-locksetup" to the "mode" file before writing the cache
+ pseudo-locked region's schemata to the resource group's "schemata"
+ file. On successful pseudo-locked region creation the mode will
+ automatically change to "pseudo-locked".
+
When monitoring is enabled all MON groups will also contain:
"mon_data":
@@ -379,6 +424,170 @@
L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
+Cache Pseudo-Locking
+--------------------
+CAT enables a user to specify the amount of cache space that an
+application can fill. Cache pseudo-locking builds on the fact that a
+CPU can still read and write data pre-allocated outside its current
+allocated area on a cache hit. With cache pseudo-locking, data can be
+preloaded into a reserved portion of cache that no application can
+fill, and from that point on will only serve cache hits. The cache
+pseudo-locked memory is made accessible to user space where an
+application can map it into its virtual address space and thus have
+a region of memory with reduced average read latency.
+
+The creation of a cache pseudo-locked region is triggered by a request
+from the user to do so that is accompanied by a schemata of the region
+to be pseudo-locked. The cache pseudo-locked region is created as follows:
+- Create a CAT allocation CLOSNEW with a CBM matching the schemata
+ from the user of the cache region that will contain the pseudo-locked
+ memory. This region must not overlap with any current CAT allocation/CLOS
+ on the system and no future overlap with this cache region is allowed
+ while the pseudo-locked region exists.
+- Create a contiguous region of memory of the same size as the cache
+ region.
+- Flush the cache, disable hardware prefetchers, disable preemption.
+- Make CLOSNEW the active CLOS and touch the allocated memory to load
+ it into the cache.
+- Set the previous CLOS as active.
+- At this point the closid CLOSNEW can be released - the cache
+ pseudo-locked region is protected as long as its CBM does not appear in
+ any CAT allocation. Even though the cache pseudo-locked region will from
+ this point on not appear in any CBM of any CLOS an application running with
+ any CLOS will be able to access the memory in the pseudo-locked region since
+ the region continues to serve cache hits.
+- The contiguous region of memory loaded into the cache is exposed to
+ user-space as a character device.
+
+Cache pseudo-locking increases the probability that data will remain
+in the cache via carefully configuring the CAT feature and controlling
+application behavior. There is no guarantee that data is placed in
+cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
+“locked” data from cache. Power management C-states may shrink or
+power off cache. Deeper C-states will automatically be restricted on
+pseudo-locked region creation.
+
+It is required that an application using a pseudo-locked region runs
+with affinity to the cores (or a subset of the cores) associated
+with the cache on which the pseudo-locked region resides. A sanity check
+within the code will not allow an application to map pseudo-locked memory
+unless it runs with affinity to cores associated with the cache on which the
+pseudo-locked region resides. The sanity check is only done during the
+initial mmap() handling, there is no enforcement afterwards and the
+application self needs to ensure it remains affine to the correct cores.
+
+Pseudo-locking is accomplished in two stages:
+1) During the first stage the system administrator allocates a portion
+ of cache that should be dedicated to pseudo-locking. At this time an
+ equivalent portion of memory is allocated, loaded into allocated
+ cache portion, and exposed as a character device.
+2) During the second stage a user-space application maps (mmap()) the
+ pseudo-locked memory into its address space.
+
+Cache Pseudo-Locking Interface
+------------------------------
+A pseudo-locked region is created using the resctrl interface as follows:
+
+1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
+2) Change the new resource group's mode to "pseudo-locksetup" by writing
+ "pseudo-locksetup" to the "mode" file.
+3) Write the schemata of the pseudo-locked region to the "schemata" file. All
+ bits within the schemata should be "unused" according to the "bit_usage"
+ file.
+
+On successful pseudo-locked region creation the "mode" file will contain
+"pseudo-locked" and a new character device with the same name as the resource
+group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
+by user space in order to obtain access to the pseudo-locked memory region.
+
+An example of cache pseudo-locked region creation and usage can be found below.
+
+Cache Pseudo-Locking Debugging Interface
+---------------------------------------
+The pseudo-locking debugging interface is enabled by default (if
+CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.
+
+There is no explicit way for the kernel to test if a provided memory
+location is present in the cache. The pseudo-locking debugging interface uses
+the tracing infrastructure to provide two ways to measure cache residency of
+the pseudo-locked region:
+1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
+ from these measurements are best visualized using a hist trigger (see
+ example below). In this test the pseudo-locked region is traversed at
+ a stride of 32 bytes while hardware prefetchers and preemption
+ are disabled. This also provides a substitute visualization of cache
+ hits and misses.
+2) Cache hit and miss measurements using model specific precision counters if
+ available. Depending on the levels of cache on the system the pseudo_lock_l2
+ and pseudo_lock_l3 tracepoints are available.
+ WARNING: triggering this measurement uses from two (for just L2
+ measurements) to four (for L2 and L3 measurements) precision counters on
+ the system, if any other measurements are in progress the counters and
+ their corresponding event registers will be clobbered.
+
+When a pseudo-locked region is created a new debugfs directory is created for
+it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
+write-only file, pseudo_lock_measure, is present in this directory. The
+measurement on the pseudo-locked region depends on the number, 1 or 2,
+written to this debugfs file. Since the measurements are recorded with the
+tracing infrastructure the relevant tracepoints need to be enabled before the
+measurement is triggered.
+
+Example of latency debugging interface:
+In this example a pseudo-locked region named "newlock" was created. Here is
+how we can measure the latency in cycles of reading from this region and
+visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
+is set:
+# :> /sys/kernel/debug/tracing/trace
+# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
+# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
+# echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
+# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
+# cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist
+
+# event histogram
+#
+# trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
+#
+
+{ latency: 456 } hitcount: 1
+{ latency: 50 } hitcount: 83
+{ latency: 36 } hitcount: 96
+{ latency: 44 } hitcount: 174
+{ latency: 48 } hitcount: 195
+{ latency: 46 } hitcount: 262
+{ latency: 42 } hitcount: 693
+{ latency: 40 } hitcount: 3204
+{ latency: 38 } hitcount: 3484
+
+Totals:
+ Hits: 8192
+ Entries: 9
+ Dropped: 0
+
+Example of cache hits/misses debugging:
+In this example a pseudo-locked region named "newlock" was created on the L2
+cache of a platform. Here is how we can obtain details of the cache hits
+and misses using the platform's precision counters.
+
+# :> /sys/kernel/debug/tracing/trace
+# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
+# echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
+# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
+# cat /sys/kernel/debug/tracing/trace
+
+# tracer: nop
+#
+# _-----=> irqs-off
+# / _----=> need-resched
+# | / _---=> hardirq/softirq
+# || / _--=> preempt-depth
+# ||| / delay
+# TASK-PID CPU# |||| TIMESTAMP FUNCTION
+# | | | |||| | |
+ pseudo_lock_mea-1672 [002] .... 3132.860500: pseudo_lock_l2: hits=4097 miss=0
+
+
Examples for RDT allocation usage:
Example 1
@@ -502,7 +711,172 @@
# echo F0 > p0/cpus
-4) Locking between applications
+Example 4
+---------
+
+The resource groups in previous examples were all in the default "shareable"
+mode allowing sharing of their cache allocations. If one resource group
+configures a cache allocation then nothing prevents another resource group
+to overlap with that allocation.
+
+In this example a new exclusive resource group will be created on a L2 CAT
+system with two L2 cache instances that can be configured with an 8-bit
+capacity bitmask. The new exclusive resource group will be configured to use
+25% of each cache instance.
+
+# mount -t resctrl resctrl /sys/fs/resctrl/
+# cd /sys/fs/resctrl
+
+First, we observe that the default group is configured to allocate to all L2
+cache:
+
+# cat schemata
+L2:0=ff;1=ff
+
+We could attempt to create the new resource group at this point, but it will
+fail because of the overlap with the schemata of the default group:
+# mkdir p0
+# echo 'L2:0=0x3;1=0x3' > p0/schemata
+# cat p0/mode
+shareable
+# echo exclusive > p0/mode
+-sh: echo: write error: Invalid argument
+# cat info/last_cmd_status
+schemata overlaps
+
+To ensure that there is no overlap with another resource group the default
+resource group's schemata has to change, making it possible for the new
+resource group to become exclusive.
+# echo 'L2:0=0xfc;1=0xfc' > schemata
+# echo exclusive > p0/mode
+# grep . p0/*
+p0/cpus:0
+p0/mode:exclusive
+p0/schemata:L2:0=03;1=03
+p0/size:L2:0=262144;1=262144
+
+A new resource group will on creation not overlap with an exclusive resource
+group:
+# mkdir p1
+# grep . p1/*
+p1/cpus:0
+p1/mode:shareable
+p1/schemata:L2:0=fc;1=fc
+p1/size:L2:0=786432;1=786432
+
+The bit_usage will reflect how the cache is used:
+# cat info/L2/bit_usage
+0=SSSSSSEE;1=SSSSSSEE
+
+A resource group cannot be forced to overlap with an exclusive resource group:
+# echo 'L2:0=0x1;1=0x1' > p1/schemata
+-sh: echo: write error: Invalid argument
+# cat info/last_cmd_status
+overlaps with exclusive group
+
+Example of Cache Pseudo-Locking
+-------------------------------
+Lock portion of L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
+region is exposed at /dev/pseudo_lock/newlock that can be provided to
+application for argument to mmap().
+
+# mount -t resctrl resctrl /sys/fs/resctrl/
+# cd /sys/fs/resctrl
+
+Ensure that there are bits available that can be pseudo-locked, since only
+unused bits can be pseudo-locked the bits to be pseudo-locked needs to be
+removed from the default resource group's schemata:
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSSSS
+# echo 'L2:1=0xfc' > schemata
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSS00
+
+Create a new resource group that will be associated with the pseudo-locked
+region, indicate that it will be used for a pseudo-locked region, and
+configure the requested pseudo-locked region capacity bitmask:
+
+# mkdir newlock
+# echo pseudo-locksetup > newlock/mode
+# echo 'L2:1=0x3' > newlock/schemata
+
+On success the resource group's mode will change to pseudo-locked, the
+bit_usage will reflect the pseudo-locked region, and the character device
+exposing the pseudo-locked region will exist:
+
+# cat newlock/mode
+pseudo-locked
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSSPP
+# ls -l /dev/pseudo_lock/newlock
+crw------- 1 root root 243, 0 Apr 3 05:01 /dev/pseudo_lock/newlock
+
+/*
+ * Example code to access one page of pseudo-locked cache region
+ * from user space.
+ */
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+/*
+ * It is required that the application runs with affinity to only
+ * cores associated with the pseudo-locked region. Here the cpu
+ * is hardcoded for convenience of example.
+ */
+static int cpuid = 2;
+
+int main(int argc, char *argv[])
+{
+ cpu_set_t cpuset;
+ long page_size;
+ void *mapping;
+ int dev_fd;
+ int ret;
+
+ page_size = sysconf(_SC_PAGESIZE);
+
+ CPU_ZERO(&cpuset);
+ CPU_SET(cpuid, &cpuset);
+ ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
+ if (ret < 0) {
+ perror("sched_setaffinity");
+ exit(EXIT_FAILURE);
+ }
+
+ dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
+ if (dev_fd < 0) {
+ perror("open");
+ exit(EXIT_FAILURE);
+ }
+
+ mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+ dev_fd, 0);
+ if (mapping == MAP_FAILED) {
+ perror("mmap");
+ close(dev_fd);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Application interacts with pseudo-locked memory @mapping */
+
+ ret = munmap(mapping, page_size);
+ if (ret < 0) {
+ perror("munmap");
+ close(dev_fd);
+ exit(EXIT_FAILURE);
+ }
+
+ close(dev_fd);
+ exit(EXIT_SUCCESS);
+}
+
+Locking between applications
+----------------------------
Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.
@@ -510,7 +884,7 @@
As an example, the allocation of an exclusive reservation of L3 cache
involves:
- 1. Read the cbmmasks from each directory
+ 1. Read the cbmmasks from each directory or the per-resource "bit_usage"
2. Find a contiguous set of bits in the global CBM bitmask that is clear
in any of the directory cbmmasks
3. Create a new directory
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 8d109ef..ad6d2a8 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -92,9 +92,7 @@
Timing
notsc
- Don't use the CPU time stamp counter to read the wall time.
- This can be used to work around timing problems on multiprocessor systems
- with not properly synchronized CPUs.
+ Deprecated, use tsc=unstable instead.
nohpet
Don't use the HPET timer.
@@ -156,6 +154,10 @@
If given as an integer, fills all system RAM with N fake nodes
interleaved over physical nodes.
+ numa=fake=<N>U
+ If given as an integer followed by 'U', it will divide each
+ physical node into N emulated nodes.
+
ACPI
acpi=off Don't enable ACPI
diff --git a/MAINTAINERS b/MAINTAINERS
index 544cac8..0a23427 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8317,10 +8317,16 @@
M: Luc Maranget <luc.maranget@inria.fr>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
R: Akira Yokosawa <akiyks@gmail.com>
+R: Daniel Lustig <dlustig@nvidia.com>
L: linux-kernel@vger.kernel.org
+L: linux-arch@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
F: tools/memory-model/
+F: Documentation/atomic_bitops.txt
+F: Documentation/atomic_t.txt
+F: Documentation/core-api/atomic_ops.rst
+F: Documentation/core-api/refcount-vs-atomic.rst
F: Documentation/memory-barriers.txt
LINUX SECURITY MODULE (LSM) FRAMEWORK
@@ -12039,9 +12045,9 @@
F: Documentation/RCU/
X: Documentation/RCU/torture.txt
F: include/linux/rcu*
-X: include/linux/srcu.h
+X: include/linux/srcu*.h
F: kernel/rcu/
-X: kernel/torture.c
+X: kernel/rcu/srcu*.c
REAL TIME CLOCK (RTC) SUBSYSTEM
M: Alessandro Zummo <a.zummo@towertech.it>
@@ -12402,7 +12408,6 @@
S390 VFIO-CCW DRIVER
M: Cornelia Huck <cohuck@redhat.com>
-M: Dong Jia Shi <bjsdjshi@linux.ibm.com>
M: Halil Pasic <pasic@linux.ibm.com>
L: linux-s390@vger.kernel.org
L: kvm@vger.kernel.org
@@ -13078,8 +13083,8 @@
W: http://www.rdrop.com/users/paulmck/RCU/
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
-F: include/linux/srcu.h
-F: kernel/rcu/srcu.c
+F: include/linux/srcu*.h
+F: kernel/rcu/srcu*.c
SERIAL LOW-POWER INTER-CHIP MEDIA BUS (SLIMbus)
M: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
@@ -14438,6 +14443,7 @@
F: Documentation/RCU/torture.txt
F: kernel/torture.c
F: kernel/rcu/rcutorture.c
+F: kernel/rcu/rcuperf.c
F: kernel/locking/locktorture.c
TOSHIBA ACPI EXTRAS DRIVER
diff --git a/arch/alpha/include/asm/atomic.h b/arch/alpha/include/asm/atomic.h
index 767bfdd..150a1c5 100644
--- a/arch/alpha/include/asm/atomic.h
+++ b/arch/alpha/include/asm/atomic.h
@@ -18,11 +18,11 @@
* To ensure dependency ordering is preserved for the _relaxed and
* _release atomics, an smp_read_barrier_depends() is unconditionally
* inserted into the _relaxed variants, which are used to build the
- * barriered versions. To avoid redundant back-to-back fences, we can
- * define the _acquire and _fence versions explicitly.
+ * barriered versions. Avoid redundant back-to-back fences in the
+ * _acquire and _fence versions.
*/
-#define __atomic_op_acquire(op, args...) op##_relaxed(args)
-#define __atomic_op_fence __atomic_op_release
+#define __atomic_acquire_fence()
+#define __atomic_post_full_fence()
#define ATOMIC_INIT(i) { (i) }
#define ATOMIC64_INIT(i) { (i) }
@@ -206,7 +206,7 @@
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
/**
- * __atomic_add_unless - add unless the number is a given value
+ * atomic_fetch_add_unless - add unless the number is a given value
* @v: pointer of type atomic_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
@@ -214,7 +214,7 @@
* Atomically adds @a to @v, so long as it was not @u.
* Returns the old value of @v.
*/
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
+static __inline__ int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int c, new, old;
smp_mb();
@@ -235,38 +235,39 @@
smp_mb();
return old;
}
-
+#define atomic_fetch_add_unless atomic_fetch_add_unless
/**
- * atomic64_add_unless - add unless the number is a given value
+ * atomic64_fetch_add_unless - add unless the number is a given value
* @v: pointer of type atomic64_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
*
* Atomically adds @a to @v, so long as it was not @u.
- * Returns true iff @v was not @u.
+ * Returns the old value of @v.
*/
-static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
+static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u)
{
- long c, tmp;
+ long c, new, old;
smp_mb();
__asm__ __volatile__(
- "1: ldq_l %[tmp],%[mem]\n"
- " cmpeq %[tmp],%[u],%[c]\n"
- " addq %[tmp],%[a],%[tmp]\n"
+ "1: ldq_l %[old],%[mem]\n"
+ " cmpeq %[old],%[u],%[c]\n"
+ " addq %[old],%[a],%[new]\n"
" bne %[c],2f\n"
- " stq_c %[tmp],%[mem]\n"
- " beq %[tmp],3f\n"
+ " stq_c %[new],%[mem]\n"
+ " beq %[new],3f\n"
"2:\n"
".subsection 2\n"
"3: br 1b\n"
".previous"
- : [tmp] "=&r"(tmp), [c] "=&r"(c)
+ : [old] "=&r"(old), [new] "=&r"(new), [c] "=&r"(c)
: [mem] "m"(*v), [a] "rI"(a), [u] "rI"(u)
: "memory");
smp_mb();
- return !c;
+ return old;
}
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
/*
* atomic64_dec_if_positive - decrement by 1 if old value positive
@@ -295,31 +296,6 @@
smp_mb();
return old - 1;
}
-
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
-
-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-
-#define atomic_dec_return(v) atomic_sub_return(1,(v))
-#define atomic64_dec_return(v) atomic64_sub_return(1,(v))
-
-#define atomic_inc_return(v) atomic_add_return(1,(v))
-#define atomic64_inc_return(v) atomic64_add_return(1,(v))
-
-#define atomic_sub_and_test(i,v) (atomic_sub_return((i), (v)) == 0)
-#define atomic64_sub_and_test(i,v) (atomic64_sub_return((i), (v)) == 0)
-
-#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)
-#define atomic64_inc_and_test(v) (atomic64_add_return(1, (v)) == 0)
-
-#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
-#define atomic64_dec_and_test(v) (atomic64_sub_return(1, (v)) == 0)
-
-#define atomic_inc(v) atomic_add(1,(v))
-#define atomic64_inc(v) atomic64_add(1,(v))
-
-#define atomic_dec(v) atomic_sub(1,(v))
-#define atomic64_dec(v) atomic64_sub(1,(v))
+#define atomic64_dec_if_positive atomic64_dec_if_positive
#endif /* _ALPHA_ATOMIC_H */
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 1185928..4e00727 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -187,7 +187,8 @@
ATOMIC_OPS(add, +=, add)
ATOMIC_OPS(sub, -=, sub)
-#define atomic_andnot atomic_andnot
+#define atomic_andnot atomic_andnot
+#define atomic_fetch_andnot atomic_fetch_andnot
#undef ATOMIC_OPS
#define ATOMIC_OPS(op, c_op, asm_op) \
@@ -296,8 +297,6 @@
ATOMIC_FETCH_OP(op, c_op, asm_op)
ATOMIC_OPS(and, &=, CTOP_INST_AAND_DI_R2_R2_R3)
-#define atomic_andnot(mask, v) atomic_and(~(mask), (v))
-#define atomic_fetch_andnot(mask, v) atomic_fetch_and(~(mask), (v))
ATOMIC_OPS(or, |=, CTOP_INST_AOR_DI_R2_R2_R3)
ATOMIC_OPS(xor, ^=, CTOP_INST_AXOR_DI_R2_R2_R3)
@@ -308,48 +307,6 @@
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-/**
- * __atomic_add_unless - add unless the number is a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v
- */
-#define __atomic_add_unless(v, a, u) \
-({ \
- int c, old; \
- \
- /* \
- * Explicit full memory barrier needed before/after as \
- * LLOCK/SCOND thmeselves don't provide any such semantics \
- */ \
- smp_mb(); \
- \
- c = atomic_read(v); \
- while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c)\
- c = old; \
- \
- smp_mb(); \
- \
- c; \
-})
-
-#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
-
-#define atomic_inc(v) atomic_add(1, v)
-#define atomic_dec(v) atomic_sub(1, v)
-
-#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
-
-#define atomic_add_negative(i, v) (atomic_add_return(i, v) < 0)
-
-
#ifdef CONFIG_GENERIC_ATOMIC64
#include <asm-generic/atomic64.h>
@@ -472,7 +429,8 @@
ATOMIC64_OP_RETURN(op, op1, op2) \
ATOMIC64_FETCH_OP(op, op1, op2)
-#define atomic64_andnot atomic64_andnot
+#define atomic64_andnot atomic64_andnot
+#define atomic64_fetch_andnot atomic64_fetch_andnot
ATOMIC64_OPS(add, add.f, adc)
ATOMIC64_OPS(sub, sub.f, sbc)
@@ -559,53 +517,43 @@
return val;
}
+#define atomic64_dec_if_positive atomic64_dec_if_positive
/**
- * atomic64_add_unless - add unless the number is a given value
+ * atomic64_fetch_add_unless - add unless the number is a given value
* @v: pointer of type atomic64_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
*
- * if (v != u) { v += a; ret = 1} else {ret = 0}
- * Returns 1 iff @v was not @u (i.e. if add actually happened)
+ * Atomically adds @a to @v, if it was not @u.
+ * Returns the old value of @v
*/
-static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
+static inline long long atomic64_fetch_add_unless(atomic64_t *v, long long a,
+ long long u)
{
- long long val;
- int op_done;
+ long long old, temp;
smp_mb();
__asm__ __volatile__(
"1: llockd %0, [%2] \n"
- " mov %1, 1 \n"
" brne %L0, %L4, 2f # continue to add since v != u \n"
" breq.d %H0, %H4, 3f # return since v == u \n"
- " mov %1, 0 \n"
"2: \n"
- " add.f %L0, %L0, %L3 \n"
- " adc %H0, %H0, %H3 \n"
- " scondd %0, [%2] \n"
+ " add.f %L1, %L0, %L3 \n"
+ " adc %H1, %H0, %H3 \n"
+ " scondd %1, [%2] \n"
" bnz 1b \n"
"3: \n"
- : "=&r"(val), "=&r" (op_done)
+ : "=&r"(old), "=&r" (temp)
: "r"(&v->counter), "r"(a), "r"(u)
: "cc"); /* memory clobber comes from smp_mb() */
smp_mb();
- return op_done;
+ return old;
}
-
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-#define atomic64_inc(v) atomic64_add(1LL, (v))
-#define atomic64_inc_return(v) atomic64_add_return(1LL, (v))
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-#define atomic64_sub_and_test(a, v) (atomic64_sub_return((a), (v)) == 0)
-#define atomic64_dec(v) atomic64_sub(1LL, (v))
-#define atomic64_dec_return(v) atomic64_sub_return(1LL, (v))
-#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1LL, 0LL)
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
#endif /* !CONFIG_GENERIC_ATOMIC64 */
diff --git a/arch/arc/include/asm/kprobes.h b/arch/arc/include/asm/kprobes.h
index 2e52d18..2c1b479 100644
--- a/arch/arc/include/asm/kprobes.h
+++ b/arch/arc/include/asm/kprobes.h
@@ -45,8 +45,6 @@
struct kprobe_ctlblk {
unsigned int kprobe_status;
- struct pt_regs jprobe_saved_regs;
- char jprobes_stack[MAX_STACK_SIZE];
struct prev_kprobe prev_kprobe;
};
diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index 42b0504..df35d4c 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -225,24 +225,18 @@
/* If we have no pre-handler or it returned 0, we continue with
* normal processing. If we have a pre-handler and it returned
- * non-zero - which is expected from setjmp_pre_handler for
- * jprobe, we return without single stepping and leave that to
- * the break-handler which is invoked by a kprobe from
- * jprobe_return
+ * non-zero - which means user handler setup registers to exit
+ * to another instruction, we must skip the single stepping.
*/
if (!p->pre_handler || !p->pre_handler(p, regs)) {
setup_singlestep(p, regs);
kcb->kprobe_status = KPROBE_HIT_SS;
+ } else {
+ reset_current_kprobe();
+ preempt_enable_no_resched();
}
return 1;
- } else if (kprobe_running()) {
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- setup_singlestep(p, regs);
- kcb->kprobe_status = KPROBE_HIT_SS;
- return 1;
- }
}
/* no_kprobe: */
@@ -386,38 +380,6 @@
return ret;
}
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long sp_addr = regs->sp;
-
- kcb->jprobe_saved_regs = *regs;
- memcpy(kcb->jprobes_stack, (void *)sp_addr, MIN_STACK_SIZE(sp_addr));
- regs->ret = (unsigned long)(jp->entry);
-
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- __asm__ __volatile__("unimp_s");
- return;
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long sp_addr;
-
- *regs = kcb->jprobe_saved_regs;
- sp_addr = regs->sp;
- memcpy((void *)sp_addr, kcb->jprobes_stack, MIN_STACK_SIZE(sp_addr));
- preempt_enable_no_resched();
-
- return 1;
-}
-
static void __used kretprobe_trampoline_holder(void)
{
__asm__ __volatile__(".global kretprobe_trampoline\n"
@@ -483,9 +445,7 @@
kretprobe_assert(ri, orig_ret_address, trampoline_address);
regs->ret = orig_ret_address;
- reset_current_kprobe();
kretprobe_hash_unlock(current, &flags);
- preempt_enable_no_resched();
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 843edfd..0f328d6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -9,6 +9,7 @@
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_SET_MEMORY
@@ -337,8 +338,8 @@
select TIMER_OF
select COMMON_CLK
select GENERIC_CLOCKEVENTS
+ select GENERIC_IRQ_MULTI_HANDLER
select MIGHT_HAVE_PCI
- select MULTI_IRQ_HANDLER
select PCI_DOMAINS if PCI
select SPARSE_IRQ
select USE_OF
@@ -465,9 +466,9 @@
bool "Marvell Dove"
select CPU_PJ4
select GENERIC_CLOCKEVENTS
+ select GENERIC_IRQ_MULTI_HANDLER
select GPIOLIB
select MIGHT_HAVE_PCI
- select MULTI_IRQ_HANDLER
select MVEBU_MBUS
select PINCTRL
select PINCTRL_DOVE
@@ -512,8 +513,8 @@
select COMMON_CLK
select CPU_ARM926T
select GENERIC_CLOCKEVENTS
+ select GENERIC_IRQ_MULTI_HANDLER
select GPIOLIB
- select MULTI_IRQ_HANDLER
select SPARSE_IRQ
select USE_OF
help
@@ -532,11 +533,11 @@
select TIMER_OF
select CPU_XSCALE if !CPU_XSC3
select GENERIC_CLOCKEVENTS
+ select GENERIC_IRQ_MULTI_HANDLER
select GPIO_PXA
select GPIOLIB
select HAVE_IDE
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
select PLAT_PXA
select SPARSE_IRQ
help
@@ -572,11 +573,11 @@
select CPU_FREQ
select CPU_SA1100
select GENERIC_CLOCKEVENTS
+ select GENERIC_IRQ_MULTI_HANDLER
select GPIOLIB
select HAVE_IDE
select IRQ_DOMAIN
select ISA
- select MULTI_IRQ_HANDLER
select NEED_MACH_MEMORY_H
select SPARSE_IRQ
help
@@ -590,10 +591,10 @@
select GENERIC_CLOCKEVENTS
select GPIO_SAMSUNG
select GPIOLIB
+ select GENERIC_IRQ_MULTI_HANDLER
select HAVE_S3C2410_I2C if I2C
select HAVE_S3C2410_WATCHDOG if WATCHDOG
select HAVE_S3C_RTC if RTC_CLASS
- select MULTI_IRQ_HANDLER
select NEED_MACH_IO_H
select SAMSUNG_ATAGS
select USE_OF
@@ -627,10 +628,10 @@
select CLKSRC_MMIO
select GENERIC_CLOCKEVENTS
select GENERIC_IRQ_CHIP
+ select GENERIC_IRQ_MULTI_HANDLER
select GPIOLIB
select HAVE_IDE
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
select NEED_MACH_IO_H if PCCARD
select NEED_MACH_MEMORY_H
select SPARSE_IRQ
@@ -921,11 +922,6 @@
Enable support for iWMMXt context switching at run time if
running on a CPU that supports it.
-config MULTI_IRQ_HANDLER
- bool
- help
- Allow each machine to specify it's own IRQ handler at run time.
-
if !MMU
source "arch/arm/Kconfig-nommu"
endif
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index fc26c3d..62ebeae 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -46,12 +46,12 @@
KBUILD_CPPFLAGS += -mbig-endian
CHECKFLAGS += -D__ARMEB__
AS += -EB
-LD += -EB
+LDFLAGS += -EB
else
KBUILD_CPPFLAGS += -mlittle-endian
CHECKFLAGS += -D__ARMEL__
AS += -EL
-LD += -EL
+LDFLAGS += -EL
endif
#
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 0cd4dcc..b17ee03 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -460,6 +460,10 @@
adds \tmp, \addr, #\size - 1
sbcccs \tmp, \tmp, \limit
bcs \bad
+#ifdef CONFIG_CPU_SPECTRE
+ movcs \addr, #0
+ csdb
+#endif
#endif
.endm
diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index 66d0e21..f747566 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -130,7 +130,7 @@
}
#define atomic_cmpxchg_relaxed atomic_cmpxchg_relaxed
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
+static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int oldval, newval;
unsigned long tmp;
@@ -156,6 +156,7 @@
return oldval;
}
+#define atomic_fetch_add_unless atomic_fetch_add_unless
#else /* ARM_ARCH_6 */
@@ -215,15 +216,7 @@
return ret;
}
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
-
- c = atomic_read(v);
- while (c != u && (old = atomic_cmpxchg((v), c, c + a)) != c)
- c = old;
- return c;
-}
+#define atomic_fetch_andnot atomic_fetch_andnot
#endif /* __LINUX_ARM_ARCH__ */
@@ -254,17 +247,6 @@
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
-#define atomic_inc(v) atomic_add(1, v)
-#define atomic_dec(v) atomic_sub(1, v)
-
-#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
-#define atomic_inc_return_relaxed(v) (atomic_add_return_relaxed(1, v))
-#define atomic_dec_return_relaxed(v) (atomic_sub_return_relaxed(1, v))
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
-
-#define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
-
#ifndef CONFIG_GENERIC_ATOMIC64
typedef struct {
long long counter;
@@ -494,12 +476,13 @@
return result;
}
+#define atomic64_dec_if_positive atomic64_dec_if_positive
-static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
+static inline long long atomic64_fetch_add_unless(atomic64_t *v, long long a,
+ long long u)
{
- long long val;
+ long long oldval, newval;
unsigned long tmp;
- int ret = 1;
smp_mb();
prefetchw(&v->counter);
@@ -508,33 +491,23 @@
"1: ldrexd %0, %H0, [%4]\n"
" teq %0, %5\n"
" teqeq %H0, %H5\n"
-" moveq %1, #0\n"
" beq 2f\n"
-" adds %Q0, %Q0, %Q6\n"
-" adc %R0, %R0, %R6\n"
-" strexd %2, %0, %H0, [%4]\n"
+" adds %Q1, %Q0, %Q6\n"
+" adc %R1, %R0, %R6\n"
+" strexd %2, %1, %H1, [%4]\n"
" teq %2, #0\n"
" bne 1b\n"
"2:"
- : "=&r" (val), "+r" (ret), "=&r" (tmp), "+Qo" (v->counter)
+ : "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
: "r" (&v->counter), "r" (u), "r" (a)
: "cc");
- if (ret)
+ if (oldval != u)
smp_mb();
- return ret;
+ return oldval;
}
-
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-#define atomic64_inc(v) atomic64_add(1LL, (v))
-#define atomic64_inc_return_relaxed(v) atomic64_add_return_relaxed(1LL, (v))
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-#define atomic64_sub_and_test(a, v) (atomic64_sub_return((a), (v)) == 0)
-#define atomic64_dec(v) atomic64_sub(1LL, (v))
-#define atomic64_dec_return_relaxed(v) atomic64_sub_return_relaxed(1LL, (v))
-#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1LL, 0LL)
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
#endif /* !CONFIG_GENERIC_ATOMIC64 */
#endif
diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
index 4cab9bb..c92e42a 100644
--- a/arch/arm/include/asm/bitops.h
+++ b/arch/arm/include/asm/bitops.h
@@ -215,7 +215,6 @@
#if __LINUX_ARM_ARCH__ < 5
-#include <asm-generic/bitops/ffz.h>
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/__ffs.h>
#include <asm-generic/bitops/fls.h>
@@ -223,93 +222,20 @@
#else
-static inline int constant_fls(int x)
-{
- int r = 32;
-
- if (!x)
- return 0;
- if (!(x & 0xffff0000u)) {
- x <<= 16;
- r -= 16;
- }
- if (!(x & 0xff000000u)) {
- x <<= 8;
- r -= 8;
- }
- if (!(x & 0xf0000000u)) {
- x <<= 4;
- r -= 4;
- }
- if (!(x & 0xc0000000u)) {
- x <<= 2;
- r -= 2;
- }
- if (!(x & 0x80000000u)) {
- x <<= 1;
- r -= 1;
- }
- return r;
-}
-
/*
- * On ARMv5 and above those functions can be implemented around the
- * clz instruction for much better code efficiency. __clz returns
- * the number of leading zeros, zero input will return 32, and
- * 0x80000000 will return 0.
+ * On ARMv5 and above, the gcc built-ins may rely on the clz instruction
+ * and produce optimal inlined code in all cases. On ARMv7 it is even
+ * better by also using the rbit instruction.
*/
-static inline unsigned int __clz(unsigned int x)
-{
- unsigned int ret;
-
- asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
-
- return ret;
-}
-
-/*
- * fls() returns zero if the input is zero, otherwise returns the bit
- * position of the last set bit, where the LSB is 1 and MSB is 32.
- */
-static inline int fls(int x)
-{
- if (__builtin_constant_p(x))
- return constant_fls(x);
-
- return 32 - __clz(x);
-}
-
-/*
- * __fls() returns the bit position of the last bit set, where the
- * LSB is 0 and MSB is 31. Zero input is undefined.
- */
-static inline unsigned long __fls(unsigned long x)
-{
- return fls(x) - 1;
-}
-
-/*
- * ffs() returns zero if the input was zero, otherwise returns the bit
- * position of the first set bit, where the LSB is 1 and MSB is 32.
- */
-static inline int ffs(int x)
-{
- return fls(x & -x);
-}
-
-/*
- * __ffs() returns the bit position of the first bit set, where the
- * LSB is 0 and MSB is 31. Zero input is undefined.
- */
-static inline unsigned long __ffs(unsigned long x)
-{
- return ffs(x) - 1;
-}
-
-#define ffz(x) __ffs( ~(x) )
+#include <asm-generic/bitops/builtin-__fls.h>
+#include <asm-generic/bitops/builtin-__ffs.h>
+#include <asm-generic/bitops/builtin-fls.h>
+#include <asm-generic/bitops/builtin-ffs.h>
#endif
+#include <asm-generic/bitops/ffz.h>
+
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 17f1f1a..38badaa 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -58,6 +58,9 @@
#define efi_call_runtime(f, ...) sys_table_arg->runtime->f(__VA_ARGS__)
#define efi_is_64bit() (false)
+#define efi_table_attr(table, attr, instance) \
+ ((table##_t *)instance)->attr
+
#define efi_call_proto(protocol, f, instance, ...) \
((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h
index e46e4e7..ac54c06 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -111,14 +111,17 @@
asm volatile("mcr p14, 0, %0, " #N "," #M ", " #OP2 : : "r" (VAL));\
} while (0)
+struct perf_event_attr;
struct notifier_block;
struct perf_event;
struct pmu;
extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
int *gen_len, int *gen_type);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index b6f3196..c883fcb 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -31,11 +31,6 @@
void handle_IRQ(unsigned int, struct pt_regs *);
void init_IRQ(void);
-#ifdef CONFIG_MULTI_IRQ_HANDLER
-extern void (*handle_arch_irq)(struct pt_regs *);
-extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
-#endif
-
#ifdef CONFIG_SMP
extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
bool exclude_self);
diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h
index 5965545..82290f2 100644
--- a/arch/arm/include/asm/kprobes.h
+++ b/arch/arm/include/asm/kprobes.h
@@ -44,8 +44,6 @@
struct kprobe_ctlblk {
unsigned int kprobe_status;
struct prev_kprobe prev_kprobe;
- struct pt_regs jprobe_saved_regs;
- char jprobes_stack[MAX_STACK_SIZE];
};
void arch_remove_kprobe(struct kprobe *);
diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 5c1ad11..bb88512 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -59,7 +59,7 @@
void (*init_time)(void);
void (*init_machine)(void);
void (*init_late)(void);
-#ifdef CONFIG_MULTI_IRQ_HANDLER
+#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
void (*handle_irq)(struct pt_regs *);
#endif
void (*restart)(enum reboot_mode, const char *);
diff --git a/arch/arm/include/asm/mach/time.h b/arch/arm/include/asm/mach/time.h
index 0f79e4d..4ac3a01 100644
--- a/arch/arm/include/asm/mach/time.h
+++ b/arch/arm/include/asm/mach/time.h
@@ -13,7 +13,6 @@
extern void timer_tick(void);
typedef void (*clock_access_fn)(struct timespec64 *);
-extern int register_persistent_clock(clock_access_fn read_boot,
- clock_access_fn read_persistent);
+extern int register_persistent_clock(clock_access_fn read_persistent);
#endif
diff --git a/arch/arm/include/asm/probes.h b/arch/arm/include/asm/probes.h
index 1e5b9bb..991c912 100644
--- a/arch/arm/include/asm/probes.h
+++ b/arch/arm/include/asm/probes.h
@@ -51,7 +51,6 @@
* We assume one instruction can consume at most 64 bytes stack, which is
* 'push {r0-r15}'. Instructions consume more or unknown stack space like
* 'str r0, [sp, #-80]' and 'str r0, [sp, r1]' should be prohibit to probe.
- * Both kprobe and jprobe use this macro.
*/
#define MAX_STACK_SIZE 64
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index e71cc35..9b37b6a 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -123,8 +123,8 @@
extern int vfp_preserve_user_clear_hwstate(struct user_vfp __user *,
struct user_vfp_exc __user *);
-extern int vfp_restore_user_hwstate(struct user_vfp __user *,
- struct user_vfp_exc __user *);
+extern int vfp_restore_user_hwstate(struct user_vfp *,
+ struct user_vfp_exc *);
#endif
/*
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index d5562f9..f854148 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -292,5 +292,13 @@
{
}
+static inline void tlb_flush_remove_tables(struct mm_struct *mm)
+{
+}
+
+static inline void tlb_flush_remove_tables_local(void *arg)
+{
+}
+
#endif /* CONFIG_MMU */
#endif
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 3d614e9..5451e1f 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -85,6 +85,13 @@
flag; })
/*
+ * This is a type: either unsigned long, if the argument fits into
+ * that type, or otherwise unsigned long long.
+ */
+#define __inttype(x) \
+ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
+
+/*
* Single-value transfer routines. They automatically use the right
* size if we just have the right pointer type. Note that the functions
* which read from user space (*get_*) need to take care not to leak
@@ -153,7 +160,7 @@
({ \
unsigned long __limit = current_thread_info()->addr_limit - 1; \
register typeof(*(p)) __user *__p asm("r0") = (p); \
- register typeof(x) __r2 asm("r2"); \
+ register __inttype(x) __r2 asm("r2"); \
register unsigned long __l asm("r1") = __limit; \
register int __e asm("r0"); \
unsigned int __ua_flags = uaccess_save_and_enable(); \
@@ -243,6 +250,16 @@
#define user_addr_max() \
(uaccess_kernel() ? ~0UL : get_fs())
+#ifdef CONFIG_CPU_SPECTRE
+/*
+ * When mitigating Spectre variant 1, it is not worth fixing the non-
+ * verifying accessors, because we need to add verification of the
+ * address space there. Force these to use the standard get_user()
+ * version instead.
+ */
+#define __get_user(x, ptr) get_user(x, ptr)
+#else
+
/*
* The "__xxx" versions of the user access functions do not verify the
* address space - it must have been done previously with a separate
@@ -259,12 +276,6 @@
__gu_err; \
})
-#define __get_user_error(x, ptr, err) \
-({ \
- __get_user_err((x), (ptr), err); \
- (void) 0; \
-})
-
#define __get_user_err(x, ptr, err) \
do { \
unsigned long __gu_addr = (unsigned long)(ptr); \
@@ -324,6 +335,7 @@
#define __get_user_asm_word(x, addr, err) \
__get_user_asm(x, addr, err, ldr)
+#endif
#define __put_user_switch(x, ptr, __err, __fn) \
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 179a9f6..e85a3af 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -22,7 +22,7 @@
#include <asm/glue-df.h>
#include <asm/glue-pf.h>
#include <asm/vfpmacros.h>
-#ifndef CONFIG_MULTI_IRQ_HANDLER
+#ifndef CONFIG_GENERIC_IRQ_MULTI_HANDLER
#include <mach/entry-macro.S>
#endif
#include <asm/thread_notify.h>
@@ -39,7 +39,7 @@
* Interrupt handling.
*/
.macro irq_handler
-#ifdef CONFIG_MULTI_IRQ_HANDLER
+#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
ldr r1, =handle_arch_irq
mov r0, sp
badr lr, 9997f
@@ -1226,9 +1226,3 @@
.globl cr_alignment
cr_alignment:
.space 4
-
-#ifdef CONFIG_MULTI_IRQ_HANDLER
- .globl handle_arch_irq
-handle_arch_irq:
- .space 4
-#endif
diff --git a/arch/arm/kernel/head-nommu.S b/arch/arm/kernel/head-nommu.S
index 7a9b869..ec29de2 100644
--- a/arch/arm/kernel/head-nommu.S
+++ b/arch/arm/kernel/head-nommu.S
@@ -53,7 +53,11 @@
THUMB(1: )
#endif
- setmode PSR_F_BIT | PSR_I_BIT | SVC_MODE, r9 @ ensure svc mode
+#ifdef CONFIG_ARM_VIRT_EXT
+ bl __hyp_stub_install
+#endif
+ @ ensure svc mode and all interrupts masked
+ safe_svcmode_maskall r9
@ and irqs disabled
#if defined(CONFIG_CPU_CP15)
mrc p15, 0, r9, c0, c0 @ get processor id
@@ -89,7 +93,11 @@
* the processor type - there is no need to check the machine type
* as it has already been validated by the primary processor.
*/
- setmode PSR_F_BIT | PSR_I_BIT | SVC_MODE, r9
+#ifdef CONFIG_ARM_VIRT_EXT
+ bl __hyp_stub_install_secondary
+#endif
+ safe_svcmode_maskall r9
+
#ifndef CONFIG_CPU_CP15
ldr r9, =CONFIG_PROCESSOR_ID
#else
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index 629e251..1d5fbf1 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -456,14 +456,13 @@
/*
* Check whether bp virtual address is in kernel space.
*/
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
{
unsigned int len;
unsigned long va;
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
- va = info->address;
- len = get_hbp_len(info->ctrl.len);
+ va = hw->address;
+ len = get_hbp_len(hw->ctrl.len);
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
}
@@ -518,42 +517,42 @@
/*
* Construct an arch_hw_breakpoint from a perf_event.
*/
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
- switch (bp->attr.bp_type) {
+ switch (attr->bp_type) {
case HW_BREAKPOINT_X:
- info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+ hw->ctrl.type = ARM_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
- info->ctrl.type = ARM_BREAKPOINT_LOAD;
+ hw->ctrl.type = ARM_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
- info->ctrl.type = ARM_BREAKPOINT_STORE;
+ hw->ctrl.type = ARM_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
- info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+ hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
/* Len */
- switch (bp->attr.bp_len) {
+ switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
- info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
- info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
- info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_8:
- info->ctrl.len = ARM_BREAKPOINT_LEN_8;
- if ((info->ctrl.type != ARM_BREAKPOINT_EXECUTE)
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_8;
+ if ((hw->ctrl.type != ARM_BREAKPOINT_EXECUTE)
&& max_watchpoint_len >= 8)
break;
default:
@@ -566,24 +565,24 @@
* by the hardware and must be aligned to the appropriate number of
* bytes.
*/
- if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE &&
- info->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
- info->ctrl.len != ARM_BREAKPOINT_LEN_4)
+ if (hw->ctrl.type == ARM_BREAKPOINT_EXECUTE &&
+ hw->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
+ hw->ctrl.len != ARM_BREAKPOINT_LEN_4)
return -EINVAL;
/* Address */
- info->address = bp->attr.bp_addr;
+ hw->address = attr->bp_addr;
/* Privilege */
- info->ctrl.privilege = ARM_BREAKPOINT_USER;
- if (arch_check_bp_in_kernelspace(bp))
- info->ctrl.privilege |= ARM_BREAKPOINT_PRIV;
+ hw->ctrl.privilege = ARM_BREAKPOINT_USER;
+ if (arch_check_bp_in_kernelspace(hw))
+ hw->ctrl.privilege |= ARM_BREAKPOINT_PRIV;
/* Enabled? */
- info->ctrl.enabled = !bp->attr.disabled;
+ hw->ctrl.enabled = !attr->disabled;
/* Mismatch */
- info->ctrl.mismatch = 0;
+ hw->ctrl.mismatch = 0;
return 0;
}
@@ -591,9 +590,10 @@
/*
* Validate the arch-specific HW Breakpoint register settings.
*/
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
int ret = 0;
u32 offset, alignment_mask = 0x3;
@@ -602,14 +602,14 @@
return -ENODEV;
/* Build the arch_hw_breakpoint. */
- ret = arch_build_bp_info(bp);
+ ret = arch_build_bp_info(bp, attr, hw);
if (ret)
goto out;
/* Check address alignment. */
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_8)
alignment_mask = 0x7;
- offset = info->address & alignment_mask;
+ offset = hw->address & alignment_mask;
switch (offset) {
case 0:
/* Aligned */
@@ -617,19 +617,19 @@
case 1:
case 2:
/* Allow halfword watchpoints and breakpoints. */
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_2)
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
break;
case 3:
/* Allow single byte watchpoint. */
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_1)
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
break;
default:
ret = -EINVAL;
goto out;
}
- info->address &= ~alignment_mask;
- info->ctrl.len <<= offset;
+ hw->address &= ~alignment_mask;
+ hw->ctrl.len <<= offset;
if (is_default_overflow_handler(bp)) {
/*
@@ -640,7 +640,7 @@
return -EINVAL;
/* We don't allow mismatch breakpoints in kernel space. */
- if (arch_check_bp_in_kernelspace(bp))
+ if (arch_check_bp_in_kernelspace(hw))
return -EPERM;
/*
@@ -655,8 +655,8 @@
* reports them.
*/
if (!debug_exception_updates_fsr() &&
- (info->ctrl.type == ARM_BREAKPOINT_LOAD ||
- info->ctrl.type == ARM_BREAKPOINT_STORE))
+ (hw->ctrl.type == ARM_BREAKPOINT_LOAD ||
+ hw->ctrl.type == ARM_BREAKPOINT_STORE))
return -EINVAL;
}
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index ece04a4..9908dac 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -102,16 +102,6 @@
uniphier_cache_init();
}
-#ifdef CONFIG_MULTI_IRQ_HANDLER
-void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
-{
- if (handle_arch_irq)
- return;
-
- handle_arch_irq = handle_irq;
-}
-#endif
-
#ifdef CONFIG_SPARSE_IRQ
int __init arch_probe_nr_irqs(void)
{
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 35ca494..4c249cb 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -1145,7 +1145,7 @@
reserve_crashkernel();
-#ifdef CONFIG_MULTI_IRQ_HANDLER
+#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
handle_arch_irq = mdesc->handle_irq;
#endif
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index dec130e..b8f766c 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -150,22 +150,18 @@
static int restore_vfp_context(char __user **auxp)
{
- struct vfp_sigframe __user *frame =
- (struct vfp_sigframe __user *)*auxp;
- unsigned long magic;
- unsigned long size;
- int err = 0;
+ struct vfp_sigframe frame;
+ int err;
- __get_user_error(magic, &frame->magic, err);
- __get_user_error(size, &frame->size, err);
-
+ err = __copy_from_user(&frame, *auxp, sizeof(frame));
if (err)
- return -EFAULT;
- if (magic != VFP_MAGIC || size != VFP_STORAGE_SIZE)
+ return err;
+
+ if (frame.magic != VFP_MAGIC || frame.size != VFP_STORAGE_SIZE)
return -EINVAL;
- *auxp += size;
- return vfp_restore_user_hwstate(&frame->ufp, &frame->ufp_exc);
+ *auxp += sizeof(frame);
+ return vfp_restore_user_hwstate(&frame.ufp, &frame.ufp_exc);
}
#endif
@@ -176,6 +172,7 @@
static int restore_sigframe(struct pt_regs *regs, struct sigframe __user *sf)
{
+ struct sigcontext context;
char __user *aux;
sigset_t set;
int err;
@@ -184,23 +181,26 @@
if (err == 0)
set_current_blocked(&set);
- __get_user_error(regs->ARM_r0, &sf->uc.uc_mcontext.arm_r0, err);
- __get_user_error(regs->ARM_r1, &sf->uc.uc_mcontext.arm_r1, err);
- __get_user_error(regs->ARM_r2, &sf->uc.uc_mcontext.arm_r2, err);
- __get_user_error(regs->ARM_r3, &sf->uc.uc_mcontext.arm_r3, err);
- __get_user_error(regs->ARM_r4, &sf->uc.uc_mcontext.arm_r4, err);
- __get_user_error(regs->ARM_r5, &sf->uc.uc_mcontext.arm_r5, err);
- __get_user_error(regs->ARM_r6, &sf->uc.uc_mcontext.arm_r6, err);
- __get_user_error(regs->ARM_r7, &sf->uc.uc_mcontext.arm_r7, err);
- __get_user_error(regs->ARM_r8, &sf->uc.uc_mcontext.arm_r8, err);
- __get_user_error(regs->ARM_r9, &sf->uc.uc_mcontext.arm_r9, err);
- __get_user_error(regs->ARM_r10, &sf->uc.uc_mcontext.arm_r10, err);
- __get_user_error(regs->ARM_fp, &sf->uc.uc_mcontext.arm_fp, err);
- __get_user_error(regs->ARM_ip, &sf->uc.uc_mcontext.arm_ip, err);
- __get_user_error(regs->ARM_sp, &sf->uc.uc_mcontext.arm_sp, err);
- __get_user_error(regs->ARM_lr, &sf->uc.uc_mcontext.arm_lr, err);
- __get_user_error(regs->ARM_pc, &sf->uc.uc_mcontext.arm_pc, err);
- __get_user_error(regs->ARM_cpsr, &sf->uc.uc_mcontext.arm_cpsr, err);
+ err |= __copy_from_user(&context, &sf->uc.uc_mcontext, sizeof(context));
+ if (err == 0) {
+ regs->ARM_r0 = context.arm_r0;
+ regs->ARM_r1 = context.arm_r1;
+ regs->ARM_r2 = context.arm_r2;
+ regs->ARM_r3 = context.arm_r3;
+ regs->ARM_r4 = context.arm_r4;
+ regs->ARM_r5 = context.arm_r5;
+ regs->ARM_r6 = context.arm_r6;
+ regs->ARM_r7 = context.arm_r7;
+ regs->ARM_r8 = context.arm_r8;
+ regs->ARM_r9 = context.arm_r9;
+ regs->ARM_r10 = context.arm_r10;
+ regs->ARM_fp = context.arm_fp;
+ regs->ARM_ip = context.arm_ip;
+ regs->ARM_sp = context.arm_sp;
+ regs->ARM_lr = context.arm_lr;
+ regs->ARM_pc = context.arm_pc;
+ regs->ARM_cpsr = context.arm_cpsr;
+ }
err |= !valid_user_regs(regs);
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 1df21a6..f0dd4b6 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -329,9 +329,11 @@
return -ENOMEM;
err = 0;
for (i = 0; i < nsops; i++) {
- __get_user_error(sops[i].sem_num, &tsops->sem_num, err);
- __get_user_error(sops[i].sem_op, &tsops->sem_op, err);
- __get_user_error(sops[i].sem_flg, &tsops->sem_flg, err);
+ struct oabi_sembuf osb;
+ err |= __copy_from_user(&osb, tsops, sizeof(osb));
+ sops[i].sem_num = osb.sem_num;
+ sops[i].sem_op = osb.sem_op;
+ sops[i].sem_flg = osb.sem_flg;
tsops++;
}
if (timeout) {
diff --git a/arch/arm/kernel/time.c b/arch/arm/kernel/time.c
index cf2701c..078b259 100644
--- a/arch/arm/kernel/time.c
+++ b/arch/arm/kernel/time.c
@@ -83,29 +83,18 @@
}
static clock_access_fn __read_persistent_clock = dummy_clock_access;
-static clock_access_fn __read_boot_clock = dummy_clock_access;
void read_persistent_clock64(struct timespec64 *ts)
{
__read_persistent_clock(ts);
}
-void read_boot_clock64(struct timespec64 *ts)
-{
- __read_boot_clock(ts);
-}
-
-int __init register_persistent_clock(clock_access_fn read_boot,
- clock_access_fn read_persistent)
+int __init register_persistent_clock(clock_access_fn read_persistent)
{
/* Only allow the clockaccess functions to be registered once */
- if (__read_persistent_clock == dummy_clock_access &&
- __read_boot_clock == dummy_clock_access) {
- if (read_boot)
- __read_boot_clock = read_boot;
+ if (__read_persistent_clock == dummy_clock_access) {
if (read_persistent)
__read_persistent_clock = read_persistent;
-
return 0;
}
diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index 7a4b060..a826df3 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -90,6 +90,15 @@
.text
ENTRY(arm_copy_from_user)
+#ifdef CONFIG_CPU_SPECTRE
+ get_thread_info r3
+ ldr r3, [r3, #TI_ADDR_LIMIT]
+ adds ip, r1, r2 @ ip=addr+size
+ sub r3, r3, #1 @ addr_limit - 1
+ cmpcc ip, r3 @ if (addr+size > addr_limit - 1)
+ movcs r1, #0 @ addr = NULL
+ csdb
+#endif
#include "copy_template.S"
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 96a7b6c..b169e58 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -702,7 +702,6 @@
config ARM_VIRT_EXT
bool
- depends on MMU
default y if CPU_V7
help
Enable the kernel to make use of the ARM Virtualization
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index 5dd6c58..7d67c70 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -53,7 +53,8 @@
{
/* Check CPUID Identification Scheme before ID_PFR1 read */
if ((read_cpuid_id() & 0x000f0000) == 0x000f0000)
- return !!cpuid_feature_extract(CPUID_EXT_PFR1, 4);
+ return cpuid_feature_extract(CPUID_EXT_PFR1, 4) ||
+ cpuid_feature_extract(CPUID_EXT_PFR1, 20);
return 0;
}
diff --git a/arch/arm/mm/tcm.h b/arch/arm/mm/tcm.h
index 8015ad4..2410192 100644
--- a/arch/arm/mm/tcm.h
+++ b/arch/arm/mm/tcm.h
@@ -11,7 +11,7 @@
void __init tcm_init(void);
#else
/* No TCM support, just blank inlines to be optimized out */
-inline void tcm_init(void)
+static inline void tcm_init(void)
{
}
#endif
diff --git a/arch/arm/plat-omap/counter_32k.c b/arch/arm/plat-omap/counter_32k.c
index 2438b96..fcc5bfe 100644
--- a/arch/arm/plat-omap/counter_32k.c
+++ b/arch/arm/plat-omap/counter_32k.c
@@ -110,7 +110,7 @@
}
sched_clock_register(omap_32k_read_sched_clock, 32, 32768);
- register_persistent_clock(NULL, omap_read_persistent_clock64);
+ register_persistent_clock(omap_read_persistent_clock64);
pr_info("OMAP clocksource: 32k_counter at 32768 Hz\n");
return 0;
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index e90cc8a..f8bd523 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -47,9 +47,6 @@
(unsigned long)(addr) + \
(size))
-/* Used as a marker in ARM_pc to note when we're in a jprobe. */
-#define JPROBE_MAGIC_ADDR 0xffffffff
-
DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -289,8 +286,8 @@
break;
case KPROBE_REENTER:
/* A nested probe was hit in FIQ, it is a BUG */
- pr_warn("Unrecoverable kprobe detected at %p.\n",
- p->addr);
+ pr_warn("Unrecoverable kprobe detected.\n");
+ dump_kprobe(p);
/* fall through */
default:
/* impossible cases */
@@ -303,10 +300,10 @@
/*
* If we have no pre-handler or it returned 0, we
- * continue with normal processing. If we have a
- * pre-handler and it returned non-zero, it prepped
- * for calling the break_handler below on re-entry,
- * so get out doing nothing more here.
+ * continue with normal processing. If we have a
+ * pre-handler and it returned non-zero, it will
+ * modify the execution path and no need to single
+ * stepping. Let's just reset current kprobe and exit.
*/
if (!p->pre_handler || !p->pre_handler(p, regs)) {
kcb->kprobe_status = KPROBE_HIT_SS;
@@ -315,20 +312,9 @@
kcb->kprobe_status = KPROBE_HIT_SSDONE;
p->post_handler(p, regs, 0);
}
- reset_current_kprobe();
}
+ reset_current_kprobe();
}
- } else if (cur) {
- /* We probably hit a jprobe. Call its break handler. */
- if (cur->break_handler && cur->break_handler(cur, regs)) {
- kcb->kprobe_status = KPROBE_HIT_SS;
- singlestep(cur, regs, kcb);
- if (cur->post_handler) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- cur->post_handler(cur, regs, 0);
- }
- }
- reset_current_kprobe();
} else {
/*
* The probe was removed and a race is in progress.
@@ -521,117 +507,6 @@
regs->ARM_lr = (unsigned long)&kretprobe_trampoline;
}
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- long sp_addr = regs->ARM_sp;
- long cpsr;
-
- kcb->jprobe_saved_regs = *regs;
- memcpy(kcb->jprobes_stack, (void *)sp_addr, MIN_STACK_SIZE(sp_addr));
- regs->ARM_pc = (long)jp->entry;
-
- cpsr = regs->ARM_cpsr | PSR_I_BIT;
-#ifdef CONFIG_THUMB2_KERNEL
- /* Set correct Thumb state in cpsr */
- if (regs->ARM_pc & 1)
- cpsr |= PSR_T_BIT;
- else
- cpsr &= ~PSR_T_BIT;
-#endif
- regs->ARM_cpsr = cpsr;
-
- preempt_disable();
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- __asm__ __volatile__ (
- /*
- * Setup an empty pt_regs. Fill SP and PC fields as
- * they're needed by longjmp_break_handler.
- *
- * We allocate some slack between the original SP and start of
- * our fabricated regs. To be precise we want to have worst case
- * covered which is STMFD with all 16 regs so we allocate 2 *
- * sizeof(struct_pt_regs)).
- *
- * This is to prevent any simulated instruction from writing
- * over the regs when they are accessing the stack.
- */
-#ifdef CONFIG_THUMB2_KERNEL
- "sub r0, %0, %1 \n\t"
- "mov sp, r0 \n\t"
-#else
- "sub sp, %0, %1 \n\t"
-#endif
- "ldr r0, ="__stringify(JPROBE_MAGIC_ADDR)"\n\t"
- "str %0, [sp, %2] \n\t"
- "str r0, [sp, %3] \n\t"
- "mov r0, sp \n\t"
- "bl kprobe_handler \n\t"
-
- /*
- * Return to the context saved by setjmp_pre_handler
- * and restored by longjmp_break_handler.
- */
-#ifdef CONFIG_THUMB2_KERNEL
- "ldr lr, [sp, %2] \n\t" /* lr = saved sp */
- "ldrd r0, r1, [sp, %5] \n\t" /* r0,r1 = saved lr,pc */
- "ldr r2, [sp, %4] \n\t" /* r2 = saved psr */
- "stmdb lr!, {r0, r1, r2} \n\t" /* push saved lr and */
- /* rfe context */
- "ldmia sp, {r0 - r12} \n\t"
- "mov sp, lr \n\t"
- "ldr lr, [sp], #4 \n\t"
- "rfeia sp! \n\t"
-#else
- "ldr r0, [sp, %4] \n\t"
- "msr cpsr_cxsf, r0 \n\t"
- "ldmia sp, {r0 - pc} \n\t"
-#endif
- :
- : "r" (kcb->jprobe_saved_regs.ARM_sp),
- "I" (sizeof(struct pt_regs) * 2),
- "J" (offsetof(struct pt_regs, ARM_sp)),
- "J" (offsetof(struct pt_regs, ARM_pc)),
- "J" (offsetof(struct pt_regs, ARM_cpsr)),
- "J" (offsetof(struct pt_regs, ARM_lr))
- : "memory", "cc");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- long stack_addr = kcb->jprobe_saved_regs.ARM_sp;
- long orig_sp = regs->ARM_sp;
- struct jprobe *jp = container_of(p, struct jprobe, kp);
-
- if (regs->ARM_pc == JPROBE_MAGIC_ADDR) {
- if (orig_sp != stack_addr) {
- struct pt_regs *saved_regs =
- (struct pt_regs *)kcb->jprobe_saved_regs.ARM_sp;
- printk("current sp %lx does not match saved sp %lx\n",
- orig_sp, stack_addr);
- printk("Saved registers for jprobe %p\n", jp);
- show_regs(saved_regs);
- printk("Current registers\n");
- show_regs(regs);
- BUG();
- }
- *regs = kcb->jprobe_saved_regs;
- memcpy((void *)stack_addr, kcb->jprobes_stack,
- MIN_STACK_SIZE(stack_addr));
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-
int __kprobes arch_trampoline_kprobe(struct kprobe *p)
{
return 0;
diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
index 14db141..cc237fa 100644
--- a/arch/arm/probes/kprobes/test-core.c
+++ b/arch/arm/probes/kprobes/test-core.c
@@ -1461,7 +1461,6 @@
print_registers(&result_regs);
if (mem) {
- pr_err("current_stack=%p\n", current_stack);
pr_err("expected_memory:\n");
print_memory(expected_memory, mem_size);
pr_err("result_memory:\n");
diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile
index a81404c..94516c4 100644
--- a/arch/arm/vfp/Makefile
+++ b/arch/arm/vfp/Makefile
@@ -8,8 +8,5 @@
# asflags-y := -DDEBUG
KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft)
-LDFLAGS +=--no-warn-mismatch
-obj-y += vfp.o
-
-vfp-$(CONFIG_VFP) += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
+obj-y += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 35d0f82..dc7e6b5 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -596,13 +596,11 @@
}
/* Sanitise and restore the current VFP state from the provided structures. */
-int vfp_restore_user_hwstate(struct user_vfp __user *ufp,
- struct user_vfp_exc __user *ufp_exc)
+int vfp_restore_user_hwstate(struct user_vfp *ufp, struct user_vfp_exc *ufp_exc)
{
struct thread_info *thread = current_thread_info();
struct vfp_hard_struct *hwstate = &thread->vfpstate.hard;
unsigned long fpexc;
- int err = 0;
/* Disable VFP to avoid corrupting the new thread state. */
vfp_flush_hwstate(thread);
@@ -611,17 +609,16 @@
* Copy the floating point registers. There can be unused
* registers see asm/hwcap.h for details.
*/
- err |= __copy_from_user(&hwstate->fpregs, &ufp->fpregs,
- sizeof(hwstate->fpregs));
+ memcpy(&hwstate->fpregs, &ufp->fpregs, sizeof(hwstate->fpregs));
/*
* Copy the status and control register.
*/
- __get_user_error(hwstate->fpscr, &ufp->fpscr, err);
+ hwstate->fpscr = ufp->fpscr;
/*
* Sanitise and restore the exception registers.
*/
- __get_user_error(fpexc, &ufp_exc->fpexc, err);
+ fpexc = ufp_exc->fpexc;
/* Ensure the VFP is enabled. */
fpexc |= FPEXC_EN;
@@ -630,10 +627,10 @@
fpexc &= ~(FPEXC_EX | FPEXC_FP2V);
hwstate->fpexc = fpexc;
- __get_user_error(hwstate->fpinst, &ufp_exc->fpinst, err);
- __get_user_error(hwstate->fpinst2, &ufp_exc->fpinst2, err);
+ hwstate->fpinst = ufp_exc->fpinst;
+ hwstate->fpinst2 = ufp_exc->fpinst2;
- return err ? -EFAULT : 0;
+ return 0;
}
/*
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42c090c..3d10119 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -74,6 +74,7 @@
select GENERIC_CPU_AUTOPROBE
select GENERIC_EARLY_IOREMAP
select GENERIC_IDLE_POLL_SETUP
+ select GENERIC_IRQ_MULTI_HANDLER
select GENERIC_IRQ_PROBE
select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
@@ -264,9 +265,6 @@
config ARCH_PROC_KCORE_TEXT
def_bool y
-config MULTI_IRQ_HANDLER
- def_bool y
-
source "init/Kconfig"
source "kernel/Kconfig.freezer"
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index c0235e0..9bca54d 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -40,17 +40,6 @@
#include <asm/cmpxchg.h>
-#define ___atomic_add_unless(v, a, u, sfx) \
-({ \
- typeof((v)->counter) c, old; \
- \
- c = atomic##sfx##_read(v); \
- while (c != (u) && \
- (old = atomic##sfx##_cmpxchg((v), c, c + (a))) != c) \
- c = old; \
- c; \
- })
-
#define ATOMIC_INIT(i) { (i) }
#define atomic_read(v) READ_ONCE((v)->counter)
@@ -61,21 +50,11 @@
#define atomic_add_return_release atomic_add_return_release
#define atomic_add_return atomic_add_return
-#define atomic_inc_return_relaxed(v) atomic_add_return_relaxed(1, (v))
-#define atomic_inc_return_acquire(v) atomic_add_return_acquire(1, (v))
-#define atomic_inc_return_release(v) atomic_add_return_release(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-
#define atomic_sub_return_relaxed atomic_sub_return_relaxed
#define atomic_sub_return_acquire atomic_sub_return_acquire
#define atomic_sub_return_release atomic_sub_return_release
#define atomic_sub_return atomic_sub_return
-#define atomic_dec_return_relaxed(v) atomic_sub_return_relaxed(1, (v))
-#define atomic_dec_return_acquire(v) atomic_sub_return_acquire(1, (v))
-#define atomic_dec_return_release(v) atomic_sub_return_release(1, (v))
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-
#define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
#define atomic_fetch_add_acquire atomic_fetch_add_acquire
#define atomic_fetch_add_release atomic_fetch_add_release
@@ -119,13 +98,6 @@
cmpxchg_release(&((v)->counter), (old), (new))
#define atomic_cmpxchg(v, old, new) cmpxchg(&((v)->counter), (old), (new))
-#define atomic_inc(v) atomic_add(1, (v))
-#define atomic_dec(v) atomic_sub(1, (v))
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-#define atomic_dec_and_test(v) (atomic_dec_return(v) == 0)
-#define atomic_sub_and_test(i, v) (atomic_sub_return((i), (v)) == 0)
-#define atomic_add_negative(i, v) (atomic_add_return((i), (v)) < 0)
-#define __atomic_add_unless(v, a, u) ___atomic_add_unless(v, a, u,)
#define atomic_andnot atomic_andnot
/*
@@ -140,21 +112,11 @@
#define atomic64_add_return_release atomic64_add_return_release
#define atomic64_add_return atomic64_add_return
-#define atomic64_inc_return_relaxed(v) atomic64_add_return_relaxed(1, (v))
-#define atomic64_inc_return_acquire(v) atomic64_add_return_acquire(1, (v))
-#define atomic64_inc_return_release(v) atomic64_add_return_release(1, (v))
-#define atomic64_inc_return(v) atomic64_add_return(1, (v))
-
#define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
#define atomic64_sub_return_acquire atomic64_sub_return_acquire
#define atomic64_sub_return_release atomic64_sub_return_release
#define atomic64_sub_return atomic64_sub_return
-#define atomic64_dec_return_relaxed(v) atomic64_sub_return_relaxed(1, (v))
-#define atomic64_dec_return_acquire(v) atomic64_sub_return_acquire(1, (v))
-#define atomic64_dec_return_release(v) atomic64_sub_return_release(1, (v))
-#define atomic64_dec_return(v) atomic64_sub_return(1, (v))
-
#define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
#define atomic64_fetch_add_acquire atomic64_fetch_add_acquire
#define atomic64_fetch_add_release atomic64_fetch_add_release
@@ -195,16 +157,9 @@
#define atomic64_cmpxchg_release atomic_cmpxchg_release
#define atomic64_cmpxchg atomic_cmpxchg
-#define atomic64_inc(v) atomic64_add(1, (v))
-#define atomic64_dec(v) atomic64_sub(1, (v))
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-#define atomic64_dec_and_test(v) (atomic64_dec_return(v) == 0)
-#define atomic64_sub_and_test(i, v) (atomic64_sub_return((i), (v)) == 0)
-#define atomic64_add_negative(i, v) (atomic64_add_return((i), (v)) < 0)
-#define atomic64_add_unless(v, a, u) (___atomic_add_unless(v, a, u, 64) != u)
#define atomic64_andnot atomic64_andnot
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
+#define atomic64_dec_if_positive atomic64_dec_if_positive
#endif
#endif
diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 9c19594..10d536b 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -17,22 +17,11 @@
#define __ASM_BITOPS_H
#include <linux/compiler.h>
-#include <asm/barrier.h>
#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
#endif
-/*
- * Little endian assembly atomic bitops.
- */
-extern void set_bit(int nr, volatile unsigned long *p);
-extern void clear_bit(int nr, volatile unsigned long *p);
-extern void change_bit(int nr, volatile unsigned long *p);
-extern int test_and_set_bit(int nr, volatile unsigned long *p);
-extern int test_and_clear_bit(int nr, volatile unsigned long *p);
-extern int test_and_change_bit(int nr, volatile unsigned long *p);
-
#include <asm-generic/bitops/builtin-__ffs.h>
#include <asm-generic/bitops/builtin-ffs.h>
#include <asm-generic/bitops/builtin-__fls.h>
@@ -44,15 +33,11 @@
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/lock.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>
-
-/*
- * Ext2 is defined to use little-endian byte ordering.
- */
-#define ext2_set_bit_atomic(lock, nr, p) test_and_set_bit_le(nr, p)
-#define ext2_clear_bit_atomic(lock, nr, p) test_and_clear_bit_le(nr, p)
+#include <asm-generic/bitops/ext2-atomic-setbit.h>
#endif /* __ASM_BITOPS_H */
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 192d791..7ed3208 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -87,6 +87,9 @@
#define efi_call_runtime(f, ...) sys_table_arg->runtime->f(__VA_ARGS__)
#define efi_is_64bit() (true)
+#define efi_table_attr(table, attr, instance) \
+ ((table##_t *)instance)->attr
+
#define efi_call_proto(protocol, f, instance, ...) \
((protocol##_t *)instance)->f(instance, ##__VA_ARGS__)
diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index 4177076..6a53e59 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -119,13 +119,16 @@
struct task_struct;
struct notifier_block;
+struct perf_event_attr;
struct perf_event;
struct pmu;
extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
int *gen_len, int *gen_type, int *offset);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index a0fee69..b2b0c64 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -8,8 +8,6 @@
struct pt_regs;
-extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
-
static inline int nr_legacy_irqs(void)
{
return 0;
diff --git a/arch/arm64/include/asm/kprobes.h b/arch/arm64/include/asm/kprobes.h
index 6deb8d7..d5a44cf 100644
--- a/arch/arm64/include/asm/kprobes.h
+++ b/arch/arm64/include/asm/kprobes.h
@@ -48,7 +48,6 @@
unsigned long saved_irqflag;
struct prev_kprobe prev_kprobe;
struct kprobe_step_ctx ss_ctx;
- struct pt_regs jprobe_saved_regs;
};
void arch_remove_kprobe(struct kprobe *);
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 413dbe5..8c96443 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -343,14 +343,13 @@
/*
* Check whether bp virtual address is in kernel space.
*/
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
{
unsigned int len;
unsigned long va;
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
- va = info->address;
- len = get_hbp_len(info->ctrl.len);
+ va = hw->address;
+ len = get_hbp_len(hw->ctrl.len);
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
}
@@ -421,53 +420,53 @@
/*
* Construct an arch_hw_breakpoint from a perf_event.
*/
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
- switch (bp->attr.bp_type) {
+ switch (attr->bp_type) {
case HW_BREAKPOINT_X:
- info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+ hw->ctrl.type = ARM_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
- info->ctrl.type = ARM_BREAKPOINT_LOAD;
+ hw->ctrl.type = ARM_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
- info->ctrl.type = ARM_BREAKPOINT_STORE;
+ hw->ctrl.type = ARM_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
- info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+ hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
/* Len */
- switch (bp->attr.bp_len) {
+ switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
- info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
- info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_3:
- info->ctrl.len = ARM_BREAKPOINT_LEN_3;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_3;
break;
case HW_BREAKPOINT_LEN_4:
- info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_5:
- info->ctrl.len = ARM_BREAKPOINT_LEN_5;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_5;
break;
case HW_BREAKPOINT_LEN_6:
- info->ctrl.len = ARM_BREAKPOINT_LEN_6;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_6;
break;
case HW_BREAKPOINT_LEN_7:
- info->ctrl.len = ARM_BREAKPOINT_LEN_7;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_7;
break;
case HW_BREAKPOINT_LEN_8:
- info->ctrl.len = ARM_BREAKPOINT_LEN_8;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_8;
break;
default:
return -EINVAL;
@@ -478,37 +477,37 @@
* AArch32 also requires breakpoints of length 2 for Thumb.
* Watchpoints can be of length 1, 2, 4 or 8 bytes.
*/
- if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) {
+ if (hw->ctrl.type == ARM_BREAKPOINT_EXECUTE) {
if (is_compat_bp(bp)) {
- if (info->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
- info->ctrl.len != ARM_BREAKPOINT_LEN_4)
+ if (hw->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
+ hw->ctrl.len != ARM_BREAKPOINT_LEN_4)
return -EINVAL;
- } else if (info->ctrl.len != ARM_BREAKPOINT_LEN_4) {
+ } else if (hw->ctrl.len != ARM_BREAKPOINT_LEN_4) {
/*
* FIXME: Some tools (I'm looking at you perf) assume
* that breakpoints should be sizeof(long). This
* is nonsense. For now, we fix up the parameter
* but we should probably return -EINVAL instead.
*/
- info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+ hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
}
}
/* Address */
- info->address = bp->attr.bp_addr;
+ hw->address = attr->bp_addr;
/*
* Privilege
* Note that we disallow combined EL0/EL1 breakpoints because
* that would complicate the stepping code.
*/
- if (arch_check_bp_in_kernelspace(bp))
- info->ctrl.privilege = AARCH64_BREAKPOINT_EL1;
+ if (arch_check_bp_in_kernelspace(hw))
+ hw->ctrl.privilege = AARCH64_BREAKPOINT_EL1;
else
- info->ctrl.privilege = AARCH64_BREAKPOINT_EL0;
+ hw->ctrl.privilege = AARCH64_BREAKPOINT_EL0;
/* Enabled? */
- info->ctrl.enabled = !bp->attr.disabled;
+ hw->ctrl.enabled = !attr->disabled;
return 0;
}
@@ -516,14 +515,15 @@
/*
* Validate the arch-specific HW Breakpoint register settings.
*/
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
int ret;
u64 alignment_mask, offset;
/* Build the arch_hw_breakpoint. */
- ret = arch_build_bp_info(bp);
+ ret = arch_build_bp_info(bp, attr, hw);
if (ret)
return ret;
@@ -537,42 +537,42 @@
* that here.
*/
if (is_compat_bp(bp)) {
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_8)
alignment_mask = 0x7;
else
alignment_mask = 0x3;
- offset = info->address & alignment_mask;
+ offset = hw->address & alignment_mask;
switch (offset) {
case 0:
/* Aligned */
break;
case 1:
/* Allow single byte watchpoint. */
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_1)
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_1)
break;
case 2:
/* Allow halfword watchpoints and breakpoints. */
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_2)
+ if (hw->ctrl.len == ARM_BREAKPOINT_LEN_2)
break;
default:
return -EINVAL;
}
} else {
- if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE)
+ if (hw->ctrl.type == ARM_BREAKPOINT_EXECUTE)
alignment_mask = 0x3;
else
alignment_mask = 0x7;
- offset = info->address & alignment_mask;
+ offset = hw->address & alignment_mask;
}
- info->address &= ~alignment_mask;
- info->ctrl.len <<= offset;
+ hw->address &= ~alignment_mask;
+ hw->ctrl.len <<= offset;
/*
* Disallow per-task kernel breakpoints since these would
* complicate the stepping code.
*/
- if (info->ctrl.privilege == AARCH64_BREAKPOINT_EL1 && bp->hw.target)
+ if (hw->ctrl.privilege == AARCH64_BREAKPOINT_EL1 && bp->hw.target)
return -EINVAL;
return 0;
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 60e5fc6..780a12f 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -42,16 +42,6 @@
return 0;
}
-void (*handle_arch_irq)(struct pt_regs *) = NULL;
-
-void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
-{
- if (handle_arch_irq)
- return;
-
- handle_arch_irq = handle_irq;
-}
-
#ifdef CONFIG_VMAP_STACK
static void init_irq_stacks(void)
{
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d849d98..e78c3ef 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -275,7 +275,7 @@
break;
case KPROBE_HIT_SS:
case KPROBE_REENTER:
- pr_warn("Unrecoverable kprobe detected at %p.\n", p->addr);
+ pr_warn("Unrecoverable kprobe detected.\n");
dump_kprobe(p);
BUG();
break;
@@ -395,9 +395,9 @@
/*
* If we have no pre-handler or it returned 0, we
* continue with normal processing. If we have a
- * pre-handler and it returned non-zero, it prepped
- * for calling the break_handler below on re-entry,
- * so get out doing nothing more here.
+ * pre-handler and it returned non-zero, it will
+ * modify the execution path and no need to single
+ * stepping. Let's just reset current kprobe and exit.
*
* pre_handler can hit a breakpoint and can step thru
* before return, keep PSTATE D-flag enabled until
@@ -405,16 +405,8 @@
*/
if (!p->pre_handler || !p->pre_handler(p, regs)) {
setup_singlestep(p, regs, kcb, 0);
- return;
- }
- }
- } else if ((le32_to_cpu(*(kprobe_opcode_t *) addr) ==
- BRK64_OPCODE_KPROBES) && cur_kprobe) {
- /* We probably hit a jprobe. Call its break handler. */
- if (cur_kprobe->break_handler &&
- cur_kprobe->break_handler(cur_kprobe, regs)) {
- setup_singlestep(cur_kprobe, regs, kcb, 0);
- return;
+ } else
+ reset_current_kprobe();
}
}
/*
@@ -465,74 +457,6 @@
return DBG_HOOK_HANDLED;
}
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- kcb->jprobe_saved_regs = *regs;
- /*
- * Since we can't be sure where in the stack frame "stacked"
- * pass-by-value arguments are stored we just don't try to
- * duplicate any of the stack. Do not use jprobes on functions that
- * use more than 64 bytes (after padding each to an 8 byte boundary)
- * of arguments, or pass individual arguments larger than 16 bytes.
- */
-
- instruction_pointer_set(regs, (unsigned long) jp->entry);
- preempt_disable();
- pause_graph_tracing();
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- /*
- * Jprobe handler return by entering break exception,
- * encoded same as kprobe, but with following conditions
- * -a special PC to identify it from the other kprobes.
- * -restore stack addr to original saved pt_regs
- */
- asm volatile(" mov sp, %0 \n"
- "jprobe_return_break: brk %1 \n"
- :
- : "r" (kcb->jprobe_saved_regs.sp),
- "I" (BRK64_ESR_KPROBES)
- : "memory");
-
- unreachable();
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- long stack_addr = kcb->jprobe_saved_regs.sp;
- long orig_sp = kernel_stack_pointer(regs);
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- extern const char jprobe_return_break[];
-
- if (instruction_pointer(regs) != (u64) jprobe_return_break)
- return 0;
-
- if (orig_sp != stack_addr) {
- struct pt_regs *saved_regs =
- (struct pt_regs *)kcb->jprobe_saved_regs.sp;
- pr_err("current sp %lx does not match saved sp %lx\n",
- orig_sp, stack_addr);
- pr_err("Saved registers for jprobe %p\n", jp);
- __show_regs(saved_regs);
- pr_err("Current registers\n");
- __show_regs(regs);
- BUG();
- }
- unpause_graph_tracing();
- *regs = kcb->jprobe_saved_regs;
- preempt_enable_no_resched();
- return 1;
-}
-
bool arch_within_kprobe_blacklist(unsigned long addr)
{
if ((addr >= (unsigned long)__kprobes_text_start &&
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 137710f..68755fd 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-lib-y := bitops.o clear_user.o delay.o copy_from_user.o \
+lib-y := clear_user.o delay.o copy_from_user.o \
copy_to_user.o copy_in_user.o copy_page.o \
clear_page.o memchr.o memcpy.o memmove.o memset.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
deleted file mode 100644
index 43ac736..0000000
--- a/arch/arm64/lib/bitops.S
+++ /dev/null
@@ -1,76 +0,0 @@
-/*
- * Based on arch/arm/lib/bitops.h
- *
- * Copyright (C) 2013 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/lse.h>
-
-/*
- * x0: bits 5:0 bit offset
- * bits 31:6 word offset
- * x1: address
- */
- .macro bitop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x3, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x3, [x1]"
-alt_lse " \llsc x2, x2, x3", "nop"
-alt_lse " stxr w0, x2, [x1]", "nop"
-alt_lse " cbnz w0, 1b", "nop"
-
- ret
-ENDPROC(\name )
- .endm
-
- .macro testop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x4, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x4, x2, [x1]"
- lsr x0, x2, x3
-alt_lse " \llsc x2, x2, x4", "nop"
-alt_lse " stlxr w5, x2, [x1]", "nop"
-alt_lse " cbnz w5, 1b", "nop"
-alt_lse " dmb ish", "nop"
-
- and x0, x0, #1
- ret
-ENDPROC(\name )
- .endm
-
-/*
- * Atomic bit operations.
- */
- bitop change_bit, eor, steor
- bitop clear_bit, bic, stclr
- bitop set_bit, orr, stset
-
- testop test_and_change_bit, eor, ldeoral
- testop test_and_clear_bit, bic, ldclral
- testop test_and_set_bit, orr, ldsetal
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 493ff75..8ae5d7a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -977,12 +977,12 @@
return 1;
}
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return pud_none(*pud);
}
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return pmd_none(*pmd);
}
diff --git a/arch/h8300/include/asm/atomic.h b/arch/h8300/include/asm/atomic.h
index 941e755..c6b6a06 100644
--- a/arch/h8300/include/asm/atomic.h
+++ b/arch/h8300/include/asm/atomic.h
@@ -2,8 +2,10 @@
#ifndef __ARCH_H8300_ATOMIC__
#define __ARCH_H8300_ATOMIC__
+#include <linux/compiler.h>
#include <linux/types.h>
#include <asm/cmpxchg.h>
+#include <asm/irqflags.h>
/*
* Atomic operations that C can't guarantee us. Useful for
@@ -15,8 +17,6 @@
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-#include <linux/kernel.h>
-
#define ATOMIC_OP_RETURN(op, c_op) \
static inline int atomic_##op##_return(int i, atomic_t *v) \
{ \
@@ -69,18 +69,6 @@
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
-
-#define atomic_inc_return(v) atomic_add_return(1, v)
-#define atomic_dec_return(v) atomic_sub_return(1, v)
-
-#define atomic_inc(v) (void)atomic_inc_return(v)
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-
-#define atomic_dec(v) (void)atomic_dec_return(v)
-#define atomic_dec_and_test(v) (atomic_dec_return(v) == 0)
-
static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
int ret;
@@ -94,7 +82,7 @@
return ret;
}
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
+static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int ret;
h8300flags flags;
@@ -106,5 +94,6 @@
arch_local_irq_restore(flags);
return ret;
}
+#define atomic_fetch_add_unless atomic_fetch_add_unless
#endif /* __ARCH_H8300_ATOMIC __ */
diff --git a/arch/hexagon/include/asm/atomic.h b/arch/hexagon/include/asm/atomic.h
index fb3dfb2..311b989 100644
--- a/arch/hexagon/include/asm/atomic.h
+++ b/arch/hexagon/include/asm/atomic.h
@@ -164,7 +164,7 @@
#undef ATOMIC_OP
/**
- * __atomic_add_unless - add unless the number is a given value
+ * atomic_fetch_add_unless - add unless the number is a given value
* @v: pointer to value
* @a: amount to add
* @u: unless value is equal to u
@@ -173,7 +173,7 @@
*
*/
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
+static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int __oldval;
register int tmp;
@@ -196,18 +196,6 @@
);
return __oldval;
}
-
-#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
-
-#define atomic_inc(v) atomic_add(1, (v))
-#define atomic_dec(v) atomic_sub(1, (v))
-
-#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, (v)) == 0)
-#define atomic_add_negative(i, v) (atomic_add_return(i, (v)) < 0)
-
-#define atomic_inc_return(v) (atomic_add_return(1, v))
-#define atomic_dec_return(v) (atomic_sub_return(1, v))
+#define atomic_fetch_add_unless atomic_fetch_add_unless
#endif
diff --git a/arch/ia64/include/asm/atomic.h b/arch/ia64/include/asm/atomic.h
index 2524fb6..206530d 100644
--- a/arch/ia64/include/asm/atomic.h
+++ b/arch/ia64/include/asm/atomic.h
@@ -215,91 +215,10 @@
(cmpxchg(&((v)->counter), old, new))
#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
-
-
-static __inline__ long atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- long c, old;
- c = atomic64_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic64_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c != (u);
-}
-
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
-
-static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
-{
- long c, old, dec;
- c = atomic64_read(v);
- for (;;) {
- dec = c - 1;
- if (unlikely(dec < 0))
- break;
- old = atomic64_cmpxchg((v), c, dec);
- if (likely(old == c))
- break;
- c = old;
- }
- return dec;
-}
-
-/*
- * Atomically add I to V and return TRUE if the resulting value is
- * negative.
- */
-static __inline__ int
-atomic_add_negative (int i, atomic_t *v)
-{
- return atomic_add_return(i, v) < 0;
-}
-
-static __inline__ long
-atomic64_add_negative (__s64 i, atomic64_t *v)
-{
- return atomic64_add_return(i, v) < 0;
-}
-
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-#define atomic64_dec_return(v) atomic64_sub_return(1, (v))
-#define atomic64_inc_return(v) atomic64_add_return(1, (v))
-
-#define atomic_sub_and_test(i,v) (atomic_sub_return((i), (v)) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
-#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)
-#define atomic64_sub_and_test(i,v) (atomic64_sub_return((i), (v)) == 0)
-#define atomic64_dec_and_test(v) (atomic64_sub_return(1, (v)) == 0)
-#define atomic64_inc_and_test(v) (atomic64_add_return(1, (v)) == 0)
-
#define atomic_add(i,v) (void)atomic_add_return((i), (v))
#define atomic_sub(i,v) (void)atomic_sub_return((i), (v))
-#define atomic_inc(v) atomic_add(1, (v))
-#define atomic_dec(v) atomic_sub(1, (v))
#define atomic64_add(i,v) (void)atomic64_add_return((i), (v))
#define atomic64_sub(i,v) (void)atomic64_sub_return((i), (v))
-#define atomic64_inc(v) atomic64_add(1, (v))
-#define atomic64_dec(v) atomic64_sub(1, (v))
#endif /* _ASM_IA64_ATOMIC_H */
diff --git a/arch/ia64/include/asm/kprobes.h b/arch/ia64/include/asm/kprobes.h
index 0302b36..580356a 100644
--- a/arch/ia64/include/asm/kprobes.h
+++ b/arch/ia64/include/asm/kprobes.h
@@ -82,8 +82,6 @@
#define ARCH_PREV_KPROBE_SZ 2
struct kprobe_ctlblk {
unsigned long kprobe_status;
- struct pt_regs jprobe_saved_regs;
- unsigned long jprobes_saved_stacked_regs[MAX_PARAM_RSE_SIZE];
unsigned long *bsp;
unsigned long cfm;
atomic_t prev_kprobe_index;
diff --git a/arch/ia64/include/uapi/asm/break.h b/arch/ia64/include/uapi/asm/break.h
index 5d742bc..4ca110f 100644
--- a/arch/ia64/include/uapi/asm/break.h
+++ b/arch/ia64/include/uapi/asm/break.h
@@ -14,7 +14,6 @@
*/
#define __IA64_BREAK_KDB 0x80100
#define __IA64_BREAK_KPROBE 0x81000 /* .. 0x81fff */
-#define __IA64_BREAK_JPROBE 0x82000
/*
* OS-specific break numbers:
diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile
index 498f3da..d0c0ccd 100644
--- a/arch/ia64/kernel/Makefile
+++ b/arch/ia64/kernel/Makefile
@@ -25,7 +25,7 @@
obj-$(CONFIG_PERFMON) += perfmon_default_smpl.o
obj-$(CONFIG_IA64_CYCLONE) += cyclone.o
obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o
-obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o
+obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
diff --git a/arch/ia64/kernel/jprobes.S b/arch/ia64/kernel/jprobes.S
deleted file mode 100644
index f69389c..0000000
--- a/arch/ia64/kernel/jprobes.S
+++ /dev/null
@@ -1,90 +0,0 @@
-/*
- * Jprobe specific operations
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) Intel Corporation, 2005
- *
- * 2005-May Rusty Lynch <rusty.lynch@intel.com> and Anil S Keshavamurthy
- * <anil.s.keshavamurthy@intel.com> initial implementation
- *
- * Jprobes (a.k.a. "jump probes" which is built on-top of kprobes) allow a
- * probe to be inserted into the beginning of a function call. The fundamental
- * difference between a jprobe and a kprobe is the jprobe handler is executed
- * in the same context as the target function, while the kprobe handlers
- * are executed in interrupt context.
- *
- * For jprobes we initially gain control by placing a break point in the
- * first instruction of the targeted function. When we catch that specific
- * break, we:
- * * set the return address to our jprobe_inst_return() function
- * * jump to the jprobe handler function
- *
- * Since we fixed up the return address, the jprobe handler will return to our
- * jprobe_inst_return() function, giving us control again. At this point we
- * are back in the parents frame marker, so we do yet another call to our
- * jprobe_break() function to fix up the frame marker as it would normally
- * exist in the target function.
- *
- * Our jprobe_return function then transfers control back to kprobes.c by
- * executing a break instruction using one of our reserved numbers. When we
- * catch that break in kprobes.c, we continue like we do for a normal kprobe
- * by single stepping the emulated instruction, and then returning execution
- * to the correct location.
- */
-#include <asm/asmmacro.h>
-#include <asm/break.h>
-
- /*
- * void jprobe_break(void)
- */
- .section .kprobes.text, "ax"
-ENTRY(jprobe_break)
- break.m __IA64_BREAK_JPROBE
-END(jprobe_break)
-
- /*
- * void jprobe_inst_return(void)
- */
-GLOBAL_ENTRY(jprobe_inst_return)
- br.call.sptk.many b0=jprobe_break
-END(jprobe_inst_return)
-
-GLOBAL_ENTRY(invalidate_stacked_regs)
- movl r16=invalidate_restore_cfm
- ;;
- mov b6=r16
- ;;
- br.ret.sptk.many b6
- ;;
-invalidate_restore_cfm:
- mov r16=ar.rsc
- ;;
- mov ar.rsc=r0
- ;;
- loadrs
- ;;
- mov ar.rsc=r16
- ;;
- br.cond.sptk.many rp
-END(invalidate_stacked_regs)
-
-GLOBAL_ENTRY(flush_register_stack)
- // flush dirty regs to backing store (must be first in insn group)
- flushrs
- ;;
- br.ret.sptk.many rp
-END(flush_register_stack)
-
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index f5f3a5e..aa41bd5 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -35,8 +35,6 @@
#include <asm/sections.h>
#include <asm/exception.h>
-extern void jprobe_inst_return(void);
-
DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -480,12 +478,9 @@
*/
break;
}
-
kretprobe_assert(ri, orig_ret_address, trampoline_address);
- reset_current_kprobe();
kretprobe_hash_unlock(current, &flags);
- preempt_enable_no_resched();
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
@@ -819,14 +814,6 @@
prepare_ss(p, regs);
kcb->kprobe_status = KPROBE_REENTER;
return 1;
- } else if (args->err == __IA64_BREAK_JPROBE) {
- /*
- * jprobe instrumented function just completed
- */
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
} else if (!is_ia64_break_inst(regs)) {
/* The breakpoint instruction was removed by
* another cpu right after we hit, no further
@@ -861,15 +848,12 @@
set_current_kprobe(p, kcb);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- if (p->pre_handler && p->pre_handler(p, regs))
- /*
- * Our pre-handler is specifically requesting that we just
- * do a return. This is used for both the jprobe pre-handler
- * and the kretprobe trampoline
- */
+ if (p->pre_handler && p->pre_handler(p, regs)) {
+ reset_current_kprobe();
+ preempt_enable_no_resched();
return 1;
+ }
-ss_probe:
#if !defined(CONFIG_PREEMPT)
if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) {
/* Boost up -- we can execute copied instructions directly */
@@ -992,7 +976,6 @@
case DIE_BREAK:
/* err is break number from ia64_bad_break() */
if ((args->err >> 12) == (__IA64_BREAK_KPROBE >> 12)
- || args->err == __IA64_BREAK_JPROBE
|| args->err == 0)
if (pre_kprobes_handler(args))
ret = NOTIFY_STOP;
@@ -1040,74 +1023,6 @@
return ((struct fnptr *)entry)->ip;
}
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr = arch_deref_entry_point(jp->entry);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- struct param_bsp_cfm pa;
- int bytes;
-
- /*
- * Callee owns the argument space and could overwrite it, eg
- * tail call optimization. So to be absolutely safe
- * we save the argument space before transferring the control
- * to instrumented jprobe function which runs in
- * the process context
- */
- pa.ip = regs->cr_iip;
- unw_init_running(ia64_get_bsp_cfm, &pa);
- bytes = (char *)ia64_rse_skip_regs(pa.bsp, pa.cfm & 0x3f)
- - (char *)pa.bsp;
- memcpy( kcb->jprobes_saved_stacked_regs,
- pa.bsp,
- bytes );
- kcb->bsp = pa.bsp;
- kcb->cfm = pa.cfm;
-
- /* save architectural state */
- kcb->jprobe_saved_regs = *regs;
-
- /* after rfi, execute the jprobe instrumented function */
- regs->cr_iip = addr & ~0xFULL;
- ia64_psr(regs)->ri = addr & 0xf;
- regs->r1 = ((struct fnptr *)(jp->entry))->gp;
-
- /*
- * fix the return address to our jprobe_inst_return() function
- * in the jprobes.S file
- */
- regs->b0 = ((struct fnptr *)(jprobe_inst_return))->ip;
-
- return 1;
-}
-
-/* ia64 does not need this */
-void __kprobes jprobe_return(void)
-{
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- int bytes;
-
- /* restoring architectural state */
- *regs = kcb->jprobe_saved_regs;
-
- /* restoring the original argument space */
- flush_register_stack();
- bytes = (char *)ia64_rse_skip_regs(kcb->bsp, kcb->cfm & 0x3f)
- - (char *)kcb->bsp;
- memcpy( kcb->bsp,
- kcb->jprobes_saved_stacked_regs,
- bytes );
- invalidate_stacked_regs();
-
- preempt_enable_no_resched();
- return 1;
-}
-
static struct kprobe trampoline_p = {
.pre_handler = trampoline_probe_handler
};
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 785612b..b29f937 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -2,6 +2,7 @@
config M68K
bool
default y
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE if HAS_DMA
select ARCH_MIGHT_HAVE_PC_PARPORT if ISA
select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select HAVE_IDE
@@ -24,6 +25,10 @@
select MODULES_USE_ELF_RELA
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select DMA_NONCOHERENT_OPS if HAS_DMA
+ select HAVE_MEMBLOCK
+ select ARCH_DISCARD_MEMBLOCK
+ select NO_BOOTMEM
config CPU_BIG_ENDIAN
def_bool y
diff --git a/arch/m68k/apollo/config.c b/arch/m68k/apollo/config.c
index b2a6bc6..aef8d42 100644
--- a/arch/m68k/apollo/config.c
+++ b/arch/m68k/apollo/config.c
@@ -31,7 +31,6 @@
extern void dn_init_IRQ(void);
extern u32 dn_gettimeoffset(void);
extern int dn_dummy_hwclk(int, struct rtc_time *);
-extern int dn_dummy_set_clock_mmss(unsigned long);
extern void dn_dummy_reset(void);
#ifdef CONFIG_HEARTBEAT
static void dn_heartbeat(int on);
@@ -156,7 +155,6 @@
arch_gettimeoffset = dn_gettimeoffset;
mach_max_dma_address = 0xffffffff;
mach_hwclk = dn_dummy_hwclk; /* */
- mach_set_clock_mmss = dn_dummy_set_clock_mmss; /* */
mach_reset = dn_dummy_reset; /* */
#ifdef CONFIG_HEARTBEAT
mach_heartbeat = dn_heartbeat;
@@ -240,12 +238,6 @@
}
-int dn_dummy_set_clock_mmss(unsigned long nowtime)
-{
- pr_info("set_clock_mmss\n");
- return 0;
-}
-
void dn_dummy_reset(void) {
dn_serial_print("The end !\n");
diff --git a/arch/m68k/atari/config.c b/arch/m68k/atari/config.c
index 565c6f0..bd96702 100644
--- a/arch/m68k/atari/config.c
+++ b/arch/m68k/atari/config.c
@@ -81,9 +81,6 @@
extern u32 atari_gettimeoffset(void);
extern int atari_mste_hwclk (int, struct rtc_time *);
extern int atari_tt_hwclk (int, struct rtc_time *);
-extern int atari_mste_set_clock_mmss (unsigned long);
-extern int atari_tt_set_clock_mmss (unsigned long);
-
/* ++roman: This is a more elaborate test for an SCC chip, since the plain
* Medusa board generates DTACK at the SCC's standard addresses, but a SCC
@@ -362,13 +359,11 @@
ATARIHW_SET(TT_CLK);
pr_cont(" TT_CLK");
mach_hwclk = atari_tt_hwclk;
- mach_set_clock_mmss = atari_tt_set_clock_mmss;
}
if (hwreg_present(&mste_rtc.sec_ones)) {
ATARIHW_SET(MSTE_CLK);
pr_cont(" MSTE_CLK");
mach_hwclk = atari_mste_hwclk;
- mach_set_clock_mmss = atari_mste_set_clock_mmss;
}
if (!MACH_IS_MEDUSA && hwreg_present(&dma_wd.fdc_speed) &&
hwreg_write(&dma_wd.fdc_speed, 0)) {
diff --git a/arch/m68k/atari/time.c b/arch/m68k/atari/time.c
index c549b48..9cca642 100644
--- a/arch/m68k/atari/time.c
+++ b/arch/m68k/atari/time.c
@@ -285,69 +285,6 @@
return( 0 );
}
-
-int atari_mste_set_clock_mmss (unsigned long nowtime)
-{
- short real_seconds = nowtime % 60, real_minutes = (nowtime / 60) % 60;
- struct MSTE_RTC val;
- unsigned char rtc_minutes;
-
- mste_read(&val);
- rtc_minutes= val.min_ones + val.min_tens * 10;
- if ((rtc_minutes < real_minutes
- ? real_minutes - rtc_minutes
- : rtc_minutes - real_minutes) < 30)
- {
- val.sec_ones = real_seconds % 10;
- val.sec_tens = real_seconds / 10;
- val.min_ones = real_minutes % 10;
- val.min_tens = real_minutes / 10;
- mste_write(&val);
- }
- else
- return -1;
- return 0;
-}
-
-int atari_tt_set_clock_mmss (unsigned long nowtime)
-{
- int retval = 0;
- short real_seconds = nowtime % 60, real_minutes = (nowtime / 60) % 60;
- unsigned char save_control, save_freq_select, rtc_minutes;
-
- save_control = RTC_READ (RTC_CONTROL); /* tell the clock it's being set */
- RTC_WRITE (RTC_CONTROL, save_control | RTC_SET);
-
- save_freq_select = RTC_READ (RTC_FREQ_SELECT); /* stop and reset prescaler */
- RTC_WRITE (RTC_FREQ_SELECT, save_freq_select | RTC_DIV_RESET2);
-
- rtc_minutes = RTC_READ (RTC_MINUTES);
- if (!(save_control & RTC_DM_BINARY))
- rtc_minutes = bcd2bin(rtc_minutes);
-
- /* Since we're only adjusting minutes and seconds, don't interfere
- with hour overflow. This avoids messing with unknown time zones
- but requires your RTC not to be off by more than 30 minutes. */
- if ((rtc_minutes < real_minutes
- ? real_minutes - rtc_minutes
- : rtc_minutes - real_minutes) < 30)
- {
- if (!(save_control & RTC_DM_BINARY))
- {
- real_seconds = bin2bcd(real_seconds);
- real_minutes = bin2bcd(real_minutes);
- }
- RTC_WRITE (RTC_SECONDS, real_seconds);
- RTC_WRITE (RTC_MINUTES, real_minutes);
- }
- else
- retval = -1;
-
- RTC_WRITE (RTC_FREQ_SELECT, save_freq_select);
- RTC_WRITE (RTC_CONTROL, save_control);
- return retval;
-}
-
/*
* Local variables:
* c-indent-level: 4
diff --git a/arch/m68k/bvme6000/config.c b/arch/m68k/bvme6000/config.c
index 2cfff47..143ee9f 100644
--- a/arch/m68k/bvme6000/config.c
+++ b/arch/m68k/bvme6000/config.c
@@ -41,7 +41,6 @@
extern void bvme6000_sched_init(irq_handler_t handler);
extern u32 bvme6000_gettimeoffset(void);
extern int bvme6000_hwclk (int, struct rtc_time *);
-extern int bvme6000_set_clock_mmss (unsigned long);
extern void bvme6000_reset (void);
void bvme6000_set_vectors (void);
@@ -113,7 +112,6 @@
mach_init_IRQ = bvme6000_init_IRQ;
arch_gettimeoffset = bvme6000_gettimeoffset;
mach_hwclk = bvme6000_hwclk;
- mach_set_clock_mmss = bvme6000_set_clock_mmss;
mach_reset = bvme6000_reset;
mach_get_model = bvme6000_get_model;
@@ -305,46 +303,3 @@
return 0;
}
-
-/*
- * Set the minutes and seconds from seconds value 'nowtime'. Fail if
- * clock is out by > 30 minutes. Logic lifted from atari code.
- * Algorithm is to wait for the 10ms register to change, and then to
- * wait a short while, and then set it.
- */
-
-int bvme6000_set_clock_mmss (unsigned long nowtime)
-{
- int retval = 0;
- short real_seconds = nowtime % 60, real_minutes = (nowtime / 60) % 60;
- unsigned char rtc_minutes, rtc_tenms;
- volatile RtcPtr_t rtc = (RtcPtr_t)BVME_RTC_BASE;
- unsigned char msr = rtc->msr & 0xc0;
- unsigned long flags;
- volatile int i;
-
- rtc->msr = 0; /* Ensure clock accessible */
- rtc_minutes = bcd2bin (rtc->bcd_min);
-
- if ((rtc_minutes < real_minutes
- ? real_minutes - rtc_minutes
- : rtc_minutes - real_minutes) < 30)
- {
- local_irq_save(flags);
- rtc_tenms = rtc->bcd_tenms;
- while (rtc_tenms == rtc->bcd_tenms)
- ;
- for (i = 0; i < 1000; i++)
- ;
- rtc->bcd_min = bin2bcd(real_minutes);
- rtc->bcd_sec = bin2bcd(real_seconds);
- local_irq_restore(flags);
- }
- else
- retval = -1;
-
- rtc->msr = msr;
-
- return retval;
-}
-
diff --git a/arch/m68k/configs/amiga_defconfig b/arch/m68k/configs/amiga_defconfig
index a874e54..1d5483f 100644
--- a/arch/m68k/configs/amiga_defconfig
+++ b/arch/m68k/configs/amiga_defconfig
@@ -52,6 +52,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -98,18 +99,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -122,6 +119,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -200,7 +198,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -231,7 +228,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -260,7 +256,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -301,6 +296,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -356,6 +352,7 @@
CONFIG_GVP11_SCSI=y
CONFIG_SCSI_A4000T=y
CONFIG_SCSI_ZORRO7XX=y
+CONFIG_SCSI_ZORRO_ESP=y
CONFIG_MD=y
CONFIG_MD_LINEAR=m
CONFIG_BLK_DEV_DM=m
@@ -363,6 +360,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -402,8 +400,8 @@
CONFIG_ARIADNE=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CIRRUS is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
@@ -412,8 +410,10 @@
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
+CONFIG_XSURF100=y
CONFIG_HYDRA=y
CONFIG_APNE=y
CONFIG_ZORRO8390=y
@@ -426,9 +426,9 @@
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -478,6 +478,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -499,7 +500,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -600,6 +601,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -622,6 +624,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -657,6 +664,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/apollo_defconfig b/arch/m68k/configs/apollo_defconfig
index 8ce39e2..52a0af1 100644
--- a/arch/m68k/configs/apollo_defconfig
+++ b/arch/m68k/configs/apollo_defconfig
@@ -50,6 +50,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -96,18 +97,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -120,6 +117,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -198,7 +196,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -229,7 +226,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -258,7 +254,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -299,6 +294,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -345,6 +341,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -381,14 +378,15 @@
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -400,9 +398,9 @@
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -440,6 +438,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -458,7 +457,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -559,6 +558,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -581,6 +581,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -616,6 +621,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/atari_defconfig b/arch/m68k/configs/atari_defconfig
index 346c4e7..b3103e5 100644
--- a/arch/m68k/configs/atari_defconfig
+++ b/arch/m68k/configs/atari_defconfig
@@ -50,6 +50,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -96,18 +97,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -120,6 +117,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -198,7 +196,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -229,7 +226,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -258,7 +254,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -299,6 +294,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -354,6 +350,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -391,14 +388,15 @@
CONFIG_ATARILANCE=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
CONFIG_NE2000=y
@@ -411,9 +409,9 @@
CONFIG_SMC91X=y
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -480,7 +478,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -581,6 +579,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -603,6 +602,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -638,6 +642,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/bvme6000_defconfig b/arch/m68k/configs/bvme6000_defconfig
index fca9c7a..fb7d651 100644
--- a/arch/m68k/configs/bvme6000_defconfig
+++ b/arch/m68k/configs/bvme6000_defconfig
@@ -48,6 +48,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -94,18 +95,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -118,6 +115,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -196,7 +194,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -227,7 +224,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -256,7 +252,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -297,6 +292,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -344,6 +340,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -380,14 +377,15 @@
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
CONFIG_BVME6000_NET=y
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -399,9 +397,9 @@
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -433,6 +431,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -450,7 +449,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -551,6 +550,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -573,6 +573,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -608,6 +613,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/hp300_defconfig b/arch/m68k/configs/hp300_defconfig
index f9eab17..6b37f55 100644
--- a/arch/m68k/configs/hp300_defconfig
+++ b/arch/m68k/configs/hp300_defconfig
@@ -50,6 +50,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -96,18 +97,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -120,6 +117,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -198,7 +196,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -229,7 +226,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -258,7 +254,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -299,6 +294,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -345,6 +341,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -382,14 +379,15 @@
CONFIG_HPLANCE=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -401,9 +399,9 @@
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -443,6 +441,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -460,7 +459,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -561,6 +560,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -583,6 +583,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -618,6 +623,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/mac_defconfig b/arch/m68k/configs/mac_defconfig
index b52e597..930cc29 100644
--- a/arch/m68k/configs/mac_defconfig
+++ b/arch/m68k/configs/mac_defconfig
@@ -49,6 +49,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -95,18 +96,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -119,6 +116,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -197,7 +195,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -228,7 +225,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -257,7 +253,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -301,6 +296,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -354,6 +350,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -398,8 +395,8 @@
CONFIG_MACMACE=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
CONFIG_MAC89x0=y
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
@@ -407,6 +404,7 @@
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
CONFIG_MACSONIC=y
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -420,9 +418,9 @@
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -465,6 +463,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -482,7 +481,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -583,6 +582,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -605,6 +605,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -640,6 +645,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/multi_defconfig b/arch/m68k/configs/multi_defconfig
index 2a84eee..e7dd253 100644
--- a/arch/m68k/configs/multi_defconfig
+++ b/arch/m68k/configs/multi_defconfig
@@ -59,6 +59,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -105,18 +106,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -129,6 +126,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -207,7 +205,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -238,7 +235,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -267,7 +263,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -311,6 +306,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -373,6 +369,7 @@
CONFIG_GVP11_SCSI=y
CONFIG_SCSI_A4000T=y
CONFIG_SCSI_ZORRO7XX=y
+CONFIG_SCSI_ZORRO_ESP=y
CONFIG_ATARI_SCSI=y
CONFIG_MAC_SCSI=y
CONFIG_SCSI_MAC_ESP=y
@@ -387,6 +384,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -438,8 +436,8 @@
CONFIG_MACMACE=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
CONFIG_MAC89x0=y
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
@@ -449,9 +447,11 @@
CONFIG_MVME16x_NET=y
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
CONFIG_MACSONIC=y
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
+CONFIG_XSURF100=y
CONFIG_HYDRA=y
CONFIG_MAC8390=y
CONFIG_NE2000=y
@@ -466,9 +466,9 @@
CONFIG_SMC91X=y
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PLIP=m
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
@@ -533,6 +533,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -562,7 +563,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -663,6 +664,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -685,6 +687,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -720,6 +727,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/mvme147_defconfig b/arch/m68k/configs/mvme147_defconfig
index 476e699..b383327 100644
--- a/arch/m68k/configs/mvme147_defconfig
+++ b/arch/m68k/configs/mvme147_defconfig
@@ -47,6 +47,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -93,18 +94,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -117,6 +114,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -195,7 +193,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -226,7 +223,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -255,7 +251,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -296,6 +291,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -343,6 +339,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -380,14 +377,15 @@
CONFIG_MVME147_NET=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -399,9 +397,9 @@
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -433,6 +431,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -450,7 +449,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -551,6 +550,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -573,6 +573,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -608,6 +613,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/mvme16x_defconfig b/arch/m68k/configs/mvme16x_defconfig
index 1477cda..9783d3d 100644
--- a/arch/m68k/configs/mvme16x_defconfig
+++ b/arch/m68k/configs/mvme16x_defconfig
@@ -48,6 +48,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -94,18 +95,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -118,6 +115,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -196,7 +194,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -227,7 +224,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -256,7 +252,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -297,6 +292,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -344,6 +340,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -380,14 +377,15 @@
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
CONFIG_MVME16x_NET=y
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -399,9 +397,9 @@
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -433,6 +431,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -450,7 +449,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -551,6 +550,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -573,6 +573,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -608,6 +613,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/q40_defconfig b/arch/m68k/configs/q40_defconfig
index b3a543d..a35d10e 100644
--- a/arch/m68k/configs/q40_defconfig
+++ b/arch/m68k/configs/q40_defconfig
@@ -48,6 +48,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -94,18 +95,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -118,6 +115,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -196,7 +194,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -227,7 +224,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -256,7 +252,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -297,6 +292,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -350,6 +346,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -388,8 +385,8 @@
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CIRRUS is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
@@ -398,6 +395,7 @@
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
CONFIG_NE2000=y
@@ -410,9 +408,9 @@
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PLIP=m
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
@@ -455,6 +453,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -473,7 +472,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -574,6 +573,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -596,6 +596,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -631,6 +636,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/sun3_defconfig b/arch/m68k/configs/sun3_defconfig
index d543ed5..573bf92 100644
--- a/arch/m68k/configs/sun3_defconfig
+++ b/arch/m68k/configs/sun3_defconfig
@@ -45,6 +45,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -91,18 +92,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -115,6 +112,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -193,7 +191,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -224,7 +221,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -253,7 +249,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -294,6 +289,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -341,6 +337,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -385,6 +382,7 @@
CONFIG_SUN3_82586=y
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -397,9 +395,9 @@
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -435,6 +433,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -452,7 +451,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -553,6 +552,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -574,6 +574,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -609,6 +614,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/configs/sun3x_defconfig b/arch/m68k/configs/sun3x_defconfig
index a67e542..efb27a7 100644
--- a/arch/m68k/configs/sun3x_defconfig
+++ b/arch/m68k/configs/sun3x_defconfig
@@ -45,6 +45,7 @@
CONFIG_TLS=m
CONFIG_XFRM_MIGRATE=y
CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
@@ -91,18 +92,14 @@
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_TABLES=m
+CONFIG_NF_TABLES_SET=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
-CONFIG_NFT_EXTHDR=m
-CONFIG_NFT_META=m
-CONFIG_NFT_RT=m
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
-CONFIG_NFT_SET_RBTREE=m
-CONFIG_NFT_SET_HASH=m
-CONFIG_NFT_SET_BITMAP=m
CONFIG_NFT_COUNTER=m
+CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
@@ -115,6 +112,7 @@
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
+CONFIG_NFT_SOCKET=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
@@ -193,7 +191,6 @@
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_NF_CONNTRACK_IPV4=m
-CONFIG_NF_SOCKET_IPV4=m
CONFIG_NFT_CHAIN_ROUTE_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
@@ -224,7 +221,6 @@
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NF_CONNTRACK_IPV6=m
-CONFIG_NF_SOCKET_IPV6=m
CONFIG_NFT_CHAIN_ROUTE_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_MASQ_IPV6=m
@@ -253,7 +249,6 @@
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_TABLES_BRIDGE=y
-CONFIG_NFT_BRIDGE_META=m
CONFIG_NFT_BRIDGE_REJECT=m
CONFIG_NF_LOG_BRIDGE=m
CONFIG_BRIDGE_NF_EBTABLES=m
@@ -294,6 +289,7 @@
CONFIG_6LOWPAN_GHC_EXT_HDR_ROUTE=m
CONFIG_DNS_RESOLVER=y
CONFIG_BATMAN_ADV=m
+# CONFIG_BATMAN_ADV_BATMAN_V is not set
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_NC=y
CONFIG_BATMAN_ADV_MCAST=y
@@ -341,6 +337,7 @@
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
+CONFIG_DM_WRITECACHE=m
CONFIG_DM_ERA=m
CONFIG_DM_MIRROR=m
CONFIG_DM_RAID=m
@@ -378,14 +375,15 @@
CONFIG_SUN3LANCE=y
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
-# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_CADENCE is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MICREL is not set
+# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
@@ -397,9 +395,9 @@
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
+# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
-# CONFIG_NET_VENDOR_SYNOPSYS is not set
CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
@@ -435,6 +433,7 @@
CONFIG_UHID=m
# CONFIG_HID_GENERIC is not set
# CONFIG_HID_ITE is not set
+# CONFIG_HID_REDRAGON is not set
# CONFIG_USB_SUPPORT is not set
CONFIG_RTC_CLASS=y
# CONFIG_RTC_NVMEM is not set
@@ -452,7 +451,7 @@
CONFIG_FANOTIFY=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
# CONFIG_PRINT_QUOTA_WARNING is not set
-CONFIG_AUTOFS4_FS=m
+CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_OVERLAY_FS=m
@@ -553,6 +552,7 @@
CONFIG_TEST_PRINTF=m
CONFIG_TEST_BITMAP=m
CONFIG_TEST_UUID=m
+CONFIG_TEST_OVERFLOW=m
CONFIG_TEST_RHASHTABLE=m
CONFIG_TEST_HASH=m
CONFIG_TEST_USER_COPY=m
@@ -575,6 +575,11 @@
CONFIG_CRYPTO_MCRYPTD=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_AEGIS128=m
+CONFIG_CRYPTO_AEGIS128L=m
+CONFIG_CRYPTO_AEGIS256=m
+CONFIG_CRYPTO_MORUS640=m
+CONFIG_CRYPTO_MORUS1280=m
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
@@ -610,6 +615,7 @@
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
+CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 4d8d68c..a4b8d33 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -1,6 +1,7 @@
generic-y += barrier.h
generic-y += compat.h
generic-y += device.h
+generic-y += dma-mapping.h
generic-y += emergency-restart.h
generic-y += exec.h
generic-y += extable.h
diff --git a/arch/m68k/include/asm/atomic.h b/arch/m68k/include/asm/atomic.h
index e993e28..47228b0 100644
--- a/arch/m68k/include/asm/atomic.h
+++ b/arch/m68k/include/asm/atomic.h
@@ -126,11 +126,13 @@
{
__asm__ __volatile__("addql #1,%0" : "+m" (*v));
}
+#define atomic_inc atomic_inc
static inline void atomic_dec(atomic_t *v)
{
__asm__ __volatile__("subql #1,%0" : "+m" (*v));
}
+#define atomic_dec atomic_dec
static inline int atomic_dec_and_test(atomic_t *v)
{
@@ -138,6 +140,7 @@
__asm__ __volatile__("subql #1,%1; seq %0" : "=d" (c), "+m" (*v));
return c != 0;
}
+#define atomic_dec_and_test atomic_dec_and_test
static inline int atomic_dec_and_test_lt(atomic_t *v)
{
@@ -155,6 +158,7 @@
__asm__ __volatile__("addql #1,%1; seq %0" : "=d" (c), "+m" (*v));
return c != 0;
}
+#define atomic_inc_and_test atomic_inc_and_test
#ifdef CONFIG_RMW_INSNS
@@ -190,9 +194,6 @@
#endif /* !CONFIG_RMW_INSNS */
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
char c;
@@ -201,6 +202,7 @@
: ASM_DI (i));
return c != 0;
}
+#define atomic_sub_and_test atomic_sub_and_test
static inline int atomic_add_negative(int i, atomic_t *v)
{
@@ -210,20 +212,6 @@
: ASM_DI (i));
return c != 0;
}
-
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
+#define atomic_add_negative atomic_add_negative
#endif /* __ARCH_M68K_ATOMIC __ */
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 93b47b1..d979f38 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -454,7 +454,7 @@
*/
#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
!defined(CONFIG_M68000) && !defined(CONFIG_MCPU32)
-static inline int __ffs(int x)
+static inline unsigned long __ffs(unsigned long x)
{
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
@@ -493,7 +493,11 @@
: "dm" (x & -x));
return 32 - cnt;
}
-#define __ffs(x) (ffs(x) - 1)
+
+static inline unsigned long __ffs(unsigned long x)
+{
+ return ffs(x) - 1;
+}
/*
* fls: find last bit set.
@@ -515,12 +519,16 @@
#endif
+/* Simple test-and-set bit locks */
+#define test_and_set_bit_lock test_and_set_bit
+#define clear_bit_unlock clear_bit
+#define __clear_bit_unlock clear_bit_unlock
+
#include <asm-generic/bitops/ext2-atomic.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>
#endif /* __KERNEL__ */
#endif /* _M68K_BITOPS_H */
diff --git a/arch/m68k/include/asm/dma-mapping.h b/arch/m68k/include/asm/dma-mapping.h
deleted file mode 100644
index e3722ed..0000000
--- a/arch/m68k/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _M68K_DMA_MAPPING_H
-#define _M68K_DMA_MAPPING_H
-
-extern const struct dma_map_ops m68k_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
- return &m68k_dma_ops;
-}
-
-#endif /* _M68K_DMA_MAPPING_H */
diff --git a/arch/m68k/include/asm/io.h b/arch/m68k/include/asm/io.h
index ca2849a..aabe642 100644
--- a/arch/m68k/include/asm/io.h
+++ b/arch/m68k/include/asm/io.h
@@ -1,6 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _M68K_IO_H
+#define _M68K_IO_H
+
#if defined(__uClinux__) || defined(CONFIG_COLDFIRE)
#include <asm/io_no.h>
#else
#include <asm/io_mm.h>
#endif
+
+#include <asm-generic/io.h>
+
+#endif /* _M68K_IO_H */
diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index fe485f4..782b78f 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -16,13 +16,11 @@
* isa_readX(),isa_writeX() are for ISA memory
*/
-#ifndef _IO_H
-#define _IO_H
+#ifndef _M68K_IO_MM_H
+#define _M68K_IO_MM_H
#ifdef __KERNEL__
-#define ARCH_HAS_IOREMAP_WT
-
#include <linux/compiler.h>
#include <asm/raw_io.h>
#include <asm/virtconvert.h>
@@ -369,40 +367,6 @@
#define writew(val, addr) out_le16((addr), (val))
#endif /* CONFIG_ATARI_ROM_ISA */
-#if !defined(CONFIG_ISA) && !defined(CONFIG_ATARI_ROM_ISA)
-/*
- * We need to define dummy functions for GENERIC_IOMAP support.
- */
-#define inb(port) 0xff
-#define inb_p(port) 0xff
-#define outb(val,port) ((void)0)
-#define outb_p(val,port) ((void)0)
-#define inw(port) 0xffff
-#define inw_p(port) 0xffff
-#define outw(val,port) ((void)0)
-#define outw_p(val,port) ((void)0)
-#define inl(port) 0xffffffffUL
-#define inl_p(port) 0xffffffffUL
-#define outl(val,port) ((void)0)
-#define outl_p(val,port) ((void)0)
-
-#define insb(port,buf,nr) ((void)0)
-#define outsb(port,buf,nr) ((void)0)
-#define insw(port,buf,nr) ((void)0)
-#define outsw(port,buf,nr) ((void)0)
-#define insl(port,buf,nr) ((void)0)
-#define outsl(port,buf,nr) ((void)0)
-
-/*
- * These should be valid on any ioremap()ed region
- */
-#define readb(addr) in_8(addr)
-#define writeb(val,addr) out_8((addr),(val))
-#define readw(addr) in_le16(addr)
-#define writew(val,addr) out_le16((addr),(val))
-
-#endif /* !CONFIG_ISA && !CONFIG_ATARI_ROM_ISA */
-
#define readl(addr) in_le32(addr)
#define writel(val,addr) out_le32((addr),(val))
@@ -444,4 +408,4 @@
#define writew_relaxed(b, addr) writew(b, addr)
#define writel_relaxed(b, addr) writel(b, addr)
-#endif /* _IO_H */
+#endif /* _M68K_IO_MM_H */
diff --git a/arch/m68k/include/asm/io_no.h b/arch/m68k/include/asm/io_no.h
index 83a0a6d..0498192 100644
--- a/arch/m68k/include/asm/io_no.h
+++ b/arch/m68k/include/asm/io_no.h
@@ -131,19 +131,7 @@
#define PCI_SPACE_LIMIT PCI_IO_MASK
#endif /* CONFIG_PCI */
-/*
- * These are defined in kmap.h as static inline functions. To maintain
- * previous behavior we put these define guards here so io_mm.h doesn't
- * see them.
- */
-#ifdef CONFIG_MMU
-#define memset_io memset_io
-#define memcpy_fromio memcpy_fromio
-#define memcpy_toio memcpy_toio
-#endif
-
#include <asm/kmap.h>
#include <asm/virtconvert.h>
-#include <asm-generic/io.h>
#endif /* _M68KNOMMU_IO_H */
diff --git a/arch/m68k/include/asm/kmap.h b/arch/m68k/include/asm/kmap.h
index 84b8333..aac7f04 100644
--- a/arch/m68k/include/asm/kmap.h
+++ b/arch/m68k/include/asm/kmap.h
@@ -4,6 +4,8 @@
#ifdef CONFIG_MMU
+#define ARCH_HAS_IOREMAP_WT
+
/* Values for nocacheflag and cmode */
#define IOMAP_FULL_CACHING 0
#define IOMAP_NOCACHE_SER 1
@@ -16,6 +18,7 @@
*/
extern void __iomem *__ioremap(unsigned long physaddr, unsigned long size,
int cacheflag);
+#define iounmap iounmap
extern void iounmap(void __iomem *addr);
extern void __iounmap(void *addr, unsigned long size);
@@ -33,31 +36,35 @@
}
#define ioremap_uc ioremap_nocache
+#define ioremap_wt ioremap_wt
static inline void __iomem *ioremap_wt(unsigned long physaddr,
unsigned long size)
{
return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
}
-#define ioremap_fillcache ioremap_fullcache
+#define ioremap_fullcache ioremap_fullcache
static inline void __iomem *ioremap_fullcache(unsigned long physaddr,
unsigned long size)
{
return __ioremap(physaddr, size, IOMAP_FULL_CACHING);
}
+#define memset_io memset_io
static inline void memset_io(volatile void __iomem *addr, unsigned char val,
int count)
{
__builtin_memset((void __force *) addr, val, count);
}
+#define memcpy_fromio memcpy_fromio
static inline void memcpy_fromio(void *dst, const volatile void __iomem *src,
int count)
{
__builtin_memcpy(dst, (void __force *) src, count);
}
+#define memcpy_toio memcpy_toio
static inline void memcpy_toio(volatile void __iomem *dst, const void *src,
int count)
{
diff --git a/arch/m68k/include/asm/machdep.h b/arch/m68k/include/asm/machdep.h
index 1605da4..49bd3266 100644
--- a/arch/m68k/include/asm/machdep.h
+++ b/arch/m68k/include/asm/machdep.h
@@ -22,7 +22,6 @@
extern unsigned int (*mach_get_ss)(void);
extern int (*mach_get_rtc_pll)(struct rtc_pll_info *);
extern int (*mach_set_rtc_pll)(struct rtc_pll_info *);
-extern int (*mach_set_clock_mmss)(unsigned long);
extern void (*mach_reset)( void );
extern void (*mach_halt)( void );
extern void (*mach_power_off)( void );
diff --git a/arch/m68k/include/asm/macintosh.h b/arch/m68k/include/asm/macintosh.h
index 9b840c0..08cee11 100644
--- a/arch/m68k/include/asm/macintosh.h
+++ b/arch/m68k/include/asm/macintosh.h
@@ -57,7 +57,6 @@
#define MAC_SCSI_IIFX 5
#define MAC_SCSI_DUO 6
#define MAC_SCSI_LC 7
-#define MAC_SCSI_LATE 8
#define MAC_IDE_NONE 0
#define MAC_IDE_QUADRA 1
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index e644c4d..6bbe520 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -18,7 +18,7 @@
#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
#define __pa(vaddr) ((unsigned long)(vaddr))
-#define __va(paddr) ((void *)(paddr))
+#define __va(paddr) ((void *)((unsigned long)(paddr)))
#define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
#define pfn_to_virt(pfn) __va((pfn) << PAGE_SHIFT)
diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 463572c..e99993c 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -6,7 +6,7 @@
#undef DEBUG
-#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
@@ -19,7 +19,7 @@
#if defined(CONFIG_MMU) && !defined(CONFIG_COLDFIRE)
-static void *m68k_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
gfp_t flag, unsigned long attrs)
{
struct page *page, **map;
@@ -62,7 +62,7 @@
return addr;
}
-static void m68k_dma_free(struct device *dev, size_t size, void *addr,
+void arch_dma_free(struct device *dev, size_t size, void *addr,
dma_addr_t handle, unsigned long attrs)
{
pr_debug("dma_free_coherent: %p, %x\n", addr, handle);
@@ -73,8 +73,8 @@
#include <asm/cacheflush.h>
-static void *m68k_dma_alloc(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp, unsigned long attrs)
{
void *ret;
@@ -89,7 +89,7 @@
return ret;
}
-static void m68k_dma_free(struct device *dev, size_t size, void *vaddr,
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
free_pages((unsigned long)vaddr, get_order(size));
@@ -97,8 +97,8 @@
#endif /* CONFIG_MMU && !CONFIG_COLDFIRE */
-static void m68k_dma_sync_single_for_device(struct device *dev,
- dma_addr_t handle, size_t size, enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t handle,
+ size_t size, enum dma_data_direction dir)
{
switch (dir) {
case DMA_BIDIRECTIONAL:
@@ -115,58 +115,6 @@
}
}
-static void m68k_dma_sync_sg_for_device(struct device *dev,
- struct scatterlist *sglist, int nents, enum dma_data_direction dir)
-{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sglist, sg, nents, i) {
- dma_sync_single_for_device(dev, sg->dma_address, sg->length,
- dir);
- }
-}
-
-static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
-{
- dma_addr_t handle = page_to_phys(page) + offset;
-
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- dma_sync_single_for_device(dev, handle, size, dir);
-
- return handle;
-}
-
-static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist,
- int nents, enum dma_data_direction dir, unsigned long attrs)
-{
- int i;
- struct scatterlist *sg;
-
- for_each_sg(sglist, sg, nents, i) {
- sg->dma_address = sg_phys(sg);
-
- if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
- continue;
-
- dma_sync_single_for_device(dev, sg->dma_address, sg->length,
- dir);
- }
- return nents;
-}
-
-const struct dma_map_ops m68k_dma_ops = {
- .alloc = m68k_dma_alloc,
- .free = m68k_dma_free,
- .map_page = m68k_dma_map_page,
- .map_sg = m68k_dma_map_sg,
- .sync_single_for_device = m68k_dma_sync_single_for_device,
- .sync_sg_for_device = m68k_dma_sync_sg_for_device,
-};
-EXPORT_SYMBOL(m68k_dma_ops);
-
void arch_setup_pdev_archdata(struct platform_device *pdev)
{
if (pdev->dev.coherent_dma_mask == DMA_MASK_NONE &&
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index f35e3eb..5d3596c 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -21,6 +21,7 @@
#include <linux/string.h>
#include <linux/init.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/module.h>
@@ -88,7 +89,6 @@
/* machine dependent timer functions */
int (*mach_hwclk) (int, struct rtc_time*);
EXPORT_SYMBOL(mach_hwclk);
-int (*mach_set_clock_mmss) (unsigned long);
unsigned int (*mach_get_ss)(void);
int (*mach_get_rtc_pll)(struct rtc_pll_info *);
int (*mach_set_rtc_pll)(struct rtc_pll_info *);
@@ -165,6 +165,8 @@
be32_to_cpu(m->addr);
m68k_memory[m68k_num_memory].size =
be32_to_cpu(m->size);
+ memblock_add(m68k_memory[m68k_num_memory].addr,
+ m68k_memory[m68k_num_memory].size);
m68k_num_memory++;
} else
pr_warn("%s: too many memory chunks\n",
@@ -224,10 +226,6 @@
void __init setup_arch(char **cmdline_p)
{
-#ifndef CONFIG_SUN3
- int i;
-#endif
-
/* The bootinfo is located right after the kernel */
if (!CPU_IS_COLDFIRE)
m68k_parse_bootinfo((const struct bi_record *)_end);
@@ -356,14 +354,9 @@
#endif
#ifndef CONFIG_SUN3
- for (i = 1; i < m68k_num_memory; i++)
- free_bootmem_node(NODE_DATA(i), m68k_memory[i].addr,
- m68k_memory[i].size);
#ifdef CONFIG_BLK_DEV_INITRD
if (m68k_ramdisk.size) {
- reserve_bootmem_node(__virt_to_node(phys_to_virt(m68k_ramdisk.addr)),
- m68k_ramdisk.addr, m68k_ramdisk.size,
- BOOTMEM_DEFAULT);
+ memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
initrd_end = initrd_start + m68k_ramdisk.size;
pr_info("initrd: %08lx - %08lx\n", initrd_start, initrd_end);
diff --git a/arch/m68k/kernel/setup_no.c b/arch/m68k/kernel/setup_no.c
index a98af10..cfd5475 100644
--- a/arch/m68k/kernel/setup_no.c
+++ b/arch/m68k/kernel/setup_no.c
@@ -28,6 +28,7 @@
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/seq_file.h>
#include <linux/init.h>
#include <linux/initrd.h>
@@ -51,7 +52,6 @@
/* machine dependent timer functions */
void (*mach_sched_init)(irq_handler_t handler) __initdata = NULL;
-int (*mach_set_clock_mmss)(unsigned long);
int (*mach_hwclk) (int, struct rtc_time*);
/* machine dependent reboot functions */
@@ -86,8 +86,6 @@
void __init setup_arch(char **cmdline_p)
{
- int bootmap_size;
-
memory_start = PAGE_ALIGN(_ramstart);
memory_end = _ramend;
@@ -142,6 +140,8 @@
pr_debug("MEMORY -> ROMFS=0x%p-0x%06lx MEM=0x%06lx-0x%06lx\n ",
__bss_stop, memory_start, memory_start, memory_end);
+ memblock_add(memory_start, memory_end - memory_start);
+
/* Keep a copy of command line */
*cmdline_p = &command_line[0];
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
@@ -158,23 +158,10 @@
min_low_pfn = PFN_DOWN(memory_start);
max_pfn = max_low_pfn = PFN_DOWN(memory_end);
- bootmap_size = init_bootmem_node(
- NODE_DATA(0),
- min_low_pfn, /* map goes here */
- PFN_DOWN(PAGE_OFFSET),
- max_pfn);
- /*
- * Free the usable memory, we have to make sure we do not free
- * the bootmem bitmap so we then reserve it after freeing it :-)
- */
- free_bootmem(memory_start, memory_end - memory_start);
- reserve_bootmem(memory_start, bootmap_size, BOOTMEM_DEFAULT);
-
#if defined(CONFIG_UBOOT) && defined(CONFIG_BLK_DEV_INITRD)
if ((initrd_start > 0) && (initrd_start < initrd_end) &&
(initrd_end < memory_end))
- reserve_bootmem(initrd_start, initrd_end - initrd_start,
- BOOTMEM_DEFAULT);
+ memblock_reserve(initrd_start, initrd_end - initrd_start);
#endif /* if defined(CONFIG_BLK_DEV_INITRD) */
/*
diff --git a/arch/m68k/mac/config.c b/arch/m68k/mac/config.c
index e522307..b02d725 100644
--- a/arch/m68k/mac/config.c
+++ b/arch/m68k/mac/config.c
@@ -57,7 +57,6 @@
/* Mac specific timer functions */
extern u32 mac_gettimeoffset(void);
extern int mac_hwclk(int, struct rtc_time *);
-extern int mac_set_clock_mmss(unsigned long);
extern void iop_preinit(void);
extern void iop_init(void);
extern void via_init(void);
@@ -158,7 +157,6 @@
mach_get_model = mac_get_model;
arch_gettimeoffset = mac_gettimeoffset;
mach_hwclk = mac_hwclk;
- mach_set_clock_mmss = mac_set_clock_mmss;
mach_reset = mac_reset;
mach_halt = mac_poweroff;
mach_power_off = mac_poweroff;
@@ -709,7 +707,7 @@
.name = "PowerBook 520",
.adb_type = MAC_ADB_PB2,
.via_type = MAC_VIA_QUADRA,
- .scsi_type = MAC_SCSI_LATE,
+ .scsi_type = MAC_SCSI_OLD,
.scc_type = MAC_SCC_QUADRA,
.ether_type = MAC_ETHER_SONIC,
.floppy_type = MAC_FLOPPY_SWIM_ADDR2,
@@ -943,18 +941,6 @@
},
};
-static const struct resource mac_scsi_late_rsrc[] __initconst = {
- {
- .flags = IORESOURCE_IRQ,
- .start = IRQ_MAC_SCSI,
- .end = IRQ_MAC_SCSI,
- }, {
- .flags = IORESOURCE_MEM,
- .start = 0x50010000,
- .end = 0x50011FFF,
- },
-};
-
static const struct resource mac_scsi_ccl_rsrc[] __initconst = {
{
.flags = IORESOURCE_IRQ,
@@ -1064,11 +1050,6 @@
platform_device_register_simple("mac_scsi", 0,
mac_scsi_old_rsrc, ARRAY_SIZE(mac_scsi_old_rsrc));
break;
- case MAC_SCSI_LATE:
- /* XXX PDMA support for PowerBook 500 series needs testing */
- platform_device_register_simple("mac_scsi", 0,
- mac_scsi_late_rsrc, ARRAY_SIZE(mac_scsi_late_rsrc));
- break;
case MAC_SCSI_LC:
/* Addresses from Mac LC data in Designing Cards & Drivers 3ed.
* Also from the Developer Notes for Classic II, LC III,
diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
index c680543..19e9d8e 100644
--- a/arch/m68k/mac/misc.c
+++ b/arch/m68k/mac/misc.c
@@ -26,33 +26,38 @@
#include <asm/machdep.h>
-/* Offset between Unix time (1970-based) and Mac time (1904-based) */
+/*
+ * Offset between Unix time (1970-based) and Mac time (1904-based). Cuda and PMU
+ * times wrap in 2040. If we need to handle later times, the read_time functions
+ * need to be changed to interpret wrapped times as post-2040.
+ */
#define RTC_OFFSET 2082844800
static void (*rom_reset)(void);
#ifdef CONFIG_ADB_CUDA
-static long cuda_read_time(void)
+static time64_t cuda_read_time(void)
{
struct adb_request req;
- long time;
+ time64_t time;
if (cuda_request(&req, NULL, 2, CUDA_PACKET, CUDA_GET_TIME) < 0)
return 0;
while (!req.complete)
cuda_poll();
- time = (req.reply[3] << 24) | (req.reply[4] << 16) |
- (req.reply[5] << 8) | req.reply[6];
+ time = (u32)((req.reply[3] << 24) | (req.reply[4] << 16) |
+ (req.reply[5] << 8) | req.reply[6]);
+
return time - RTC_OFFSET;
}
-static void cuda_write_time(long data)
+static void cuda_write_time(time64_t time)
{
struct adb_request req;
+ u32 data = lower_32_bits(time + RTC_OFFSET);
- data += RTC_OFFSET;
if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME,
(data >> 24) & 0xFF, (data >> 16) & 0xFF,
(data >> 8) & 0xFF, data & 0xFF) < 0)
@@ -86,26 +91,27 @@
#endif /* CONFIG_ADB_CUDA */
#ifdef CONFIG_ADB_PMU68K
-static long pmu_read_time(void)
+static time64_t pmu_read_time(void)
{
struct adb_request req;
- long time;
+ time64_t time;
if (pmu_request(&req, NULL, 1, PMU_READ_RTC) < 0)
return 0;
while (!req.complete)
pmu_poll();
- time = (req.reply[1] << 24) | (req.reply[2] << 16) |
- (req.reply[3] << 8) | req.reply[4];
+ time = (u32)((req.reply[1] << 24) | (req.reply[2] << 16) |
+ (req.reply[3] << 8) | req.reply[4]);
+
return time - RTC_OFFSET;
}
-static void pmu_write_time(long data)
+static void pmu_write_time(time64_t time)
{
struct adb_request req;
+ u32 data = lower_32_bits(time + RTC_OFFSET);
- data += RTC_OFFSET;
if (pmu_request(&req, NULL, 5, PMU_SET_RTC,
(data >> 24) & 0xFF, (data >> 16) & 0xFF,
(data >> 8) & 0xFF, data & 0xFF) < 0)
@@ -245,11 +251,11 @@
* is basically any machine with Mac II-style ADB.
*/
-static long via_read_time(void)
+static time64_t via_read_time(void)
{
union {
__u8 cdata[4];
- long idata;
+ __u32 idata;
} result, last_result;
int count = 1;
@@ -270,7 +276,7 @@
via_pram_command(0x8D, &result.cdata[0]);
if (result.idata == last_result.idata)
- return result.idata - RTC_OFFSET;
+ return (time64_t)result.idata - RTC_OFFSET;
if (++count > 10)
break;
@@ -278,8 +284,8 @@
last_result.idata = result.idata;
}
- pr_err("via_read_time: failed to read a stable value; got 0x%08lx then 0x%08lx\n",
- last_result.idata, result.idata);
+ pr_err("%s: failed to read a stable value; got 0x%08x then 0x%08x\n",
+ __func__, last_result.idata, result.idata);
return 0;
}
@@ -291,11 +297,11 @@
* is basically any machine with Mac II-style ADB.
*/
-static void via_write_time(long time)
+static void via_write_time(time64_t time)
{
union {
__u8 cdata[4];
- long idata;
+ __u32 idata;
} data;
__u8 temp;
@@ -304,7 +310,7 @@
temp = 0x55;
via_pram_command(0x35, &temp);
- data.idata = time + RTC_OFFSET;
+ data.idata = lower_32_bits(time + RTC_OFFSET);
via_pram_command(0x01, &data.cdata[3]);
via_pram_command(0x05, &data.cdata[2]);
via_pram_command(0x09, &data.cdata[1]);
@@ -585,12 +591,15 @@
* This function translates seconds since 1970 into a proper date.
*
* Algorithm cribbed from glibc2.1, __offtime().
+ *
+ * This is roughly same as rtc_time64_to_tm(), which we should probably
+ * use here, but it's only available when CONFIG_RTC_LIB is enabled.
*/
#define SECS_PER_MINUTE (60)
#define SECS_PER_HOUR (SECS_PER_MINUTE * 60)
#define SECS_PER_DAY (SECS_PER_HOUR * 24)
-static void unmktime(unsigned long time, long offset,
+static void unmktime(time64_t time, long offset,
int *yearp, int *monp, int *dayp,
int *hourp, int *minp, int *secp)
{
@@ -602,11 +611,10 @@
/* Leap years. */
{ 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366 }
};
- long int days, rem, y, wday, yday;
+ int days, rem, y, wday, yday;
const unsigned short int *ip;
- days = time / SECS_PER_DAY;
- rem = time % SECS_PER_DAY;
+ days = div_u64_rem(time, SECS_PER_DAY, &rem);
rem += offset;
while (rem < 0) {
rem += SECS_PER_DAY;
@@ -657,7 +665,7 @@
int mac_hwclk(int op, struct rtc_time *t)
{
- unsigned long now;
+ time64_t now;
if (!op) { /* read */
switch (macintosh_config->adb_type) {
@@ -693,8 +701,8 @@
__func__, t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
t->tm_hour, t->tm_min, t->tm_sec);
- now = mktime(t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
- t->tm_hour, t->tm_min, t->tm_sec);
+ now = mktime64(t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
+ t->tm_hour, t->tm_min, t->tm_sec);
switch (macintosh_config->adb_type) {
case MAC_ADB_IOP:
@@ -719,19 +727,3 @@
}
return 0;
}
-
-/*
- * Set minutes/seconds in the hardware clock
- */
-
-int mac_set_clock_mmss (unsigned long nowtime)
-{
- struct rtc_time now;
-
- mac_hwclk(0, &now);
- now.tm_sec = nowtime % 60;
- now.tm_min = (nowtime / 60) % 60;
- mac_hwclk(1, &now);
-
- return 0;
-}
diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c
index 8827b7f..38e2b27 100644
--- a/arch/m68k/mm/init.c
+++ b/arch/m68k/mm/init.c
@@ -71,7 +71,6 @@
pg_data_table[i] = pg_data_map + node;
}
#endif
- pg_data_map[node].bdata = bootmem_node_data + node;
node_set_online(node);
}
diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
index 2925d79..70dde04 100644
--- a/arch/m68k/mm/mcfmmu.c
+++ b/arch/m68k/mm/mcfmmu.c
@@ -14,6 +14,7 @@
#include <linux/init.h>
#include <linux/string.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <asm/setup.h>
#include <asm/page.h>
@@ -153,31 +154,31 @@
void __init cf_bootmem_alloc(void)
{
- unsigned long start_pfn;
unsigned long memstart;
/* _rambase and _ramend will be naturally page aligned */
m68k_memory[0].addr = _rambase;
m68k_memory[0].size = _ramend - _rambase;
+ memblock_add(m68k_memory[0].addr, m68k_memory[0].size);
+
/* compute total pages in system */
num_pages = PFN_DOWN(_ramend - _rambase);
/* page numbers */
memstart = PAGE_ALIGN(_ramstart);
min_low_pfn = PFN_DOWN(_rambase);
- start_pfn = PFN_DOWN(memstart);
max_pfn = max_low_pfn = PFN_DOWN(_ramend);
high_memory = (void *)_ramend;
+ /* Reserve kernel text/data/bss */
+ memblock_reserve(memstart, memstart - _rambase);
+
m68k_virt_to_node_shift = fls(_ramend - 1) - 6;
module_fixup(NULL, __start_fixup, __stop_fixup);
- /* setup bootmem data */
+ /* setup node data */
m68k_setup_node(0);
- memstart += init_bootmem_node(NODE_DATA(0), start_pfn,
- min_low_pfn, max_low_pfn);
- free_bootmem_node(NODE_DATA(0), memstart, _ramend - memstart);
}
/*
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index e490ecc..4e17ecb 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -19,6 +19,7 @@
#include <linux/types.h>
#include <linux/init.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/gfp.h>
#include <asm/setup.h>
@@ -208,7 +209,7 @@
{
unsigned long zones_size[MAX_NR_ZONES] = { 0, };
unsigned long min_addr, max_addr;
- unsigned long addr, size, end;
+ unsigned long addr;
int i;
#ifdef DEBUG
@@ -253,34 +254,20 @@
min_low_pfn = availmem >> PAGE_SHIFT;
max_pfn = max_low_pfn = max_addr >> PAGE_SHIFT;
- for (i = 0; i < m68k_num_memory; i++) {
- addr = m68k_memory[i].addr;
- end = addr + m68k_memory[i].size;
- m68k_setup_node(i);
- availmem = PAGE_ALIGN(availmem);
- availmem += init_bootmem_node(NODE_DATA(i),
- availmem >> PAGE_SHIFT,
- addr >> PAGE_SHIFT,
- end >> PAGE_SHIFT);
- }
+ /* Reserve kernel text/data/bss and the memory allocated in head.S */
+ memblock_reserve(m68k_memory[0].addr, availmem - m68k_memory[0].addr);
/*
* Map the physical memory available into the kernel virtual
- * address space. First initialize the bootmem allocator with
- * the memory we already mapped, so map_node() has something
- * to allocate.
+ * address space. Make sure memblock will not try to allocate
+ * pages beyond the memory we already mapped in head.S
*/
- addr = m68k_memory[0].addr;
- size = m68k_memory[0].size;
- free_bootmem_node(NODE_DATA(0), availmem,
- min(m68k_init_mapped_size, size) - (availmem - addr));
- map_node(0);
- if (size > m68k_init_mapped_size)
- free_bootmem_node(NODE_DATA(0), addr + m68k_init_mapped_size,
- size - m68k_init_mapped_size);
+ memblock_set_bottom_up(true);
- for (i = 1; i < m68k_num_memory; i++)
+ for (i = 0; i < m68k_num_memory; i++) {
+ m68k_setup_node(i);
map_node(i);
+ }
flush_tlb_all();
diff --git a/arch/m68k/mvme147/config.c b/arch/m68k/mvme147/config.c
index f8a710f..adea549 100644
--- a/arch/m68k/mvme147/config.c
+++ b/arch/m68k/mvme147/config.c
@@ -40,7 +40,6 @@
extern void mvme147_sched_init(irq_handler_t handler);
extern u32 mvme147_gettimeoffset(void);
extern int mvme147_hwclk (int, struct rtc_time *);
-extern int mvme147_set_clock_mmss (unsigned long);
extern void mvme147_reset (void);
@@ -92,7 +91,6 @@
mach_init_IRQ = mvme147_init_IRQ;
arch_gettimeoffset = mvme147_gettimeoffset;
mach_hwclk = mvme147_hwclk;
- mach_set_clock_mmss = mvme147_set_clock_mmss;
mach_reset = mvme147_reset;
mach_get_model = mvme147_get_model;
@@ -164,8 +162,3 @@
}
return 0;
}
-
-int mvme147_set_clock_mmss (unsigned long nowtime)
-{
- return 0;
-}
diff --git a/arch/m68k/mvme16x/config.c b/arch/m68k/mvme16x/config.c
index 4ffd9ef..6ee36a5 100644
--- a/arch/m68k/mvme16x/config.c
+++ b/arch/m68k/mvme16x/config.c
@@ -46,7 +46,6 @@
extern void mvme16x_sched_init(irq_handler_t handler);
extern u32 mvme16x_gettimeoffset(void);
extern int mvme16x_hwclk (int, struct rtc_time *);
-extern int mvme16x_set_clock_mmss (unsigned long);
extern void mvme16x_reset (void);
int bcd2int (unsigned char b);
@@ -280,7 +279,6 @@
mach_init_IRQ = mvme16x_init_IRQ;
arch_gettimeoffset = mvme16x_gettimeoffset;
mach_hwclk = mvme16x_hwclk;
- mach_set_clock_mmss = mvme16x_set_clock_mmss;
mach_reset = mvme16x_reset;
mach_get_model = mvme16x_get_model;
mach_get_hardware_list = mvme16x_get_hardware_list;
@@ -411,9 +409,3 @@
}
return 0;
}
-
-int mvme16x_set_clock_mmss (unsigned long nowtime)
-{
- return 0;
-}
-
diff --git a/arch/m68k/q40/config.c b/arch/m68k/q40/config.c
index 71c0867..96810d9 100644
--- a/arch/m68k/q40/config.c
+++ b/arch/m68k/q40/config.c
@@ -43,7 +43,6 @@
static u32 q40_gettimeoffset(void);
static int q40_hwclk(int, struct rtc_time *);
static unsigned int q40_get_ss(void);
-static int q40_set_clock_mmss(unsigned long);
static int q40_get_rtc_pll(struct rtc_pll_info *pll);
static int q40_set_rtc_pll(struct rtc_pll_info *pll);
@@ -175,7 +174,6 @@
mach_get_ss = q40_get_ss;
mach_get_rtc_pll = q40_get_rtc_pll;
mach_set_rtc_pll = q40_set_rtc_pll;
- mach_set_clock_mmss = q40_set_clock_mmss;
mach_reset = q40_reset;
mach_get_model = q40_get_model;
@@ -267,34 +265,6 @@
return bcd2bin(Q40_RTC_SECS);
}
-/*
- * Set the minutes and seconds from seconds value 'nowtime'. Fail if
- * clock is out by > 30 minutes. Logic lifted from atari code.
- */
-
-static int q40_set_clock_mmss(unsigned long nowtime)
-{
- int retval = 0;
- short real_seconds = nowtime % 60, real_minutes = (nowtime / 60) % 60;
-
- int rtc_minutes;
-
- rtc_minutes = bcd2bin(Q40_RTC_MINS);
-
- if ((rtc_minutes < real_minutes ?
- real_minutes - rtc_minutes :
- rtc_minutes - real_minutes) < 30) {
- Q40_RTC_CTRL |= Q40_RTC_WRITE;
- Q40_RTC_MINS = bin2bcd(real_minutes);
- Q40_RTC_SECS = bin2bcd(real_seconds);
- Q40_RTC_CTRL &= ~(Q40_RTC_WRITE);
- } else
- retval = -1;
-
- return retval;
-}
-
-
/* get and set PLL calibration of RTC clock */
#define Q40_RTC_PLL_MASK ((1<<5)-1)
#define Q40_RTC_PLL_SIGN (1<<5)
diff --git a/arch/m68k/sun3/config.c b/arch/m68k/sun3/config.c
index 1d28d38..79a2bb8 100644
--- a/arch/m68k/sun3/config.c
+++ b/arch/m68k/sun3/config.c
@@ -123,10 +123,6 @@
availmem = memory_start;
m68k_setup_node(0);
- availmem += init_bootmem(start_page, num_pages);
- availmem = (availmem + (PAGE_SIZE-1)) & PAGE_MASK;
-
- free_bootmem(__pa(availmem), memory_end - (availmem));
}
diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 0ab176b..79be687 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -274,97 +274,12 @@
#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
-/**
- * __atomic_add_unless - add unless the number is a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
-
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-
-/*
- * atomic_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_sub_and_test(i, v) (atomic_sub_return((i), (v)) == 0)
-
-/*
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-
-/*
- * atomic_dec_and_test - decrement by 1 and test
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
-
/*
* atomic_dec_if_positive - decrement by 1 if old value positive
* @v: pointer of type atomic_t
*/
#define atomic_dec_if_positive(v) atomic_sub_if_positive(1, v)
-/*
- * atomic_inc - increment atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1.
- */
-#define atomic_inc(v) atomic_add(1, (v))
-
-/*
- * atomic_dec - decrement and test
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1.
- */
-#define atomic_dec(v) atomic_sub(1, (v))
-
-/*
- * atomic_add_negative - add and test if negative
- * @v: pointer of type atomic_t
- * @i: integer value to add
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
-#define atomic_add_negative(i, v) (atomic_add_return(i, (v)) < 0)
-
#ifdef CONFIG_64BIT
#define ATOMIC64_INIT(i) { (i) }
@@ -620,99 +535,12 @@
((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
#define atomic64_xchg(v, new) (xchg(&((v)->counter), (new)))
-/**
- * atomic64_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns true iff @v was not @u.
- */
-static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- long c, old;
- c = atomic64_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic64_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c != (u);
-}
-
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
-
-#define atomic64_dec_return(v) atomic64_sub_return(1, (v))
-#define atomic64_inc_return(v) atomic64_add_return(1, (v))
-
-/*
- * atomic64_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic64_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-#define atomic64_sub_and_test(i, v) (atomic64_sub_return((i), (v)) == 0)
-
-/*
- * atomic64_inc_and_test - increment and test
- * @v: pointer of type atomic64_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-
-/*
- * atomic64_dec_and_test - decrement by 1 and test
- * @v: pointer of type atomic64_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-#define atomic64_dec_and_test(v) (atomic64_sub_return(1, (v)) == 0)
-
/*
* atomic64_dec_if_positive - decrement by 1 if old value positive
* @v: pointer of type atomic64_t
*/
#define atomic64_dec_if_positive(v) atomic64_sub_if_positive(1, v)
-/*
- * atomic64_inc - increment atomic variable
- * @v: pointer of type atomic64_t
- *
- * Atomically increments @v by 1.
- */
-#define atomic64_inc(v) atomic64_add(1, (v))
-
-/*
- * atomic64_dec - decrement and test
- * @v: pointer of type atomic64_t
- *
- * Atomically decrements @v by 1.
- */
-#define atomic64_dec(v) atomic64_sub(1, (v))
-
-/*
- * atomic64_add_negative - add and test if negative
- * @v: pointer of type atomic64_t
- * @i: integer value to add
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
-#define atomic64_add_negative(i, v) (atomic64_add_return(i, (v)) < 0)
-
#endif /* CONFIG_64BIT */
#endif /* _ASM_ATOMIC_H */
diff --git a/arch/mips/include/asm/kprobes.h b/arch/mips/include/asm/kprobes.h
index ad1a999..a72dfbf 100644
--- a/arch/mips/include/asm/kprobes.h
+++ b/arch/mips/include/asm/kprobes.h
@@ -68,16 +68,6 @@
unsigned long saved_epc;
};
-#define MAX_JPROBES_STACK_SIZE 128
-#define MAX_JPROBES_STACK_ADDR \
- (((unsigned long)current_thread_info()) + THREAD_SIZE - 32 - sizeof(struct pt_regs))
-
-#define MIN_JPROBES_STACK_SIZE(ADDR) \
- ((((ADDR) + MAX_JPROBES_STACK_SIZE) > MAX_JPROBES_STACK_ADDR) \
- ? MAX_JPROBES_STACK_ADDR - (ADDR) \
- : MAX_JPROBES_STACK_SIZE)
-
-
#define SKIP_DELAYSLOT 0x0001
/* per-cpu kprobe control block */
@@ -86,12 +76,9 @@
unsigned long kprobe_old_SR;
unsigned long kprobe_saved_SR;
unsigned long kprobe_saved_epc;
- unsigned long jprobe_saved_sp;
- struct pt_regs jprobe_saved_regs;
/* Per-thread fields, used while emulating branches */
unsigned long flags;
unsigned long target_epc;
- u8 jprobes_stack[MAX_JPROBES_STACK_SIZE];
struct prev_kprobe prev_kprobe;
};
diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index f5c8bce..54cd675 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -326,19 +326,13 @@
preempt_enable_no_resched();
}
return 1;
- } else {
- if (addr->word != breakpoint_insn.word) {
- /*
- * The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- ret = 1;
- goto no_kprobe;
- }
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs))
- goto ss_probe;
+ } else if (addr->word != breakpoint_insn.word) {
+ /*
+ * The breakpoint instruction was removed by
+ * another cpu right after we hit, no further
+ * handling of this interrupt is appropriate
+ */
+ ret = 1;
}
goto no_kprobe;
}
@@ -364,10 +358,11 @@
if (p->pre_handler && p->pre_handler(p, regs)) {
/* handler has already set things up, so skip ss setup */
+ reset_current_kprobe();
+ preempt_enable_no_resched();
return 1;
}
-ss_probe:
prepare_singlestep(p, regs, kcb);
if (kcb->flags & SKIP_DELAYSLOT) {
kcb->kprobe_status = KPROBE_HIT_SSDONE;
@@ -468,51 +463,6 @@
return ret;
}
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- kcb->jprobe_saved_regs = *regs;
- kcb->jprobe_saved_sp = regs->regs[29];
-
- memcpy(kcb->jprobes_stack, (void *)kcb->jprobe_saved_sp,
- MIN_JPROBES_STACK_SIZE(kcb->jprobe_saved_sp));
-
- regs->cp0_epc = (unsigned long)(jp->entry);
-
- return 1;
-}
-
-/* Defined in the inline asm below. */
-void jprobe_return_end(void);
-
-void __kprobes jprobe_return(void)
-{
- /* Assembler quirk necessitates this '0,code' business. */
- asm volatile(
- "break 0,%0\n\t"
- ".globl jprobe_return_end\n"
- "jprobe_return_end:\n"
- : : "n" (BRK_KPROBE_BP) : "memory");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (regs->cp0_epc >= (unsigned long)jprobe_return &&
- regs->cp0_epc <= (unsigned long)jprobe_return_end) {
- *regs = kcb->jprobe_saved_regs;
- memcpy((void *)kcb->jprobe_saved_sp, kcb->jprobes_stack,
- MIN_JPROBES_STACK_SIZE(kcb->jprobe_saved_sp));
- preempt_enable_no_resched();
-
- return 1;
- }
- return 0;
-}
-
/*
* Function return probe trampoline:
* - init_kprobes() establishes a probepoint here
@@ -595,9 +545,7 @@
kretprobe_assert(ri, orig_ret_address, trampoline_address);
instruction_pointer(regs) = orig_ret_address;
- reset_current_kprobe();
kretprobe_hash_unlock(current, &flags);
- preempt_enable_no_resched();
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 7cd76f9..f7ea8e2 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -515,7 +515,7 @@
dvcpu->arch.wait = 0;
if (swq_has_sleeper(&dvcpu->wq))
- swake_up(&dvcpu->wq);
+ swake_up_one(&dvcpu->wq);
return 0;
}
@@ -1204,7 +1204,7 @@
vcpu->arch.wait = 0;
if (swq_has_sleeper(&vcpu->wq))
- swake_up(&vcpu->wq);
+ swake_up_one(&vcpu->wq);
}
/* low level hrtimer wake routine */
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 9ecad05b..dfb6a79 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -27,7 +27,6 @@
select GENERIC_STRNLEN_USER
select GENERIC_SMP_IDLE_THREAD
select MODULES_USE_ELF_RELA
- select MULTI_IRQ_HANDLER
select HAVE_DEBUG_STACKOVERFLOW
select OR1K_PIC
select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
@@ -36,6 +35,7 @@
select ARCH_USE_QUEUED_RWLOCKS
select OMPIC if SMP
select ARCH_WANT_FRAME_POINTERS
+ select GENERIC_IRQ_MULTI_HANDLER
config CPU_BIG_ENDIAN
def_bool y
@@ -69,9 +69,6 @@
config LOCKDEP_SUPPORT
def_bool y
-config MULTI_IRQ_HANDLER
- def_bool y
-
source "init/Kconfig"
source "kernel/Kconfig.freezer"
diff --git a/arch/openrisc/include/asm/atomic.h b/arch/openrisc/include/asm/atomic.h
index 146e166..b589fac 100644
--- a/arch/openrisc/include/asm/atomic.h
+++ b/arch/openrisc/include/asm/atomic.h
@@ -100,7 +100,7 @@
*
* This is often used through atomic_inc_not_zero()
*/
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
+static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int old, tmp;
@@ -119,7 +119,7 @@
return old;
}
-#define __atomic_add_unless __atomic_add_unless
+#define atomic_fetch_add_unless atomic_fetch_add_unless
#include <asm-generic/atomic.h>
diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h
index d29f7db5..f9cd43a 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -16,8 +16,9 @@
#ifndef __ASM_OPENRISC_CMPXCHG_H
#define __ASM_OPENRISC_CMPXCHG_H
+#include <linux/bits.h>
+#include <linux/compiler.h>
#include <linux/types.h>
-#include <linux/bitops.h>
#define __HAVE_ARCH_CMPXCHG 1
diff --git a/arch/openrisc/include/asm/irq.h b/arch/openrisc/include/asm/irq.h
index d9eee0a..eb612b1 100644
--- a/arch/openrisc/include/asm/irq.h
+++ b/arch/openrisc/include/asm/irq.h
@@ -24,6 +24,4 @@
#define NO_IRQ (-1)
-extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
-
#endif /* __ASM_OPENRISC_IRQ_H__ */
diff --git a/arch/openrisc/kernel/irq.c b/arch/openrisc/kernel/irq.c
index 35e478a..5f9445e 100644
--- a/arch/openrisc/kernel/irq.c
+++ b/arch/openrisc/kernel/irq.c
@@ -41,13 +41,6 @@
irqchip_init();
}
-static void (*handle_arch_irq)(struct pt_regs *);
-
-void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
-{
- handle_arch_irq = handle_irq;
-}
-
void __irq_entry do_IRQ(struct pt_regs *regs)
{
handle_arch_irq(regs);
diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 88bae66..118953d 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -77,30 +77,6 @@
#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
-/**
- * __atomic_add_unless - add unless the number is a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
-
#define ATOMIC_OP(op, c_op) \
static __inline__ void atomic_##op(int i, atomic_t *v) \
{ \
@@ -160,28 +136,6 @@
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-#define atomic_inc(v) (atomic_add( 1,(v)))
-#define atomic_dec(v) (atomic_add( -1,(v)))
-
-#define atomic_inc_return(v) (atomic_add_return( 1,(v)))
-#define atomic_dec_return(v) (atomic_add_return( -1,(v)))
-
-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-
-/*
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-
-#define atomic_dec_and_test(v) (atomic_dec_return(v) == 0)
-
-#define atomic_sub_and_test(i,v) (atomic_sub_return((i),(v)) == 0)
-
#define ATOMIC_INIT(i) { (i) }
#ifdef CONFIG_64BIT
@@ -264,72 +218,11 @@
return READ_ONCE((v)->counter);
}
-#define atomic64_inc(v) (atomic64_add( 1,(v)))
-#define atomic64_dec(v) (atomic64_add( -1,(v)))
-
-#define atomic64_inc_return(v) (atomic64_add_return( 1,(v)))
-#define atomic64_dec_return(v) (atomic64_add_return( -1,(v)))
-
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-#define atomic64_dec_and_test(v) (atomic64_dec_return(v) == 0)
-#define atomic64_sub_and_test(i,v) (atomic64_sub_return((i),(v)) == 0)
-
/* exported interface */
#define atomic64_cmpxchg(v, o, n) \
((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
-/**
- * atomic64_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
-static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- long c, old;
- c = atomic64_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic64_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c != (u);
-}
-
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
-
-/*
- * atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
- *
- * The function returns the old value of *v minus 1, even if
- * the atomic variable, v, was not decremented.
- */
-static inline long atomic64_dec_if_positive(atomic64_t *v)
-{
- long c, old, dec;
- c = atomic64_read(v);
- for (;;) {
- dec = c - 1;
- if (unlikely(dec < 0))
- break;
- old = atomic64_cmpxchg((v), c, dec);
- if (likely(old == c))
- break;
- c = old;
- }
- return dec;
-}
-
#endif /* !CONFIG_64BIT */
diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 682b3e6..963abf8b 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -18,18 +18,11 @@
* a "bne-" instruction at the end, so an isync is enough as a acquire barrier
* on the platform without lwsync.
*/
-#define __atomic_op_acquire(op, args...) \
-({ \
- typeof(op##_relaxed(args)) __ret = op##_relaxed(args); \
- __asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory"); \
- __ret; \
-})
+#define __atomic_acquire_fence() \
+ __asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory")
-#define __atomic_op_release(op, args...) \
-({ \
- __asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory"); \
- op##_relaxed(args); \
-})
+#define __atomic_release_fence() \
+ __asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory")
static __inline__ int atomic_read(const atomic_t *v)
{
@@ -129,8 +122,6 @@
#undef ATOMIC_OP_RETURN_RELAXED
#undef ATOMIC_OP
-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-
static __inline__ void atomic_inc(atomic_t *v)
{
int t;
@@ -145,6 +136,7 @@
: "r" (&v->counter)
: "cc", "xer");
}
+#define atomic_inc atomic_inc
static __inline__ int atomic_inc_return_relaxed(atomic_t *v)
{
@@ -163,16 +155,6 @@
return t;
}
-/*
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-
static __inline__ void atomic_dec(atomic_t *v)
{
int t;
@@ -187,6 +169,7 @@
: "r" (&v->counter)
: "cc", "xer");
}
+#define atomic_dec atomic_dec
static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
{
@@ -218,7 +201,7 @@
#define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
/**
- * __atomic_add_unless - add unless the number is a given value
+ * atomic_fetch_add_unless - add unless the number is a given value
* @v: pointer of type atomic_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
@@ -226,13 +209,13 @@
* Atomically adds @a to @v, so long as it was not @u.
* Returns the old value of @v.
*/
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
+static __inline__ int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int t;
__asm__ __volatile__ (
PPC_ATOMIC_ENTRY_BARRIER
-"1: lwarx %0,0,%1 # __atomic_add_unless\n\
+"1: lwarx %0,0,%1 # atomic_fetch_add_unless\n\
cmpw 0,%0,%3 \n\
beq 2f \n\
add %0,%2,%0 \n"
@@ -248,6 +231,7 @@
return t;
}
+#define atomic_fetch_add_unless atomic_fetch_add_unless
/**
* atomic_inc_not_zero - increment unless the number is zero
@@ -280,9 +264,6 @@
}
#define atomic_inc_not_zero(v) atomic_inc_not_zero((v))
-#define atomic_sub_and_test(a, v) (atomic_sub_return((a), (v)) == 0)
-#define atomic_dec_and_test(v) (atomic_dec_return((v)) == 0)
-
/*
* Atomically test *v and decrement if it is greater than 0.
* The function returns the old value of *v minus 1, even if
@@ -412,8 +393,6 @@
#undef ATOMIC64_OP_RETURN_RELAXED
#undef ATOMIC64_OP
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-
static __inline__ void atomic64_inc(atomic64_t *v)
{
long t;
@@ -427,6 +406,7 @@
: "r" (&v->counter)
: "cc", "xer");
}
+#define atomic64_inc atomic64_inc
static __inline__ long atomic64_inc_return_relaxed(atomic64_t *v)
{
@@ -444,16 +424,6 @@
return t;
}
-/*
- * atomic64_inc_and_test - increment and test
- * @v: pointer of type atomic64_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-
static __inline__ void atomic64_dec(atomic64_t *v)
{
long t;
@@ -467,6 +437,7 @@
: "r" (&v->counter)
: "cc", "xer");
}
+#define atomic64_dec atomic64_dec
static __inline__ long atomic64_dec_return_relaxed(atomic64_t *v)
{
@@ -487,9 +458,6 @@
#define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
#define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
-#define atomic64_sub_and_test(a, v) (atomic64_sub_return((a), (v)) == 0)
-#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
-
/*
* Atomically test *v and decrement if it is greater than 0.
* The function returns the old value of *v minus 1.
@@ -513,6 +481,7 @@
return t;
}
+#define atomic64_dec_if_positive atomic64_dec_if_positive
#define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
#define atomic64_cmpxchg_relaxed(v, o, n) \
@@ -524,7 +493,7 @@
#define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
/**
- * atomic64_add_unless - add unless the number is a given value
+ * atomic64_fetch_add_unless - add unless the number is a given value
* @v: pointer of type atomic64_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
@@ -532,13 +501,13 @@
* Atomically adds @a to @v, so long as it was not @u.
* Returns the old value of @v.
*/
-static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
+static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u)
{
long t;
__asm__ __volatile__ (
PPC_ATOMIC_ENTRY_BARRIER
-"1: ldarx %0,0,%1 # __atomic_add_unless\n\
+"1: ldarx %0,0,%1 # atomic64_fetch_add_unless\n\
cmpd 0,%0,%3 \n\
beq 2f \n\
add %0,%2,%0 \n"
@@ -551,8 +520,9 @@
: "r" (&v->counter), "r" (a), "r" (u)
: "cc", "memory");
- return t != u;
+ return t;
}
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
/**
* atomic_inc64_not_zero - increment unless the number is zero
@@ -582,6 +552,7 @@
return t1 != 0;
}
+#define atomic64_inc_not_zero(v) atomic64_inc_not_zero((v))
#endif /* __powerpc64__ */
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index 8e7b097..27d6e3c 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -52,6 +52,7 @@
#include <asm/reg.h>
#include <asm/debug.h>
+struct perf_event_attr;
struct perf_event;
struct pmu;
struct perf_sample_data;
@@ -60,8 +61,10 @@
extern int hw_breakpoint_slots(int type);
extern int arch_bp_generic_fields(int type, int *gen_bp_type);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
int arch_install_hw_breakpoint(struct perf_event *bp);
diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index 9f3be5c..785c464 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -88,7 +88,6 @@
struct kprobe_ctlblk {
unsigned long kprobe_status;
unsigned long kprobe_saved_msr;
- struct pt_regs jprobe_saved_regs;
struct prev_kprobe prev_kprobe;
};
@@ -103,17 +102,6 @@
extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
extern int kprobe_handler(struct pt_regs *regs);
extern int kprobe_post_handler(struct pt_regs *regs);
-#ifdef CONFIG_KPROBES_ON_FTRACE
-extern int __is_active_jprobe(unsigned long addr);
-extern int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb);
-#else
-static inline int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- return 0;
-}
-#endif
#else
static inline int kprobe_handler(struct pt_regs *regs) { return 0; }
static inline int kprobe_post_handler(struct pt_regs *regs) { return 0; }
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 80547da..fec8a67 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -119,11 +119,9 @@
/*
* Check for virtual address in kernel space.
*/
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
- return is_kernel_addr(info->address);
+ return is_kernel_addr(hw->address);
}
int arch_bp_generic_fields(int type, int *gen_bp_type)
@@ -141,30 +139,31 @@
/*
* Validate the arch-specific HW Breakpoint register settings
*/
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
int ret = -EINVAL, length_max;
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
if (!bp)
return ret;
- info->type = HW_BRK_TYPE_TRANSLATE;
- if (bp->attr.bp_type & HW_BREAKPOINT_R)
- info->type |= HW_BRK_TYPE_READ;
- if (bp->attr.bp_type & HW_BREAKPOINT_W)
- info->type |= HW_BRK_TYPE_WRITE;
- if (info->type == HW_BRK_TYPE_TRANSLATE)
+ hw->type = HW_BRK_TYPE_TRANSLATE;
+ if (attr->bp_type & HW_BREAKPOINT_R)
+ hw->type |= HW_BRK_TYPE_READ;
+ if (attr->bp_type & HW_BREAKPOINT_W)
+ hw->type |= HW_BRK_TYPE_WRITE;
+ if (hw->type == HW_BRK_TYPE_TRANSLATE)
/* must set alteast read or write */
return ret;
- if (!(bp->attr.exclude_user))
- info->type |= HW_BRK_TYPE_USER;
- if (!(bp->attr.exclude_kernel))
- info->type |= HW_BRK_TYPE_KERNEL;
- if (!(bp->attr.exclude_hv))
- info->type |= HW_BRK_TYPE_HYP;
- info->address = bp->attr.bp_addr;
- info->len = bp->attr.bp_len;
+ if (!attr->exclude_user)
+ hw->type |= HW_BRK_TYPE_USER;
+ if (!attr->exclude_kernel)
+ hw->type |= HW_BRK_TYPE_KERNEL;
+ if (!attr->exclude_hv)
+ hw->type |= HW_BRK_TYPE_HYP;
+ hw->address = attr->bp_addr;
+ hw->len = attr->bp_len;
/*
* Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8)
@@ -178,12 +177,12 @@
if (cpu_has_feature(CPU_FTR_DAWR)) {
length_max = 512 ; /* 64 doublewords */
/* DAWR region can't cross 512 boundary */
- if ((bp->attr.bp_addr >> 9) !=
- ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 9))
+ if ((attr->bp_addr >> 9) !=
+ ((attr->bp_addr + attr->bp_len - 1) >> 9))
return -EINVAL;
}
- if (info->len >
- (length_max - (info->address & HW_BREAKPOINT_ALIGN)))
+ if (hw->len >
+ (length_max - (hw->address & HW_BREAKPOINT_ALIGN)))
return -EINVAL;
return 0;
}
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7a1f99f..e4a49c0 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -25,50 +25,6 @@
#include <linux/preempt.h>
#include <linux/ftrace.h>
-/*
- * This is called from ftrace code after invoking registered handlers to
- * disambiguate regs->nip changes done by jprobes and livepatch. We check if
- * there is an active jprobe at the provided address (mcount location).
- */
-int __is_active_jprobe(unsigned long addr)
-{
- if (!preemptible()) {
- struct kprobe *p = raw_cpu_read(current_kprobe);
- return (p && (unsigned long)p->addr == addr) ? 1 : 0;
- }
-
- return 0;
-}
-
-static nokprobe_inline
-int __skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb, unsigned long orig_nip)
-{
- /*
- * Emulate singlestep (and also recover regs->nip)
- * as if there is a nop
- */
- regs->nip = (unsigned long)p->addr + MCOUNT_INSN_SIZE;
- if (unlikely(p->post_handler)) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- p->post_handler(p, regs, 0);
- }
- __this_cpu_write(current_kprobe, NULL);
- if (orig_nip)
- regs->nip = orig_nip;
- return 1;
-}
-
-int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- if (kprobe_ftrace(p))
- return __skip_singlestep(p, regs, kcb, 0);
- else
- return 0;
-}
-NOKPROBE_SYMBOL(skip_singlestep);
-
/* Ftrace callback handler for kprobes */
void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
struct ftrace_ops *ops, struct pt_regs *regs)
@@ -76,18 +32,14 @@
struct kprobe *p;
struct kprobe_ctlblk *kcb;
- preempt_disable();
-
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
- goto end;
+ return;
kcb = get_kprobe_ctlblk();
if (kprobe_running()) {
kprobes_inc_nmissed_count(p);
} else {
- unsigned long orig_nip = regs->nip;
-
/*
* On powerpc, NIP is *before* this instruction for the
* pre handler
@@ -96,19 +48,23 @@
__this_cpu_write(current_kprobe, p);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- if (!p->pre_handler || !p->pre_handler(p, regs))
- __skip_singlestep(p, regs, kcb, orig_nip);
- else {
+ if (!p->pre_handler || !p->pre_handler(p, regs)) {
/*
- * If pre_handler returns !0, it sets regs->nip and
- * resets current kprobe. In this case, we should not
- * re-enable preemption.
+ * Emulate singlestep (and also recover regs->nip)
+ * as if there is a nop
*/
- return;
+ regs->nip += MCOUNT_INSN_SIZE;
+ if (unlikely(p->post_handler)) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ p->post_handler(p, regs, 0);
+ }
}
+ /*
+ * If pre_handler returns !0, it changes regs->nip. We have to
+ * skip emulating post_handler.
+ */
+ __this_cpu_write(current_kprobe, NULL);
}
-end:
- preempt_enable_no_resched();
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index e4c5bf3..5c60bb0 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -317,25 +317,17 @@
}
prepare_singlestep(p, regs);
return 1;
- } else {
- if (*addr != BREAKPOINT_INSTRUCTION) {
- /* If trap variant, then it belongs not to us */
- kprobe_opcode_t cur_insn = *addr;
- if (is_trap(cur_insn))
- goto no_kprobe;
- /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
- ret = 1;
+ } else if (*addr != BREAKPOINT_INSTRUCTION) {
+ /* If trap variant, then it belongs not to us */
+ kprobe_opcode_t cur_insn = *addr;
+
+ if (is_trap(cur_insn))
goto no_kprobe;
- }
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- if (!skip_singlestep(p, regs, kcb))
- goto ss_probe;
- ret = 1;
- }
+ /* The breakpoint instruction was removed by
+ * another cpu right after we hit, no further
+ * handling of this interrupt is appropriate
+ */
+ ret = 1;
}
goto no_kprobe;
}
@@ -350,7 +342,7 @@
*/
kprobe_opcode_t cur_insn = *addr;
if (is_trap(cur_insn))
- goto no_kprobe;
+ goto no_kprobe;
/*
* The breakpoint instruction was removed right
* after we hit it. Another cpu has removed
@@ -366,11 +358,13 @@
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
set_current_kprobe(p, regs, kcb);
- if (p->pre_handler && p->pre_handler(p, regs))
- /* handler has already set things up, so skip ss setup */
+ if (p->pre_handler && p->pre_handler(p, regs)) {
+ /* handler changed execution path, so skip ss setup */
+ reset_current_kprobe();
+ preempt_enable_no_resched();
return 1;
+ }
-ss_probe:
if (p->ainsn.boostable >= 0) {
ret = try_to_emulate(p, regs);
@@ -611,60 +605,6 @@
}
NOKPROBE_SYMBOL(arch_deref_entry_point);
-int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- memcpy(&kcb->jprobe_saved_regs, regs, sizeof(struct pt_regs));
-
- /* setup return addr to the jprobe handler routine */
- regs->nip = arch_deref_entry_point(jp->entry);
-#ifdef PPC64_ELF_ABI_v2
- regs->gpr[12] = (unsigned long)jp->entry;
-#elif defined(PPC64_ELF_ABI_v1)
- regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc);
-#endif
-
- /*
- * jprobes use jprobe_return() which skips the normal return
- * path of the function, and this messes up the accounting of the
- * function graph tracer.
- *
- * Pause function graph tracing while performing the jprobe function.
- */
- pause_graph_tracing();
-
- return 1;
-}
-NOKPROBE_SYMBOL(setjmp_pre_handler);
-
-void __used jprobe_return(void)
-{
- asm volatile("jprobe_return_trap:\n"
- "trap\n"
- ::: "memory");
-}
-NOKPROBE_SYMBOL(jprobe_return);
-
-int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (regs->nip != ppc_kallsyms_lookup_name("jprobe_return_trap")) {
- pr_debug("longjmp_break_handler NIP (0x%lx) does not match jprobe_return_trap (0x%lx)\n",
- regs->nip, ppc_kallsyms_lookup_name("jprobe_return_trap"));
- return 0;
- }
-
- memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
- /* It's OK to start function graph tracing again */
- unpause_graph_tracing();
- preempt_enable_no_resched();
- return 1;
-}
-NOKPROBE_SYMBOL(longjmp_break_handler);
-
static struct kprobe trampoline_p = {
.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
.pre_handler = trampoline_probe_handler
diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index 9a5b5a5..32476a6 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -104,39 +104,13 @@
bl ftrace_stub
nop
- /* Load the possibly modified NIP */
- ld r15, _NIP(r1)
-
+ /* Load ctr with the possibly modified NIP */
+ ld r3, _NIP(r1)
+ mtctr r3
#ifdef CONFIG_LIVEPATCH
- cmpd r14, r15 /* has NIP been altered? */
+ cmpd r14, r3 /* has NIP been altered? */
#endif
-#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_KPROBES_ON_FTRACE)
- /* NIP has not been altered, skip over further checks */
- beq 1f
-
- /* Check if there is an active jprobe on us */
- subi r3, r14, 4
- bl __is_active_jprobe
- nop
-
- /*
- * If r3 == 1, then this is a kprobe/jprobe.
- * else, this is livepatched function.
- *
- * The conditional branch for livepatch_handler below will use the
- * result of this comparison. For kprobe/jprobe, we just need to branch to
- * the new NIP, not call livepatch_handler. The branch below is bne, so we
- * want CR0[EQ] to be true if this is a kprobe/jprobe. Which means we want
- * CR0[EQ] = (r3 == 1).
- */
- cmpdi r3, 1
-1:
-#endif
-
- /* Load CTR with the possibly modified NIP */
- mtctr r15
-
/* Restore gprs */
REST_GPR(0,r1)
REST_10GPRS(2,r1)
@@ -154,10 +128,7 @@
addi r1, r1, SWITCH_FRAME_SIZE
#ifdef CONFIG_LIVEPATCH
- /*
- * Based on the cmpd or cmpdi above, if the NIP was altered and we're
- * not on a kprobe/jprobe, then handle livepatch.
- */
+ /* Based on the cmpd above, if the NIP was altered handle livepatch */
bne- livepatch_handler
#endif
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de686b3..ee4a885 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -216,7 +216,7 @@
wqp = kvm_arch_vcpu_wq(vcpu);
if (swq_has_sleeper(wqp)) {
- swake_up(wqp);
+ swake_up_one(wqp);
++vcpu->stat.halt_wakeup;
}
@@ -3188,7 +3188,7 @@
}
}
- prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_swait_exclusive(&vc->wq, &wait, TASK_INTERRUPTIBLE);
if (kvmppc_vcore_check_block(vc)) {
finish_swait(&vc->wq, &wait);
@@ -3311,7 +3311,7 @@
kvmppc_start_thread(vcpu, vc);
trace_kvm_guest_enter(vcpu);
} else if (vc->vcore_state == VCORE_SLEEPING) {
- swake_up(&vc->wq);
+ swake_up_one(&vc->wq);
}
}
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3f66fcf..19d8ab4 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1469,7 +1469,7 @@
}
/*
- * Add a event to the PMU.
+ * Add an event to the PMU.
* If all events are not already frozen, then we disable and
* re-enable the PMU in order to get hw_perf_enable to do the
* actual work of reconfiguring the PMU.
@@ -1548,7 +1548,7 @@
}
/*
- * Remove a event from the PMU.
+ * Remove an event from the PMU.
*/
static void power_pmu_del(struct perf_event *event, int ef_flags)
{
@@ -1742,7 +1742,7 @@
/*
* Return 1 if we might be able to put event on a limited PMC,
* or 0 if not.
- * A event can only go on a limited PMC if it counts something
+ * An event can only go on a limited PMC if it counts something
* that a limited PMC can count, doesn't require interrupts, and
* doesn't exclude any processor mode.
*/
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 855115a..c452359 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -25,18 +25,11 @@
#define ATOMIC_INIT(i) { (i) }
-#define __atomic_op_acquire(op, args...) \
-({ \
- typeof(op##_relaxed(args)) __ret = op##_relaxed(args); \
- __asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory"); \
- __ret; \
-})
+#define __atomic_acquire_fence() \
+ __asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory")
-#define __atomic_op_release(op, args...) \
-({ \
- __asm__ __volatile__(RISCV_RELEASE_BARRIER "" ::: "memory"); \
- op##_relaxed(args); \
-})
+#define __atomic_release_fence() \
+ __asm__ __volatile__(RISCV_RELEASE_BARRIER "" ::: "memory");
static __always_inline int atomic_read(const atomic_t *v)
{
@@ -209,130 +202,8 @@
#undef ATOMIC_FETCH_OP
#undef ATOMIC_OP_RETURN
-/*
- * The extra atomic operations that are constructed from one of the core
- * AMO-based operations above (aside from sub, which is easier to fit above).
- * These are required to perform a full barrier, but they're OK this way
- * because atomic_*_return is also required to perform a full barrier.
- *
- */
-#define ATOMIC_OP(op, func_op, comp_op, I, c_type, prefix) \
-static __always_inline \
-bool atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
-{ \
- return atomic##prefix##_##func_op##_return(i, v) comp_op I; \
-}
-
-#ifdef CONFIG_GENERIC_ATOMIC64
-#define ATOMIC_OPS(op, func_op, comp_op, I) \
- ATOMIC_OP(op, func_op, comp_op, I, int, )
-#else
-#define ATOMIC_OPS(op, func_op, comp_op, I) \
- ATOMIC_OP(op, func_op, comp_op, I, int, ) \
- ATOMIC_OP(op, func_op, comp_op, I, long, 64)
-#endif
-
-ATOMIC_OPS(add_and_test, add, ==, 0)
-ATOMIC_OPS(sub_and_test, sub, ==, 0)
-ATOMIC_OPS(add_negative, add, <, 0)
-
-#undef ATOMIC_OP
-#undef ATOMIC_OPS
-
-#define ATOMIC_OP(op, func_op, I, c_type, prefix) \
-static __always_inline \
-void atomic##prefix##_##op(atomic##prefix##_t *v) \
-{ \
- atomic##prefix##_##func_op(I, v); \
-}
-
-#define ATOMIC_FETCH_OP(op, func_op, I, c_type, prefix) \
-static __always_inline \
-c_type atomic##prefix##_fetch_##op##_relaxed(atomic##prefix##_t *v) \
-{ \
- return atomic##prefix##_fetch_##func_op##_relaxed(I, v); \
-} \
-static __always_inline \
-c_type atomic##prefix##_fetch_##op(atomic##prefix##_t *v) \
-{ \
- return atomic##prefix##_fetch_##func_op(I, v); \
-}
-
-#define ATOMIC_OP_RETURN(op, asm_op, c_op, I, c_type, prefix) \
-static __always_inline \
-c_type atomic##prefix##_##op##_return_relaxed(atomic##prefix##_t *v) \
-{ \
- return atomic##prefix##_fetch_##op##_relaxed(v) c_op I; \
-} \
-static __always_inline \
-c_type atomic##prefix##_##op##_return(atomic##prefix##_t *v) \
-{ \
- return atomic##prefix##_fetch_##op(v) c_op I; \
-}
-
-#ifdef CONFIG_GENERIC_ATOMIC64
-#define ATOMIC_OPS(op, asm_op, c_op, I) \
- ATOMIC_OP( op, asm_op, I, int, ) \
- ATOMIC_FETCH_OP( op, asm_op, I, int, ) \
- ATOMIC_OP_RETURN(op, asm_op, c_op, I, int, )
-#else
-#define ATOMIC_OPS(op, asm_op, c_op, I) \
- ATOMIC_OP( op, asm_op, I, int, ) \
- ATOMIC_FETCH_OP( op, asm_op, I, int, ) \
- ATOMIC_OP_RETURN(op, asm_op, c_op, I, int, ) \
- ATOMIC_OP( op, asm_op, I, long, 64) \
- ATOMIC_FETCH_OP( op, asm_op, I, long, 64) \
- ATOMIC_OP_RETURN(op, asm_op, c_op, I, long, 64)
-#endif
-
-ATOMIC_OPS(inc, add, +, 1)
-ATOMIC_OPS(dec, add, +, -1)
-
-#define atomic_inc_return_relaxed atomic_inc_return_relaxed
-#define atomic_dec_return_relaxed atomic_dec_return_relaxed
-#define atomic_inc_return atomic_inc_return
-#define atomic_dec_return atomic_dec_return
-
-#define atomic_fetch_inc_relaxed atomic_fetch_inc_relaxed
-#define atomic_fetch_dec_relaxed atomic_fetch_dec_relaxed
-#define atomic_fetch_inc atomic_fetch_inc
-#define atomic_fetch_dec atomic_fetch_dec
-
-#ifndef CONFIG_GENERIC_ATOMIC64
-#define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
-#define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
-#define atomic64_inc_return atomic64_inc_return
-#define atomic64_dec_return atomic64_dec_return
-
-#define atomic64_fetch_inc_relaxed atomic64_fetch_inc_relaxed
-#define atomic64_fetch_dec_relaxed atomic64_fetch_dec_relaxed
-#define atomic64_fetch_inc atomic64_fetch_inc
-#define atomic64_fetch_dec atomic64_fetch_dec
-#endif
-
-#undef ATOMIC_OPS
-#undef ATOMIC_OP
-#undef ATOMIC_FETCH_OP
-#undef ATOMIC_OP_RETURN
-
-#define ATOMIC_OP(op, func_op, comp_op, I, prefix) \
-static __always_inline \
-bool atomic##prefix##_##op(atomic##prefix##_t *v) \
-{ \
- return atomic##prefix##_##func_op##_return(v) comp_op I; \
-}
-
-ATOMIC_OP(inc_and_test, inc, ==, 0, )
-ATOMIC_OP(dec_and_test, dec, ==, 0, )
-#ifndef CONFIG_GENERIC_ATOMIC64
-ATOMIC_OP(inc_and_test, inc, ==, 0, 64)
-ATOMIC_OP(dec_and_test, dec, ==, 0, 64)
-#endif
-
-#undef ATOMIC_OP
-
/* This is required to provide a full barrier on success. */
-static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
+static __always_inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int prev, rc;
@@ -349,9 +220,10 @@
: "memory");
return prev;
}
+#define atomic_fetch_add_unless atomic_fetch_add_unless
#ifndef CONFIG_GENERIC_ATOMIC64
-static __always_inline long __atomic64_add_unless(atomic64_t *v, long a, long u)
+static __always_inline long atomic64_fetch_add_unless(atomic64_t *v, long a, long u)
{
long prev, rc;
@@ -368,27 +240,7 @@
: "memory");
return prev;
}
-
-static __always_inline int atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- return __atomic64_add_unless(v, a, u) != u;
-}
-#endif
-
-/*
- * The extra atomic operations that are constructed from one of the core
- * LR/SC-based operations above.
- */
-static __always_inline int atomic_inc_not_zero(atomic_t *v)
-{
- return __atomic_add_unless(v, 1, 0);
-}
-
-#ifndef CONFIG_GENERIC_ATOMIC64
-static __always_inline long atomic64_inc_not_zero(atomic64_t *v)
-{
- return atomic64_add_unless(v, 1, 0);
-}
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
#endif
/*
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 4fe5b2a..5152405 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -145,6 +145,7 @@
select HAVE_KERNEL_LZ4
select HAVE_KERNEL_LZMA
select HAVE_KERNEL_LZO
+ select HAVE_KERNEL_UNCOMPRESSED
select HAVE_KERNEL_XZ
select HAVE_KPROBES
select HAVE_KRETPROBES
diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index 68a6904..eee6703 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -14,8 +14,18 @@
LDFLAGS := -m elf64_s390
KBUILD_AFLAGS_MODULE += -fPIC
KBUILD_CFLAGS_MODULE += -fPIC
-KBUILD_CFLAGS += -m64
KBUILD_AFLAGS += -m64
+KBUILD_CFLAGS += -m64
+aflags_dwarf := -Wa,-gdwarf-2
+KBUILD_AFLAGS_DECOMPRESSOR := -m64 -D__ASSEMBLY__
+KBUILD_AFLAGS_DECOMPRESSOR += $(if $(CONFIG_DEBUG_INFO),$(aflags_dwarf))
+KBUILD_CFLAGS_DECOMPRESSOR := -m64 -O2
+KBUILD_CFLAGS_DECOMPRESSOR += -DDISABLE_BRANCH_PROFILING -D__NO_FORTIFY
+KBUILD_CFLAGS_DECOMPRESSOR += -fno-delete-null-pointer-checks -msoft-float
+KBUILD_CFLAGS_DECOMPRESSOR += -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS_DECOMPRESSOR += $(call cc-option,-ffreestanding)
+KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_DEBUG_INFO),-g)
+KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_DEBUG_INFO_DWARF4), $(call cc-option, -gdwarf-4,))
UTS_MACHINE := s390x
STACK_SIZE := 16384
CHECKFLAGS += -D__s390__ -D__s390x__
@@ -52,18 +62,14 @@
#
cflags-$(CONFIG_FRAME_POINTER) += -fno-optimize-sibling-calls
-# old style option for packed stacks
-ifeq ($(call cc-option-yn,-mkernel-backchain),y)
-cflags-$(CONFIG_PACK_STACK) += -mkernel-backchain -D__PACK_STACK
-aflags-$(CONFIG_PACK_STACK) += -D__PACK_STACK
-endif
-
-# new style option for packed stacks
ifeq ($(call cc-option-yn,-mpacked-stack),y)
cflags-$(CONFIG_PACK_STACK) += -mpacked-stack -D__PACK_STACK
aflags-$(CONFIG_PACK_STACK) += -D__PACK_STACK
endif
+KBUILD_AFLAGS_DECOMPRESSOR += $(aflags-y)
+KBUILD_CFLAGS_DECOMPRESSOR += $(cflags-y)
+
ifeq ($(call cc-option-yn,-mstack-size=8192 -mstack-guard=128),y)
cflags-$(CONFIG_CHECK_STACK) += -mstack-size=$(STACK_SIZE)
ifneq ($(call cc-option-yn,-mstack-size=8192),y)
@@ -71,8 +77,11 @@
endif
endif
-ifeq ($(call cc-option-yn,-mwarn-dynamicstack),y)
-cflags-$(CONFIG_WARN_DYNAMIC_STACK) += -mwarn-dynamicstack
+ifdef CONFIG_WARN_DYNAMIC_STACK
+ ifeq ($(call cc-option-yn,-mwarn-dynamicstack),y)
+ KBUILD_CFLAGS += -mwarn-dynamicstack
+ KBUILD_CFLAGS_DECOMPRESSOR += -mwarn-dynamicstack
+ endif
endif
ifdef CONFIG_EXPOLINE
@@ -82,6 +91,7 @@
CC_FLAGS_EXPOLINE += -mindirect-branch-table
export CC_FLAGS_EXPOLINE
cflags-y += $(CC_FLAGS_EXPOLINE) -DCC_USING_EXPOLINE
+ aflags-y += -DCC_USING_EXPOLINE
endif
endif
@@ -102,11 +112,12 @@
KBUILD_CFLAGS += -pipe -fno-strength-reduce -Wno-sign-compare
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables $(cfi)
KBUILD_AFLAGS += $(aflags-y) $(cfi)
+export KBUILD_AFLAGS_DECOMPRESSOR
+export KBUILD_CFLAGS_DECOMPRESSOR
OBJCOPYFLAGS := -O binary
-head-y := arch/s390/kernel/head.o
-head-y += arch/s390/kernel/head64.o
+head-y := arch/s390/kernel/head64.o
# See arch/s390/Kbuild for content of core part of the kernel
core-y += arch/s390/
@@ -121,7 +132,7 @@
syscalls := arch/s390/kernel/syscalls
tools := arch/s390/tools
-all: image bzImage
+all: bzImage
#KBUILD_IMAGE is necessary for packaging targets like rpm-pkg, deb-pkg...
KBUILD_IMAGE := $(boot)/bzImage
@@ -129,7 +140,7 @@
install: vmlinux
$(Q)$(MAKE) $(build)=$(boot) $@
-image bzImage: vmlinux
+bzImage: vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
zfcpdump:
@@ -152,8 +163,7 @@
# Don't use tabs in echo arguments
define archhelp
- echo '* image - Kernel image for IPL ($(boot)/image)'
- echo '* bzImage - Compressed kernel image for IPL ($(boot)/bzImage)'
+ echo '* bzImage - Kernel image for IPL ($(boot)/bzImage)'
echo ' install - Install kernel using'
echo ' (your) ~/bin/$(INSTALLKERNEL) or'
echo ' (distribution) /sbin/$(INSTALLKERNEL) or'
diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index ee6a9c3..9bf8489 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -206,35 +206,28 @@
appldata_timer_handler(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- unsigned int len;
- char buf[2];
+ int timer_active = appldata_timer_active;
+ int zero = 0;
+ int one = 1;
+ int rc;
+ struct ctl_table ctl_entry = {
+ .procname = ctl->procname,
+ .data = &timer_active,
+ .maxlen = sizeof(int),
+ .extra1 = &zero,
+ .extra2 = &one,
+ };
- if (!*lenp || *ppos) {
- *lenp = 0;
- return 0;
- }
- if (!write) {
- strncpy(buf, appldata_timer_active ? "1\n" : "0\n",
- ARRAY_SIZE(buf));
- len = strnlen(buf, ARRAY_SIZE(buf));
- if (len > *lenp)
- len = *lenp;
- if (copy_to_user(buffer, buf, len))
- return -EFAULT;
- goto out;
- }
- len = *lenp;
- if (copy_from_user(buf, buffer, len > sizeof(buf) ? sizeof(buf) : len))
- return -EFAULT;
+ rc = proc_douintvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
+ if (rc < 0 || !write)
+ return rc;
+
spin_lock(&appldata_timer_lock);
- if (buf[0] == '1')
+ if (timer_active)
__appldata_vtimer_setup(APPLDATA_ADD_TIMER);
- else if (buf[0] == '0')
+ else
__appldata_vtimer_setup(APPLDATA_DEL_TIMER);
spin_unlock(&appldata_timer_lock);
-out:
- *lenp = len;
- *ppos += len;
return 0;
}
@@ -248,37 +241,24 @@
appldata_interval_handler(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- unsigned int len;
- int interval;
- char buf[16];
+ int interval = appldata_interval;
+ int one = 1;
+ int rc;
+ struct ctl_table ctl_entry = {
+ .procname = ctl->procname,
+ .data = &interval,
+ .maxlen = sizeof(int),
+ .extra1 = &one,
+ };
- if (!*lenp || *ppos) {
- *lenp = 0;
- return 0;
- }
- if (!write) {
- len = sprintf(buf, "%i\n", appldata_interval);
- if (len > *lenp)
- len = *lenp;
- if (copy_to_user(buffer, buf, len))
- return -EFAULT;
- goto out;
- }
- len = *lenp;
- if (copy_from_user(buf, buffer, len > sizeof(buf) ? sizeof(buf) : len))
- return -EFAULT;
- interval = 0;
- sscanf(buf, "%i", &interval);
- if (interval <= 0)
- return -EINVAL;
+ rc = proc_dointvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
+ if (rc < 0 || !write)
+ return rc;
spin_lock(&appldata_timer_lock);
appldata_interval = interval;
__appldata_vtimer_setup(APPLDATA_MOD_TIMER);
spin_unlock(&appldata_timer_lock);
-out:
- *lenp = len;
- *ppos += len;
return 0;
}
@@ -293,10 +273,17 @@
void __user *buffer, size_t *lenp, loff_t *ppos)
{
struct appldata_ops *ops = NULL, *tmp_ops;
- unsigned int len;
- int rc, found;
- char buf[2];
struct list_head *lh;
+ int rc, found;
+ int active;
+ int zero = 0;
+ int one = 1;
+ struct ctl_table ctl_entry = {
+ .data = &active,
+ .maxlen = sizeof(int),
+ .extra1 = &zero,
+ .extra2 = &one,
+ };
found = 0;
mutex_lock(&appldata_ops_mutex);
@@ -317,31 +304,15 @@
}
mutex_unlock(&appldata_ops_mutex);
- if (!*lenp || *ppos) {
- *lenp = 0;
+ active = ops->active;
+ rc = proc_douintvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
+ if (rc < 0 || !write) {
module_put(ops->owner);
- return 0;
- }
- if (!write) {
- strncpy(buf, ops->active ? "1\n" : "0\n", ARRAY_SIZE(buf));
- len = strnlen(buf, ARRAY_SIZE(buf));
- if (len > *lenp)
- len = *lenp;
- if (copy_to_user(buffer, buf, len)) {
- module_put(ops->owner);
- return -EFAULT;
- }
- goto out;
- }
- len = *lenp;
- if (copy_from_user(buf, buffer,
- len > sizeof(buf) ? sizeof(buf) : len)) {
- module_put(ops->owner);
- return -EFAULT;
+ return rc;
}
mutex_lock(&appldata_ops_mutex);
- if ((buf[0] == '1') && (ops->active == 0)) {
+ if (active && (ops->active == 0)) {
// protect work queue callback
if (!try_module_get(ops->owner)) {
mutex_unlock(&appldata_ops_mutex);
@@ -359,7 +330,7 @@
module_put(ops->owner);
} else
ops->active = 1;
- } else if ((buf[0] == '0') && (ops->active == 1)) {
+ } else if (!active && (ops->active == 1)) {
ops->active = 0;
rc = appldata_diag(ops->record_nr, APPLDATA_STOP_REC,
(unsigned long) ops->data, ops->size,
@@ -370,9 +341,6 @@
module_put(ops->owner);
}
mutex_unlock(&appldata_ops_mutex);
-out:
- *lenp = len;
- *ppos += len;
module_put(ops->owner);
return 0;
}
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index d1fa37f..9e6668e 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -3,19 +3,52 @@
# Makefile for the linux s390-specific parts of the memory manager.
#
-targets := image
-targets += bzImage
-subdir- := compressed
+KCOV_INSTRUMENT := n
+GCOV_PROFILE := n
+UBSAN_SANITIZE := n
-$(obj)/image: vmlinux FORCE
- $(call if_changed,objcopy)
+KBUILD_AFLAGS := $(KBUILD_AFLAGS_DECOMPRESSOR)
+KBUILD_CFLAGS := $(KBUILD_CFLAGS_DECOMPRESSOR)
+
+#
+# Use -march=z900 for als.c to be able to print an error
+# message if the kernel is started on a machine which is too old
+#
+ifneq ($(CC_FLAGS_MARCH),-march=z900)
+AFLAGS_REMOVE_head.o += $(CC_FLAGS_MARCH)
+AFLAGS_head.o += -march=z900
+AFLAGS_REMOVE_mem.o += $(CC_FLAGS_MARCH)
+AFLAGS_mem.o += -march=z900
+CFLAGS_REMOVE_als.o += $(CC_FLAGS_MARCH)
+CFLAGS_als.o += -march=z900
+CFLAGS_REMOVE_sclp_early_core.o += $(CC_FLAGS_MARCH)
+CFLAGS_sclp_early_core.o += -march=z900
+endif
+
+CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
+
+obj-y := head.o als.o ebcdic.o sclp_early_core.o mem.o
+targets := bzImage startup.a $(obj-y)
+subdir- := compressed
+
+OBJECTS := $(addprefix $(obj)/,$(obj-y))
$(obj)/bzImage: $(obj)/compressed/vmlinux FORCE
$(call if_changed,objcopy)
-$(obj)/compressed/vmlinux: FORCE
+$(obj)/compressed/vmlinux: $(obj)/startup.a FORCE
$(Q)$(MAKE) $(build)=$(obj)/compressed $@
+quiet_cmd_ar = AR $@
+ cmd_ar = rm -f $@; $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(filter $(OBJECTS), $^)
+
+$(obj)/startup.a: $(OBJECTS) FORCE
+ $(call if_changed,ar)
+
install: $(CONFIGURE) $(obj)/bzImage
sh -x $(srctree)/$(obj)/install.sh $(KERNELRELEASE) $(obj)/bzImage \
System.map "$(INSTALL_PATH)"
+
+chkbss := $(OBJECTS)
+chkbss-target := $(obj)/startup.a
+include $(srctree)/arch/s390/scripts/Makefile.chkbss
diff --git a/arch/s390/kernel/als.c b/arch/s390/boot/als.c
similarity index 83%
rename from arch/s390/kernel/als.c
rename to arch/s390/boot/als.c
index d1892bf..d592e0d 100644
--- a/arch/s390/kernel/als.c
+++ b/arch/s390/boot/als.c
@@ -3,12 +3,10 @@
* Copyright IBM Corp. 2016
*/
#include <linux/kernel.h>
-#include <linux/init.h>
#include <asm/processor.h>
#include <asm/facility.h>
#include <asm/lowcore.h>
#include <asm/sclp.h>
-#include "entry.h"
/*
* The code within this file will be called very early. It may _not_
@@ -18,9 +16,9 @@
* For temporary objects the stack (16k) should be used.
*/
-static unsigned long als[] __initdata = { FACILITIES_ALS };
+static unsigned long als[] = { FACILITIES_ALS };
-static void __init u16_to_hex(char *str, u16 val)
+static void u16_to_hex(char *str, u16 val)
{
int i, num;
@@ -33,9 +31,9 @@
*str = '\0';
}
-static void __init print_machine_type(void)
+static void print_machine_type(void)
{
- static char mach_str[80] __initdata = "Detected machine-type number: ";
+ static char mach_str[80] = "Detected machine-type number: ";
char type_str[5];
struct cpuid id;
@@ -46,7 +44,7 @@
sclp_early_printk(mach_str);
}
-static void __init u16_to_decimal(char *str, u16 val)
+static void u16_to_decimal(char *str, u16 val)
{
int div = 1;
@@ -60,9 +58,9 @@
*str = '\0';
}
-static void __init print_missing_facilities(void)
+static void print_missing_facilities(void)
{
- static char als_str[80] __initdata = "Missing facilities: ";
+ static char als_str[80] = "Missing facilities: ";
unsigned long val;
char val_str[6];
int i, j, first;
@@ -95,7 +93,7 @@
sclp_early_printk("See Principles of Operations for facility bits\n");
}
-static void __init facility_mismatch(void)
+static void facility_mismatch(void)
{
sclp_early_printk("The Linux kernel requires more recent processor hardware\n");
print_machine_type();
@@ -103,7 +101,7 @@
disabled_wait(0x8badcccc);
}
-void __init verify_facilities(void)
+void verify_facilities(void)
{
int i;
diff --git a/arch/s390/boot/compressed/.gitignore b/arch/s390/boot/compressed/.gitignore
index 2088cc1..45aeb4f 100644
--- a/arch/s390/boot/compressed/.gitignore
+++ b/arch/s390/boot/compressed/.gitignore
@@ -1,4 +1,5 @@
sizes.h
vmlinux
vmlinux.lds
+vmlinux.scr.lds
vmlinux.bin.full
diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index 5766f7b..0460947 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -6,39 +6,29 @@
#
KCOV_INSTRUMENT := n
-
-targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
-targets += vmlinux.bin.xz vmlinux.bin.lzma vmlinux.bin.lzo vmlinux.bin.lz4
-targets += misc.o piggy.o sizes.h head.o
-
-KBUILD_CFLAGS := -m64 -D__KERNEL__ -O2
-KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING -D__NO_FORTIFY
-KBUILD_CFLAGS += $(cflags-y) -fno-delete-null-pointer-checks -msoft-float
-KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
-KBUILD_CFLAGS += $(call cc-option,-mpacked-stack)
-KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
-
GCOV_PROFILE := n
UBSAN_SANITIZE := n
-OBJECTS := $(addprefix $(objtree)/arch/s390/kernel/, head.o ebcdic.o als.o)
-OBJECTS += $(objtree)/drivers/s390/char/sclp_early_core.o
-OBJECTS += $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o
+obj-y := $(if $(CONFIG_KERNEL_UNCOMPRESSED),,head.o misc.o) piggy.o
+targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
+targets += vmlinux.bin.xz vmlinux.bin.lzma vmlinux.bin.lzo vmlinux.bin.lz4
+targets += vmlinux.scr.lds $(obj-y) $(if $(CONFIG_KERNEL_UNCOMPRESSED),,sizes.h)
+
+KBUILD_AFLAGS := $(KBUILD_AFLAGS_DECOMPRESSOR)
+KBUILD_CFLAGS := $(KBUILD_CFLAGS_DECOMPRESSOR)
+
+OBJECTS := $(addprefix $(obj)/,$(obj-y))
LDFLAGS_vmlinux := --oformat $(LD_BFD) -e startup -T
-$(obj)/vmlinux: $(obj)/vmlinux.lds $(OBJECTS)
+$(obj)/vmlinux: $(obj)/vmlinux.lds $(objtree)/arch/s390/boot/startup.a $(OBJECTS)
$(call if_changed,ld)
-TRIM_HEAD_SIZE := 0x11000
-
-sed-sizes := -e 's/^\([0-9a-fA-F]*\) . \(__bss_start\|_end\)$$/\#define SZ\2 (0x\1 - $(TRIM_HEAD_SIZE))/p'
+# extract required uncompressed vmlinux symbols and adjust them to reflect offsets inside vmlinux.bin
+sed-sizes := -e 's/^\([0-9a-fA-F]*\) . \(__bss_start\|_end\)$$/\#define SZ\2 (0x\1 - 0x100000)/p'
quiet_cmd_sizes = GEN $@
cmd_sizes = $(NM) $< | sed -n $(sed-sizes) > $@
-quiet_cmd_trim_head = TRIM $@
- cmd_trim_head = tail -c +$$(($(TRIM_HEAD_SIZE) + 1)) $< > $@
-
$(obj)/sizes.h: vmlinux
$(call if_changed,sizes)
@@ -48,21 +38,18 @@
CFLAGS_misc.o += -I$(objtree)/$(obj)
$(obj)/misc.o: $(obj)/sizes.h
-OBJCOPYFLAGS_vmlinux.bin.full := -R .comment -S
-$(obj)/vmlinux.bin.full: vmlinux
+OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
+$(obj)/vmlinux.bin: vmlinux
$(call if_changed,objcopy)
-$(obj)/vmlinux.bin: $(obj)/vmlinux.bin.full
- $(call if_changed,trim_head)
-
vmlinux.bin.all-y := $(obj)/vmlinux.bin
-suffix-$(CONFIG_KERNEL_GZIP) := gz
-suffix-$(CONFIG_KERNEL_BZIP2) := bz2
-suffix-$(CONFIG_KERNEL_LZ4) := lz4
-suffix-$(CONFIG_KERNEL_LZMA) := lzma
-suffix-$(CONFIG_KERNEL_LZO) := lzo
-suffix-$(CONFIG_KERNEL_XZ) := xz
+suffix-$(CONFIG_KERNEL_GZIP) := .gz
+suffix-$(CONFIG_KERNEL_BZIP2) := .bz2
+suffix-$(CONFIG_KERNEL_LZ4) := .lz4
+suffix-$(CONFIG_KERNEL_LZMA) := .lzma
+suffix-$(CONFIG_KERNEL_LZO) := .lzo
+suffix-$(CONFIG_KERNEL_XZ) := .xz
$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y)
$(call if_changed,gzip)
@@ -78,5 +65,9 @@
$(call if_changed,xzkern)
LDFLAGS_piggy.o := -r --format binary --oformat $(LD_BFD) -T
-$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix-y)
+$(obj)/piggy.o: $(obj)/vmlinux.scr.lds $(obj)/vmlinux.bin$(suffix-y)
$(call if_changed,ld)
+
+chkbss := $(filter-out $(obj)/misc.o $(obj)/piggy.o,$(OBJECTS))
+chkbss-target := $(obj)/vmlinux.bin
+include $(srctree)/arch/s390/scripts/Makefile.chkbss
diff --git a/arch/s390/boot/compressed/head.S b/arch/s390/boot/compressed/head.S
index 9f94eca..df8dbbc 100644
--- a/arch/s390/boot/compressed/head.S
+++ b/arch/s390/boot/compressed/head.S
@@ -15,7 +15,7 @@
#include "sizes.h"
__HEAD
-ENTRY(startup_continue)
+ENTRY(startup_decompressor)
basr %r13,0 # get base
.LPG1:
# setup stack
@@ -23,7 +23,7 @@
aghi %r15,-160
brasl %r14,decompress_kernel
# Set up registers for memory mover. We move the decompressed image to
- # 0x11000, where startup_continue of the decompressed image is supposed
+ # 0x100000, where startup_continue of the decompressed image is supposed
# to be.
lgr %r4,%r2
lg %r2,.Loffset-.LPG1(%r13)
@@ -33,7 +33,7 @@
la %r1,0x200
mvc 0(mover_end-mover,%r1),mover-.LPG1(%r13)
# When the memory mover is done we pass control to
- # arch/s390/kernel/head64.S:startup_continue which lives at 0x11000 in
+ # arch/s390/kernel/head64.S:startup_continue which lives at 0x100000 in
# the decompressed image.
lgr %r6,%r2
br %r1
@@ -47,6 +47,6 @@
.Lstack:
.quad 0x8000 + (1<<(PAGE_SHIFT+THREAD_SIZE_ORDER))
.Loffset:
- .quad 0x11000
+ .quad 0x100000
.Lmvsize:
.quad SZ__bss_start
diff --git a/arch/s390/boot/compressed/misc.c b/arch/s390/boot/compressed/misc.c
index 511b2cc..f66ad73 100644
--- a/arch/s390/boot/compressed/misc.c
+++ b/arch/s390/boot/compressed/misc.c
@@ -71,43 +71,6 @@
return 0;
}
-void *memset(void *s, int c, size_t n)
-{
- char *xs;
-
- xs = s;
- while (n--)
- *xs++ = c;
- return s;
-}
-
-void *memcpy(void *dest, const void *src, size_t n)
-{
- const char *s = src;
- char *d = dest;
-
- while (n--)
- *d++ = *s++;
- return dest;
-}
-
-void *memmove(void *dest, const void *src, size_t n)
-{
- const char *s = src;
- char *d = dest;
-
- if (d <= s) {
- while (n--)
- *d++ = *s++;
- } else {
- d += n;
- s += n;
- while (n--)
- *--d = *--s;
- }
- return dest;
-}
-
static void error(char *x)
{
unsigned long long psw = 0x000a0000deadbeefULL;
diff --git a/arch/s390/boot/compressed/vmlinux.lds.S b/arch/s390/boot/compressed/vmlinux.lds.S
index d43c2db..b16ac8b 100644
--- a/arch/s390/boot/compressed/vmlinux.lds.S
+++ b/arch/s390/boot/compressed/vmlinux.lds.S
@@ -23,13 +23,10 @@
*(.text.*)
_etext = . ;
}
- .rodata.compressed : {
- *(.rodata.compressed)
- }
.rodata : {
_rodata = . ;
*(.rodata) /* read-only data */
- *(.rodata.*)
+ *(EXCLUDE_FILE (*piggy.o) .rodata.compressed)
_erodata = . ;
}
.data : {
@@ -38,6 +35,15 @@
*(.data.*)
_edata = . ;
}
+ startup_continue = 0x100000;
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+ . = 0x100000;
+#else
+ . = ALIGN(8);
+#endif
+ .rodata.compressed : {
+ *(.rodata.compressed)
+ }
. = ALIGN(256);
.bss : {
_bss = . ;
@@ -54,5 +60,6 @@
*(.eh_frame)
*(__ex_table)
*(*__ksymtab*)
+ *(___kcrctab*)
}
}
diff --git a/arch/s390/boot/compressed/vmlinux.scr b/arch/s390/boot/compressed/vmlinux.scr.lds.S
similarity index 70%
rename from arch/s390/boot/compressed/vmlinux.scr
rename to arch/s390/boot/compressed/vmlinux.scr.lds.S
index 42a2425..ff01d18 100644
--- a/arch/s390/boot/compressed/vmlinux.scr
+++ b/arch/s390/boot/compressed/vmlinux.scr.lds.S
@@ -2,10 +2,14 @@
SECTIONS
{
.rodata.compressed : {
+#ifndef CONFIG_KERNEL_UNCOMPRESSED
input_len = .;
LONG(input_data_end - input_data) input_data = .;
+#endif
*(.data)
+#ifndef CONFIG_KERNEL_UNCOMPRESSED
output_len = . - 4;
input_data_end = .;
+#endif
}
}
diff --git a/arch/s390/boot/ebcdic.c b/arch/s390/boot/ebcdic.c
new file mode 100644
index 0000000..7391e7d
--- /dev/null
+++ b/arch/s390/boot/ebcdic.c
@@ -0,0 +1,2 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../kernel/ebcdic.c"
diff --git a/arch/s390/kernel/head.S b/arch/s390/boot/head.S
similarity index 97%
rename from arch/s390/kernel/head.S
rename to arch/s390/boot/head.S
index 5c42f16..f721913b 100644
--- a/arch/s390/kernel/head.S
+++ b/arch/s390/boot/head.S
@@ -272,14 +272,14 @@
.org 0x10000
ENTRY(startup)
j .Lep_startup_normal
- .org 0x10008
+ .org EP_OFFSET
#
# This is a list of s390 kernel entry points. At address 0x1000f the number of
# valid entry points is stored.
#
# IMPORTANT: Do not change this table, it is s390 kernel ABI!
#
- .ascii "S390EP"
+ .ascii EP_STRING
.byte 0x00,0x01
#
# kdump startup-code at 0x10010, running in 64 bit absolute addressing mode
@@ -310,10 +310,11 @@
l %r15,.Lstack-.LPG0(%r13)
ahi %r15,-STACK_FRAME_OVERHEAD
brasl %r14,verify_facilities
-# For uncompressed images, continue in
-# arch/s390/kernel/head64.S. For compressed images, continue in
-# arch/s390/boot/compressed/head.S.
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
jg startup_continue
+#else
+ jg startup_decompressor
+#endif
.Lstack:
.long 0x8000 + (1<<(PAGE_SHIFT+THREAD_SIZE_ORDER))
diff --git a/arch/s390/kernel/head_kdump.S b/arch/s390/boot/head_kdump.S
similarity index 100%
rename from arch/s390/kernel/head_kdump.S
rename to arch/s390/boot/head_kdump.S
diff --git a/arch/s390/boot/mem.S b/arch/s390/boot/mem.S
new file mode 100644
index 0000000..b334636
--- /dev/null
+++ b/arch/s390/boot/mem.S
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include "../lib/mem.S"
diff --git a/arch/s390/boot/sclp_early_core.c b/arch/s390/boot/sclp_early_core.c
new file mode 100644
index 0000000..5a19fd7
--- /dev/null
+++ b/arch/s390/boot/sclp_early_core.c
@@ -0,0 +1,2 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../../../drivers/s390/char/sclp_early_core.c"
diff --git a/arch/s390/hypfs/hypfs_diag.c b/arch/s390/hypfs/hypfs_diag.c
index a2945b2..3452e18 100644
--- a/arch/s390/hypfs/hypfs_diag.c
+++ b/arch/s390/hypfs/hypfs_diag.c
@@ -497,7 +497,7 @@
}
diag224_idx2name(cpu_info__ctidx(diag204_info_type, cpu_info), buffer);
rc = hypfs_create_str(cpu_dir, "type", buffer);
- return PTR_RET(rc);
+ return PTR_ERR_OR_ZERO(rc);
}
static void *hypfs_create_lpar_files(struct dentry *systems_dir, void *part_hdr)
@@ -544,7 +544,7 @@
return PTR_ERR(rc);
diag224_idx2name(phys_cpu__ctidx(diag204_info_type, cpu_info), buffer);
rc = hypfs_create_str(cpu_dir, "type", buffer);
- return PTR_RET(rc);
+ return PTR_ERR_OR_ZERO(rc);
}
static void *hypfs_create_phys_files(struct dentry *parent_dir, void *phys_hdr)
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 06b513d..c681329 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -36,7 +36,7 @@
kuid_t uid; /* uid used for files and dirs */
kgid_t gid; /* gid used for files and dirs */
struct dentry *update_file; /* file to trigger update */
- time_t last_update; /* last update time in secs since 1970 */
+ time64_t last_update; /* last update, CLOCK_MONOTONIC time */
struct mutex lock; /* lock to protect update process */
};
@@ -52,7 +52,7 @@
struct hypfs_sb_info *sb_info = sb->s_fs_info;
struct inode *inode = d_inode(sb_info->update_file);
- sb_info->last_update = get_seconds();
+ sb_info->last_update = ktime_get_seconds();
inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
}
@@ -179,7 +179,7 @@
* to restart data collection in this case.
*/
mutex_lock(&fs_info->lock);
- if (fs_info->last_update == get_seconds()) {
+ if (fs_info->last_update == ktime_get_seconds()) {
rc = -EBUSY;
goto out;
}
diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h
index c1bedb4..046e044 100644
--- a/arch/s390/include/asm/ap.h
+++ b/arch/s390/include/asm/ap.h
@@ -47,6 +47,50 @@
};
/**
+ * ap_intructions_available() - Test if AP instructions are available.
+ *
+ * Returns 0 if the AP instructions are installed.
+ */
+static inline int ap_instructions_available(void)
+{
+ register unsigned long reg0 asm ("0") = AP_MKQID(0, 0);
+ register unsigned long reg1 asm ("1") = -ENODEV;
+ register unsigned long reg2 asm ("2");
+
+ asm volatile(
+ " .long 0xb2af0000\n" /* PQAP(TAPQ) */
+ "0: la %0,0\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ : "+d" (reg1), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
+ * ap_tapq(): Test adjunct processor queue.
+ * @qid: The AP queue number
+ * @info: Pointer to queue descriptor
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info)
+{
+ register unsigned long reg0 asm ("0") = qid;
+ register struct ap_queue_status reg1 asm ("1");
+ register unsigned long reg2 asm ("2");
+
+ asm volatile(".long 0xb2af0000" /* PQAP(TAPQ) */
+ : "=d" (reg1), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
+ if (info)
+ *info = reg2;
+ return reg1;
+}
+
+/**
* ap_test_queue(): Test adjunct processor queue.
* @qid: The AP queue number
* @tbit: Test facilities bit
@@ -54,10 +98,57 @@
*
* Returns AP queue status structure.
*/
-struct ap_queue_status ap_test_queue(ap_qid_t qid,
- int tbit,
- unsigned long *info);
+static inline struct ap_queue_status ap_test_queue(ap_qid_t qid,
+ int tbit,
+ unsigned long *info)
+{
+ if (tbit)
+ qid |= 1UL << 23; /* set T bit*/
+ return ap_tapq(qid, info);
+}
+/**
+ * ap_pqap_rapq(): Reset adjunct processor queue.
+ * @qid: The AP queue number
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_rapq(ap_qid_t qid)
+{
+ register unsigned long reg0 asm ("0") = qid | (1UL << 24);
+ register struct ap_queue_status reg1 asm ("1");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(RAPQ) */
+ : "=d" (reg1)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
+ * ap_pqap_zapq(): Reset and zeroize adjunct processor queue.
+ * @qid: The AP queue number
+ *
+ * Returns AP queue status structure.
+ */
+static inline struct ap_queue_status ap_zapq(ap_qid_t qid)
+{
+ register unsigned long reg0 asm ("0") = qid | (2UL << 24);
+ register struct ap_queue_status reg1 asm ("1");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(ZAPQ) */
+ : "=d" (reg1)
+ : "d" (reg0)
+ : "cc");
+ return reg1;
+}
+
+/**
+ * struct ap_config_info - convenience struct for AP crypto
+ * config info as returned by the ap_qci() function.
+ */
struct ap_config_info {
unsigned int apsc : 1; /* S bit */
unsigned int apxa : 1; /* N bit */
@@ -74,50 +165,189 @@
unsigned char _reserved4[16];
} __aligned(8);
-/*
- * ap_query_configuration(): Fetch cryptographic config info
+/**
+ * ap_qci(): Get AP configuration data
*
- * Returns the ap configuration info fetched via PQAP(QCI).
- * On success 0 is returned, on failure a negative errno
- * is returned, e.g. if the PQAP(QCI) instruction is not
- * available, the return value will be -EOPNOTSUPP.
+ * Returns 0 on success, or -EOPNOTSUPP.
*/
-int ap_query_configuration(struct ap_config_info *info);
+static inline int ap_qci(struct ap_config_info *config)
+{
+ register unsigned long reg0 asm ("0") = 4UL << 24;
+ register unsigned long reg1 asm ("1") = -EOPNOTSUPP;
+ register struct ap_config_info *reg2 asm ("2") = config;
+
+ asm volatile(
+ ".long 0xb2af0000\n" /* PQAP(QCI) */
+ "0: la %0,0\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ : "+d" (reg1)
+ : "d" (reg0), "d" (reg2)
+ : "cc", "memory");
+
+ return reg1;
+}
/*
* struct ap_qirq_ctrl - convenient struct for easy invocation
- * of the ap_queue_irq_ctrl() function. This struct is passed
- * as GR1 parameter to the PQAP(AQIC) instruction. For details
- * please see the AR documentation.
+ * of the ap_aqic() function. This struct is passed as GR1
+ * parameter to the PQAP(AQIC) instruction. For details please
+ * see the AR documentation.
*/
struct ap_qirq_ctrl {
unsigned int _res1 : 8;
- unsigned int zone : 8; /* zone info */
- unsigned int ir : 1; /* ir flag: enable (1) or disable (0) irq */
+ unsigned int zone : 8; /* zone info */
+ unsigned int ir : 1; /* ir flag: enable (1) or disable (0) irq */
unsigned int _res2 : 4;
- unsigned int gisc : 3; /* guest isc field */
+ unsigned int gisc : 3; /* guest isc field */
unsigned int _res3 : 6;
- unsigned int gf : 2; /* gisa format */
+ unsigned int gf : 2; /* gisa format */
unsigned int _res4 : 1;
- unsigned int gisa : 27; /* gisa origin */
+ unsigned int gisa : 27; /* gisa origin */
unsigned int _res5 : 1;
- unsigned int isc : 3; /* irq sub class */
+ unsigned int isc : 3; /* irq sub class */
};
/**
- * ap_queue_irq_ctrl(): Control interruption on a AP queue.
+ * ap_aqic(): Control interruption for a specific AP.
* @qid: The AP queue number
- * @qirqctrl: struct ap_qirq_ctrl, see above
+ * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
* @ind: The notification indicator byte
*
* Returns AP queue status.
- *
- * Control interruption on the given AP queue.
- * Just a simple wrapper function for the low level PQAP(AQIC)
- * instruction available for other kernel modules.
*/
-struct ap_queue_status ap_queue_irq_ctrl(ap_qid_t qid,
- struct ap_qirq_ctrl qirqctrl,
- void *ind);
+static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
+ struct ap_qirq_ctrl qirqctrl,
+ void *ind)
+{
+ register unsigned long reg0 asm ("0") = qid | (3UL << 24);
+ register struct ap_qirq_ctrl reg1_in asm ("1") = qirqctrl;
+ register struct ap_queue_status reg1_out asm ("1");
+ register void *reg2 asm ("2") = ind;
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(AQIC) */
+ : "=d" (reg1_out)
+ : "d" (reg0), "d" (reg1_in), "d" (reg2)
+ : "cc");
+ return reg1_out;
+}
+
+/*
+ * union ap_qact_ap_info - used together with the
+ * ap_aqic() function to provide a convenient way
+ * to handle the ap info needed by the qact function.
+ */
+union ap_qact_ap_info {
+ unsigned long val;
+ struct {
+ unsigned int : 3;
+ unsigned int mode : 3;
+ unsigned int : 26;
+ unsigned int cat : 8;
+ unsigned int : 8;
+ unsigned char ver[2];
+ };
+};
+
+/**
+ * ap_qact(): Query AP combatibility type.
+ * @qid: The AP queue number
+ * @apinfo: On input the info about the AP queue. On output the
+ * alternate AP queue info provided by the qact function
+ * in GR2 is stored in.
+ *
+ * Returns AP queue status. Check response_code field for failures.
+ */
+static inline struct ap_queue_status ap_qact(ap_qid_t qid, int ifbit,
+ union ap_qact_ap_info *apinfo)
+{
+ register unsigned long reg0 asm ("0") = qid | (5UL << 24)
+ | ((ifbit & 0x01) << 22);
+ register unsigned long reg1_in asm ("1") = apinfo->val;
+ register struct ap_queue_status reg1_out asm ("1");
+ register unsigned long reg2 asm ("2");
+
+ asm volatile(
+ ".long 0xb2af0000" /* PQAP(QACT) */
+ : "+d" (reg1_in), "=d" (reg1_out), "=d" (reg2)
+ : "d" (reg0)
+ : "cc");
+ apinfo->val = reg2;
+ return reg1_out;
+}
+
+/**
+ * ap_nqap(): Send message to adjunct processor queue.
+ * @qid: The AP queue number
+ * @psmid: The program supplied message identifier
+ * @msg: The message text
+ * @length: The message length
+ *
+ * Returns AP queue status structure.
+ * Condition code 1 on NQAP can't happen because the L bit is 1.
+ * Condition code 2 on NQAP also means the send is incomplete,
+ * because a segment boundary was reached. The NQAP is repeated.
+ */
+static inline struct ap_queue_status ap_nqap(ap_qid_t qid,
+ unsigned long long psmid,
+ void *msg, size_t length)
+{
+ register unsigned long reg0 asm ("0") = qid | 0x40000000UL;
+ register struct ap_queue_status reg1 asm ("1");
+ register unsigned long reg2 asm ("2") = (unsigned long) msg;
+ register unsigned long reg3 asm ("3") = (unsigned long) length;
+ register unsigned long reg4 asm ("4") = (unsigned int) (psmid >> 32);
+ register unsigned long reg5 asm ("5") = psmid & 0xffffffff;
+
+ asm volatile (
+ "0: .long 0xb2ad0042\n" /* NQAP */
+ " brc 2,0b"
+ : "+d" (reg0), "=d" (reg1), "+d" (reg2), "+d" (reg3)
+ : "d" (reg4), "d" (reg5)
+ : "cc", "memory");
+ return reg1;
+}
+
+/**
+ * ap_dqap(): Receive message from adjunct processor queue.
+ * @qid: The AP queue number
+ * @psmid: Pointer to program supplied message identifier
+ * @msg: The message text
+ * @length: The message length
+ *
+ * Returns AP queue status structure.
+ * Condition code 1 on DQAP means the receive has taken place
+ * but only partially. The response is incomplete, hence the
+ * DQAP is repeated.
+ * Condition code 2 on DQAP also means the receive is incomplete,
+ * this time because a segment boundary was reached. Again, the
+ * DQAP is repeated.
+ * Note that gpr2 is used by the DQAP instruction to keep track of
+ * any 'residual' length, in case the instruction gets interrupted.
+ * Hence it gets zeroed before the instruction.
+ */
+static inline struct ap_queue_status ap_dqap(ap_qid_t qid,
+ unsigned long long *psmid,
+ void *msg, size_t length)
+{
+ register unsigned long reg0 asm("0") = qid | 0x80000000UL;
+ register struct ap_queue_status reg1 asm ("1");
+ register unsigned long reg2 asm("2") = 0UL;
+ register unsigned long reg4 asm("4") = (unsigned long) msg;
+ register unsigned long reg5 asm("5") = (unsigned long) length;
+ register unsigned long reg6 asm("6") = 0UL;
+ register unsigned long reg7 asm("7") = 0UL;
+
+
+ asm volatile(
+ "0: .long 0xb2ae0064\n" /* DQAP */
+ " brc 6,0b\n"
+ : "+d" (reg0), "=d" (reg1), "+d" (reg2),
+ "+d" (reg4), "+d" (reg5), "+d" (reg6), "+d" (reg7)
+ : : "cc", "memory");
+ *psmid = (((unsigned long long) reg6) << 32) + reg7;
+ return reg1;
+}
#endif /* _ASM_S390_AP_H_ */
diff --git a/arch/s390/include/asm/atomic.h b/arch/s390/include/asm/atomic.h
index 4b55532..fd20ab5 100644
--- a/arch/s390/include/asm/atomic.h
+++ b/arch/s390/include/asm/atomic.h
@@ -55,17 +55,9 @@
__atomic_add(i, &v->counter);
}
-#define atomic_add_negative(_i, _v) (atomic_add_return(_i, _v) < 0)
-#define atomic_inc(_v) atomic_add(1, _v)
-#define atomic_inc_return(_v) atomic_add_return(1, _v)
-#define atomic_inc_and_test(_v) (atomic_add_return(1, _v) == 0)
#define atomic_sub(_i, _v) atomic_add(-(int)(_i), _v)
#define atomic_sub_return(_i, _v) atomic_add_return(-(int)(_i), _v)
#define atomic_fetch_sub(_i, _v) atomic_fetch_add(-(int)(_i), _v)
-#define atomic_sub_and_test(_i, _v) (atomic_sub_return(_i, _v) == 0)
-#define atomic_dec(_v) atomic_sub(1, _v)
-#define atomic_dec_return(_v) atomic_sub_return(1, _v)
-#define atomic_dec_and_test(_v) (atomic_sub_return(1, _v) == 0)
#define ATOMIC_OPS(op) \
static inline void atomic_##op(int i, atomic_t *v) \
@@ -90,21 +82,6 @@
return __atomic_cmpxchg(&v->counter, old, new);
}
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == u))
- break;
- old = atomic_cmpxchg(v, c, c + a);
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
-
#define ATOMIC64_INIT(i) { (i) }
static inline long atomic64_read(const atomic64_t *v)
@@ -168,50 +145,8 @@
#undef ATOMIC64_OPS
-static inline int atomic64_add_unless(atomic64_t *v, long i, long u)
-{
- long c, old;
-
- c = atomic64_read(v);
- for (;;) {
- if (unlikely(c == u))
- break;
- old = atomic64_cmpxchg(v, c, c + i);
- if (likely(old == c))
- break;
- c = old;
- }
- return c != u;
-}
-
-static inline long atomic64_dec_if_positive(atomic64_t *v)
-{
- long c, old, dec;
-
- c = atomic64_read(v);
- for (;;) {
- dec = c - 1;
- if (unlikely(dec < 0))
- break;
- old = atomic64_cmpxchg((v), c, dec);
- if (likely(old == c))
- break;
- c = old;
- }
- return dec;
-}
-
-#define atomic64_add_negative(_i, _v) (atomic64_add_return(_i, _v) < 0)
-#define atomic64_inc(_v) atomic64_add(1, _v)
-#define atomic64_inc_return(_v) atomic64_add_return(1, _v)
-#define atomic64_inc_and_test(_v) (atomic64_add_return(1, _v) == 0)
#define atomic64_sub_return(_i, _v) atomic64_add_return(-(long)(_i), _v)
#define atomic64_fetch_sub(_i, _v) atomic64_fetch_add(-(long)(_i), _v)
#define atomic64_sub(_i, _v) atomic64_add(-(long)(_i), _v)
-#define atomic64_sub_and_test(_i, _v) (atomic64_sub_return(_i, _v) == 0)
-#define atomic64_dec(_v) atomic64_sub(1, _v)
-#define atomic64_dec_return(_v) atomic64_sub_return(1, _v)
-#define atomic64_dec_and_test(_v) (atomic64_sub_return(1, _v) == 0)
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
#endif /* __ARCH_S390_ATOMIC__ */
diff --git a/arch/s390/include/asm/cpu_mf.h b/arch/s390/include/asm/cpu_mf.h
index de023a9..bf2cbff 100644
--- a/arch/s390/include/asm/cpu_mf.h
+++ b/arch/s390/include/asm/cpu_mf.h
@@ -2,7 +2,7 @@
/*
* CPU-measurement facilities
*
- * Copyright IBM Corp. 2012
+ * Copyright IBM Corp. 2012, 2018
* Author(s): Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
* Jan Glauber <jang@linux.vnet.ibm.com>
*/
@@ -139,8 +139,14 @@
unsigned char timestamp[16]; /* 16 - 31 timestamp */
unsigned long long reserved1; /* 32 -Reserved */
unsigned long long reserved2; /* */
- unsigned long long progusage1; /* 48 - reserved for programming use */
- unsigned long long progusage2; /* */
+ union { /* 48 - reserved for programming use */
+ struct {
+ unsigned int clock_base:1; /* in progusage2 */
+ unsigned long long progusage1:63;
+ unsigned long long progusage2;
+ };
+ unsigned long long progusage[2];
+ };
} __packed;
/* Load program parameter */
diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index e07cce8..fcbd638 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -9,6 +9,14 @@
#ifndef _ASM_S390_GMAP_H
#define _ASM_S390_GMAP_H
+/* Generic bits for GMAP notification on DAT table entry changes. */
+#define GMAP_NOTIFY_SHADOW 0x2
+#define GMAP_NOTIFY_MPROT 0x1
+
+/* Status bits only for huge segment entries */
+#define _SEGMENT_ENTRY_GMAP_IN 0x8000 /* invalidation notify bit */
+#define _SEGMENT_ENTRY_GMAP_UC 0x4000 /* dirty (migration) */
+
/**
* struct gmap_struct - guest address space
* @list: list head for the mm->context gmap list
@@ -132,4 +140,6 @@
int gmap_mprotect_notify(struct gmap *, unsigned long start,
unsigned long len, int prot);
+void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
+ unsigned long gaddr, unsigned long vmaddr);
#endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 9c5fc50..2d1afa5 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -37,7 +37,10 @@
return 0;
}
-#define arch_clear_hugepage_flags(page) do { } while (0)
+static inline void arch_clear_hugepage_flags(struct page *page)
+{
+ clear_bit(PG_arch_1, &page->flags);
+}
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz)
diff --git a/arch/s390/include/asm/kprobes.h b/arch/s390/include/asm/kprobes.h
index 13de80c..b106aa2 100644
--- a/arch/s390/include/asm/kprobes.h
+++ b/arch/s390/include/asm/kprobes.h
@@ -68,8 +68,6 @@
unsigned long kprobe_saved_imask;
unsigned long kprobe_saved_ctl[3];
struct prev_kprobe prev_kprobe;
- struct pt_regs jprobe_saved_regs;
- kprobe_opcode_t jprobes_stack[MAX_STACK_SIZE];
};
void arch_remove_kprobe(struct kprobe *p);
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 5bc8888..406d940 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -185,7 +185,7 @@
/* Transaction abort diagnostic block */
__u8 pgm_tdb[256]; /* 0x1800 */
__u8 pad_0x1900[0x2000-0x1900]; /* 0x1900 */
-} __packed;
+} __packed __aligned(8192);
#define S390_lowcore (*((struct lowcore *) 0))
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index f5ff9db..f31a150 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -24,6 +24,8 @@
unsigned int uses_skeys:1;
/* The mmu context uses CMM. */
unsigned int uses_cmm:1;
+ /* The gmaps associated with this context are allowed to use huge pages. */
+ unsigned int allow_gmap_hpage_1m:1;
} mm_context_t;
#define INIT_MM_CONTEXT(name) \
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index d16bc79..0717ee7 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -32,6 +32,7 @@
mm->context.has_pgste = 0;
mm->context.uses_skeys = 0;
mm->context.uses_cmm = 0;
+ mm->context.allow_gmap_hpage_1m = 0;
#endif
switch (mm->context.asce_limit) {
case _REGION2_SIZE:
diff --git a/arch/s390/include/asm/nospec-insn.h b/arch/s390/include/asm/nospec-insn.h
index a01f811..123dac3 100644
--- a/arch/s390/include/asm/nospec-insn.h
+++ b/arch/s390/include/asm/nospec-insn.h
@@ -8,7 +8,7 @@
#ifdef __ASSEMBLY__
-#ifdef CONFIG_EXPOLINE
+#ifdef CC_USING_EXPOLINE
_LC_BR_R1 = __LC_BR_R1
@@ -189,7 +189,7 @@
.macro BASR_EX rsave,rtarget,ruse=%r1
basr \rsave,\rtarget
.endm
-#endif
+#endif /* CC_USING_EXPOLINE */
#endif /* __ASSEMBLY__ */
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 94f8db4..10fe982 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -51,6 +51,10 @@
u64 max_work_units;
};
+struct zpci_fmb_fmt3 {
+ u64 tx_bytes;
+};
+
struct zpci_fmb {
u32 format : 8;
u32 fmt_ind : 24;
@@ -66,6 +70,7 @@
struct zpci_fmb_fmt0 fmt0;
struct zpci_fmb_fmt1 fmt1;
struct zpci_fmb_fmt2 fmt2;
+ struct zpci_fmb_fmt3 fmt3;
};
} __packed __aligned(128);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5ab6360..0e7cb0d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -268,8 +268,10 @@
#define _REGION_ENTRY_BITS_LARGE 0xffffffff8000fe2fUL
/* Bits in the segment table entry */
-#define _SEGMENT_ENTRY_BITS 0xfffffffffffffe33UL
-#define _SEGMENT_ENTRY_BITS_LARGE 0xfffffffffff0ff33UL
+#define _SEGMENT_ENTRY_BITS 0xfffffffffffffe33UL
+#define _SEGMENT_ENTRY_BITS_LARGE 0xfffffffffff0ff33UL
+#define _SEGMENT_ENTRY_HARDWARE_BITS 0xfffffffffffffe30UL
+#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE 0xfffffffffff00730UL
#define _SEGMENT_ENTRY_ORIGIN_LARGE ~0xfffffUL /* large page address */
#define _SEGMENT_ENTRY_ORIGIN ~0x7ffUL/* page table origin */
#define _SEGMENT_ENTRY_PROTECT 0x200 /* segment protection bit */
@@ -1101,7 +1103,8 @@
pte_t *sptep, pte_t *tptep, pte_t pte);
void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep);
-bool test_and_clear_guest_dirty(struct mm_struct *mm, unsigned long address);
+bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long address,
+ pte_t *ptep);
int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
unsigned char key, bool nq);
int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
@@ -1116,6 +1119,10 @@
int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep);
int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
unsigned long *oldpte, unsigned long *oldpgste);
+void gmap_pmdp_csp(struct mm_struct *mm, unsigned long vmaddr);
+void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr);
+void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr);
+void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr);
/*
* Certain architectures need to do special things when PTEs
diff --git a/arch/s390/include/asm/purgatory.h b/arch/s390/include/asm/purgatory.h
index 6090670..e297bcf 100644
--- a/arch/s390/include/asm/purgatory.h
+++ b/arch/s390/include/asm/purgatory.h
@@ -13,11 +13,5 @@
int verify_sha256_digest(void);
-extern u64 kernel_entry;
-extern u64 kernel_type;
-
-extern u64 crash_start;
-extern u64 crash_size;
-
#endif /* __ASSEMBLY__ */
#endif /* _S390_PURGATORY_H_ */
diff --git a/arch/s390/include/asm/qdio.h b/arch/s390/include/asm/qdio.h
index de11ecc..9c9970a 100644
--- a/arch/s390/include/asm/qdio.h
+++ b/arch/s390/include/asm/qdio.h
@@ -262,7 +262,6 @@
void *user;
};
-#define QDIO_OUTBUF_STATE_FLAG_NONE 0x00
#define QDIO_OUTBUF_STATE_FLAG_PENDING 0x01
#define CHSC_AC1_INITIATE_INPUTQ 0x80
diff --git a/arch/s390/include/asm/sections.h b/arch/s390/include/asm/sections.h
index 54f81f8..724faed 100644
--- a/arch/s390/include/asm/sections.h
+++ b/arch/s390/include/asm/sections.h
@@ -4,6 +4,4 @@
#include <asm-generic/sections.h>
-extern char _ehead[];
-
#endif
diff --git a/arch/s390/include/asm/setup.h b/arch/s390/include/asm/setup.h
index 9c30ebe..1d66016 100644
--- a/arch/s390/include/asm/setup.h
+++ b/arch/s390/include/asm/setup.h
@@ -9,8 +9,10 @@
#include <linux/const.h>
#include <uapi/asm/setup.h>
-
+#define EP_OFFSET 0x10008
+#define EP_STRING "S390EP"
#define PARMAREA 0x10400
+#define PARMAREA_END 0x11000
/*
* Machine features detected in early.c
diff --git a/arch/s390/include/uapi/asm/chsc.h b/arch/s390/include/uapi/asm/chsc.h
index dc329aa..83a574e 100644
--- a/arch/s390/include/uapi/asm/chsc.h
+++ b/arch/s390/include/uapi/asm/chsc.h
@@ -23,29 +23,29 @@
__u32 key : 4;
__u32 : 28;
struct subchannel_id sid;
-} __attribute__ ((packed));
+};
struct chsc_async_area {
struct chsc_async_header header;
__u8 data[CHSC_SIZE - sizeof(struct chsc_async_header)];
-} __attribute__ ((packed));
+};
struct chsc_header {
__u16 length;
__u16 code;
-} __attribute__ ((packed));
+};
struct chsc_sync_area {
struct chsc_header header;
__u8 data[CHSC_SIZE - sizeof(struct chsc_header)];
-} __attribute__ ((packed));
+};
struct chsc_response_struct {
__u16 length;
__u16 code;
__u32 parms;
__u8 data[CHSC_SIZE - 2 * sizeof(__u16) - sizeof(__u32)];
-} __attribute__ ((packed));
+};
struct chsc_chp_cd {
struct chp_id chpid;
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index 2fed39b..dbfd173 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -9,39 +9,21 @@
CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
# Do not trace early setup code
-CFLAGS_REMOVE_als.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_early_nobss.o = $(CC_FLAGS_FTRACE)
endif
-GCOV_PROFILE_als.o := n
GCOV_PROFILE_early.o := n
GCOV_PROFILE_early_nobss.o := n
-KCOV_INSTRUMENT_als.o := n
KCOV_INSTRUMENT_early.o := n
KCOV_INSTRUMENT_early_nobss.o := n
-UBSAN_SANITIZE_als.o := n
UBSAN_SANITIZE_early.o := n
UBSAN_SANITIZE_early_nobss.o := n
#
-# Use -march=z900 for als.c to be able to print an error
-# message if the kernel is started on a machine which is too old
-#
-ifneq ($(CC_FLAGS_MARCH),-march=z900)
-CFLAGS_REMOVE_als.o += $(CC_FLAGS_MARCH)
-CFLAGS_REMOVE_als.o += $(CC_FLAGS_EXPOLINE)
-CFLAGS_als.o += -march=z900
-AFLAGS_REMOVE_head.o += $(CC_FLAGS_MARCH)
-AFLAGS_head.o += -march=z900
-endif
-
-CFLAGS_als.o += -D__NO_FORTIFY
-
-#
# Passing null pointers is ok for smp code, since we access the lowcore here.
#
CFLAGS_smp.o := -Wno-nonnull
@@ -61,13 +43,13 @@
obj-y := traps.o time.o process.o base.o early.o setup.o idle.o vtime.o
obj-y += processor.o sys_s390.o ptrace.o signal.o cpcmd.o ebcdic.o nmi.o
-obj-y += debug.o irq.o ipl.o dis.o diag.o vdso.o als.o early_nobss.o
+obj-y += debug.o irq.o ipl.o dis.o diag.o vdso.o early_nobss.o
obj-y += sysinfo.o jump_label.o lgr.o os_info.o machine_kexec.o pgm_check.o
obj-y += runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
obj-y += entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
obj-y += nospec-branch.o
-extra-y += head.o head64.o vmlinux.lds
+extra-y += head64.o vmlinux.lds
obj-$(CONFIG_SYSFS) += nospec-sysfs.o
CFLAGS_REMOVE_nospec-branch.o += $(CC_FLAGS_EXPOLINE)
@@ -99,5 +81,5 @@
obj-y += vdso64/
obj-$(CONFIG_COMPAT) += vdso32/
-chkbss := head.o head64.o als.o early_nobss.o
+chkbss := head64.o early_nobss.o
include $(srctree)/arch/s390/scripts/Makefile.chkbss
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 9f5ea9d..c3620ba 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -306,6 +306,15 @@
return rc;
}
+static const char *nt_name(Elf64_Word type)
+{
+ const char *name = "LINUX";
+
+ if (type == NT_PRPSINFO || type == NT_PRSTATUS || type == NT_PRFPREG)
+ name = KEXEC_CORE_NOTE_NAME;
+ return name;
+}
+
/*
* Initialize ELF note
*/
@@ -332,11 +341,26 @@
static inline void *nt_init(void *buf, Elf64_Word type, void *desc, int d_len)
{
- const char *note_name = "LINUX";
+ return nt_init_name(buf, type, desc, d_len, nt_name(type));
+}
- if (type == NT_PRPSINFO || type == NT_PRSTATUS || type == NT_PRFPREG)
- note_name = KEXEC_CORE_NOTE_NAME;
- return nt_init_name(buf, type, desc, d_len, note_name);
+/*
+ * Calculate the size of ELF note
+ */
+static size_t nt_size_name(int d_len, const char *name)
+{
+ size_t size;
+
+ size = sizeof(Elf64_Nhdr);
+ size += roundup(strlen(name) + 1, 4);
+ size += roundup(d_len, 4);
+
+ return size;
+}
+
+static inline size_t nt_size(Elf64_Word type, int d_len)
+{
+ return nt_size_name(d_len, nt_name(type));
}
/*
@@ -375,6 +399,29 @@
}
/*
+ * Calculate size of ELF notes per cpu
+ */
+static size_t get_cpu_elf_notes_size(void)
+{
+ struct save_area *sa = NULL;
+ size_t size;
+
+ size = nt_size(NT_PRSTATUS, sizeof(struct elf_prstatus));
+ size += nt_size(NT_PRFPREG, sizeof(elf_fpregset_t));
+ size += nt_size(NT_S390_TIMER, sizeof(sa->timer));
+ size += nt_size(NT_S390_TODCMP, sizeof(sa->todcmp));
+ size += nt_size(NT_S390_TODPREG, sizeof(sa->todpreg));
+ size += nt_size(NT_S390_CTRS, sizeof(sa->ctrs));
+ size += nt_size(NT_S390_PREFIX, sizeof(sa->prefix));
+ if (MACHINE_HAS_VX) {
+ size += nt_size(NT_S390_VXRS_HIGH, sizeof(sa->vxrs_high));
+ size += nt_size(NT_S390_VXRS_LOW, sizeof(sa->vxrs_low));
+ }
+
+ return size;
+}
+
+/*
* Initialize prpsinfo note (new kernel)
*/
static void *nt_prpsinfo(void *ptr)
@@ -429,6 +476,30 @@
return nt_init_name(ptr, 0, vmcoreinfo, size, "VMCOREINFO");
}
+static size_t nt_vmcoreinfo_size(void)
+{
+ const char *name = "VMCOREINFO";
+ char nt_name[11];
+ Elf64_Nhdr note;
+ void *addr;
+
+ if (copy_oldmem_kernel(&addr, &S390_lowcore.vmcore_info, sizeof(addr)))
+ return 0;
+
+ if (copy_oldmem_kernel(¬e, addr, sizeof(note)))
+ return 0;
+
+ memset(nt_name, 0, sizeof(nt_name));
+ if (copy_oldmem_kernel(nt_name, addr + sizeof(note),
+ sizeof(nt_name) - 1))
+ return 0;
+
+ if (strcmp(nt_name, name) != 0)
+ return 0;
+
+ return nt_size_name(note.n_descsz, name);
+}
+
/*
* Initialize final note (needed for /proc/vmcore code)
*/
@@ -539,6 +610,27 @@
return ptr;
}
+static size_t get_elfcorehdr_size(int mem_chunk_cnt)
+{
+ size_t size;
+
+ size = sizeof(Elf64_Ehdr);
+ /* PT_NOTES */
+ size += sizeof(Elf64_Phdr);
+ /* nt_prpsinfo */
+ size += nt_size(NT_PRPSINFO, sizeof(struct elf_prpsinfo));
+ /* regsets */
+ size += get_cpu_cnt() * get_cpu_elf_notes_size();
+ /* nt_vmcoreinfo */
+ size += nt_vmcoreinfo_size();
+ /* nt_final */
+ size += sizeof(Elf64_Nhdr);
+ /* PT_LOADS */
+ size += mem_chunk_cnt * sizeof(Elf64_Phdr);
+
+ return size;
+}
+
/*
* Create ELF core header (new kernel)
*/
@@ -566,8 +658,8 @@
mem_chunk_cnt = get_mem_chunk_cnt();
- alloc_size = 0x1000 + get_cpu_cnt() * 0x4a0 +
- mem_chunk_cnt * sizeof(Elf64_Phdr);
+ alloc_size = get_elfcorehdr_size(mem_chunk_cnt);
+
hdr = kzalloc_panic(alloc_size);
/* Init elf header */
ptr = ehdr_init(hdr, mem_chunk_cnt);
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 827699e..5b28b43 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -331,8 +331,20 @@
append_to_cmdline(append_ipl_scpdata);
}
+static void __init check_image_bootable(void)
+{
+ if (!memcmp(EP_STRING, (void *)EP_OFFSET, strlen(EP_STRING)))
+ return;
+
+ sclp_early_printk("Linux kernel boot failure: An attempt to boot a vmlinux ELF image failed.\n");
+ sclp_early_printk("This image does not contain all parts necessary for starting up. Use\n");
+ sclp_early_printk("bzImage or arch/s390/boot/compressed/vmlinux instead.\n");
+ disabled_wait(0xbadb007);
+}
+
void __init startup_init(void)
{
+ check_image_bootable();
time_early_init();
init_kernel_storage_key();
lockdep_off();
diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
index 961abfa..472fa2f 100644
--- a/arch/s390/kernel/entry.h
+++ b/arch/s390/kernel/entry.h
@@ -83,7 +83,6 @@
DECLARE_PER_CPU(u64, mt_cycles[8]);
-void verify_facilities(void);
void gs_load_bc_cb(struct pt_regs *regs);
void set_fs_fixup(void);
diff --git a/arch/s390/kernel/head64.S b/arch/s390/kernel/head64.S
index 791cb90..6d14ad4 100644
--- a/arch/s390/kernel/head64.S
+++ b/arch/s390/kernel/head64.S
@@ -48,11 +48,23 @@
# Early machine initialization and detection functions.
#
brasl %r14,startup_init
- lpswe .Lentry-.LPG1(13) # jump to _stext in primary-space,
- # virtual and never return ...
+
+# check control registers
+ stctg %c0,%c15,0(%r15)
+ oi 6(%r15),0x60 # enable sigp emergency & external call
+ oi 4(%r15),0x10 # switch on low address proctection
+ lctlg %c0,%c15,0(%r15)
+
+ lam 0,15,.Laregs-.LPG1(%r13) # load acrs needed by uaccess
+ brasl %r14,start_kernel # go to C code
+#
+# We returned from start_kernel ?!? PANIK
+#
+ basr %r13,0
+ lpswe .Ldw-.(%r13) # load disabled wait psw
+
.align 16
.LPG1:
-.Lentry:.quad 0x0000000180000000,_stext
.Lctl: .quad 0x04040000 # cr0: AFP registers & secondary space
.quad 0 # cr1: primary space segment table
.quad .Lduct # cr2: dispatchable unit control table
@@ -85,30 +97,5 @@
.endr
.Llinkage_stack:
.long 0,0,0x89000000,0,0,0,0x8a000000,0
-
-ENTRY(_ehead)
-
- .org 0x100000 - 0x11000 # head.o ends at 0x11000
-#
-# startup-code, running in absolute addressing mode
-#
-ENTRY(_stext)
- basr %r13,0 # get base
-.LPG3:
-# check control registers
- stctg %c0,%c15,0(%r15)
- oi 6(%r15),0x60 # enable sigp emergency & external call
- oi 4(%r15),0x10 # switch on low address proctection
- lctlg %c0,%c15,0(%r15)
-
- lam 0,15,.Laregs-.LPG3(%r13) # load acrs needed by uaccess
- brasl %r14,start_kernel # go to C code
-#
-# We returned from start_kernel ?!? PANIK
-#
- basr %r13,0
- lpswe .Ldw-.(%r13) # load disabled wait psw
-
- .align 8
.Ldw: .quad 0x0002000180000000,0x0000000000000000
.Laregs:.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index 60f60af..7c0a095 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -321,38 +321,20 @@
* If we have no pre-handler or it returned 0, we
* continue with single stepping. If we have a
* pre-handler and it returned non-zero, it prepped
- * for calling the break_handler below on re-entry
- * for jprobe processing, so get out doing nothing
- * more here.
+ * for changing execution path, so get out doing
+ * nothing more here.
*/
push_kprobe(kcb, p);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- if (p->pre_handler && p->pre_handler(p, regs))
+ if (p->pre_handler && p->pre_handler(p, regs)) {
+ pop_kprobe(kcb);
+ preempt_enable_no_resched();
return 1;
+ }
kcb->kprobe_status = KPROBE_HIT_SS;
}
enable_singlestep(kcb, regs, (unsigned long) p->ainsn.insn);
return 1;
- } else if (kprobe_running()) {
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- /*
- * Continuation after the jprobe completed and
- * caused the jprobe_return trap. The jprobe
- * break_handler "returns" to the original
- * function that still has the kprobe breakpoint
- * installed. We continue with single stepping.
- */
- kcb->kprobe_status = KPROBE_HIT_SS;
- enable_singlestep(kcb, regs,
- (unsigned long) p->ainsn.insn);
- return 1;
- } /* else:
- * No kprobe at this address and the current kprobe
- * has no break handler (no jprobe!). The kernel just
- * exploded, let the standard trap handler pick up the
- * pieces.
- */
} /* else:
* No kprobe at this address and no active kprobe. The trap has
* not been caused by a kprobe breakpoint. The race of breakpoint
@@ -452,9 +434,7 @@
regs->psw.addr = orig_ret_address;
- pop_kprobe(get_kprobe_ctlblk());
kretprobe_hash_unlock(current, &flags);
- preempt_enable_no_resched();
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
@@ -661,60 +641,6 @@
}
NOKPROBE_SYMBOL(kprobe_exceptions_notify);
-int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long stack;
-
- memcpy(&kcb->jprobe_saved_regs, regs, sizeof(struct pt_regs));
-
- /* setup return addr to the jprobe handler routine */
- regs->psw.addr = (unsigned long) jp->entry;
- regs->psw.mask &= ~(PSW_MASK_IO | PSW_MASK_EXT);
-
- /* r15 is the stack pointer */
- stack = (unsigned long) regs->gprs[15];
-
- memcpy(kcb->jprobes_stack, (void *) stack, MIN_STACK_SIZE(stack));
-
- /*
- * jprobes use jprobe_return() which skips the normal return
- * path of the function, and this messes up the accounting of the
- * function graph tracer to get messed up.
- *
- * Pause function graph tracing while performing the jprobe function.
- */
- pause_graph_tracing();
- return 1;
-}
-NOKPROBE_SYMBOL(setjmp_pre_handler);
-
-void jprobe_return(void)
-{
- asm volatile(".word 0x0002");
-}
-NOKPROBE_SYMBOL(jprobe_return);
-
-int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long stack;
-
- /* It's OK to start function graph tracing again */
- unpause_graph_tracing();
-
- stack = (unsigned long) kcb->jprobe_saved_regs.gprs[15];
-
- /* Put the regs back */
- memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
- /* put the stack back */
- memcpy((void *) stack, kcb->jprobes_stack, MIN_STACK_SIZE(stack));
- preempt_enable_no_resched();
- return 1;
-}
-NOKPROBE_SYMBOL(longjmp_break_handler);
-
static struct kprobe trampoline = {
.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
.pre_handler = trampoline_probe_handler
diff --git a/arch/s390/kernel/nospec-branch.c b/arch/s390/kernel/nospec-branch.c
index 18ae7b9..bdddaae 100644
--- a/arch/s390/kernel/nospec-branch.c
+++ b/arch/s390/kernel/nospec-branch.c
@@ -35,6 +35,8 @@
static int __init nospec_report(void)
{
+ if (test_facility(156))
+ pr_info("Spectre V2 mitigation: etokens\n");
if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable)
pr_info("Spectre V2 mitigation: execute trampolines\n");
if (__test_facility(82, S390_lowcore.alt_stfle_fac_list))
@@ -56,7 +58,15 @@
void __init nospec_auto_detect(void)
{
- if (IS_ENABLED(CC_USING_EXPOLINE)) {
+ if (test_facility(156)) {
+ /*
+ * The machine supports etokens.
+ * Disable expolines and disable nobp.
+ */
+ if (IS_ENABLED(CC_USING_EXPOLINE))
+ nospec_disable = 1;
+ __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+ } else if (IS_ENABLED(CC_USING_EXPOLINE)) {
/*
* The kernel has been compiled with expolines.
* Keep expolines enabled and disable nobp.
diff --git a/arch/s390/kernel/nospec-sysfs.c b/arch/s390/kernel/nospec-sysfs.c
index 8affad5..e30e580 100644
--- a/arch/s390/kernel/nospec-sysfs.c
+++ b/arch/s390/kernel/nospec-sysfs.c
@@ -13,6 +13,8 @@
ssize_t cpu_show_spectre_v2(struct device *dev,
struct device_attribute *attr, char *buf)
{
+ if (test_facility(156))
+ return sprintf(buf, "Mitigation: etokens\n");
if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable)
return sprintf(buf, "Mitigation: execute trampolines\n");
if (__test_facility(82, S390_lowcore.alt_stfle_fac_list))
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 0292d68..cb198d4 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -2,7 +2,7 @@
/*
* Performance event support for the System z CPU-measurement Sampling Facility
*
- * Copyright IBM Corp. 2013
+ * Copyright IBM Corp. 2013, 2018
* Author(s): Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
*/
#define KMSG_COMPONENT "cpum_sf"
@@ -1587,6 +1587,17 @@
"%lu SDBTs\n", num_sdbt);
}
+static void aux_sdb_init(unsigned long sdb)
+{
+ struct hws_trailer_entry *te;
+
+ te = (struct hws_trailer_entry *)trailer_entry_ptr(sdb);
+
+ /* Save clock base */
+ te->clock_base = 1;
+ memcpy(&te->progusage2, &tod_clock_base[1], 8);
+}
+
/*
* aux_buffer_setup() - Setup AUX buffer for diagnostic mode sampling
* @cpu: On which to allocate, -1 means current
@@ -1666,6 +1677,7 @@
/* Tail is the entry in a SDBT */
*tail = (unsigned long)pages[i];
aux->sdb_index[i] = (unsigned long)pages[i];
+ aux_sdb_init((unsigned long)pages[i]);
}
sfb->num_sdb = nr_pages;
diff --git a/arch/s390/kernel/perf_regs.c b/arch/s390/kernel/perf_regs.c
index 54e2d63..4352a50 100644
--- a/arch/s390/kernel/perf_regs.c
+++ b/arch/s390/kernel/perf_regs.c
@@ -12,9 +12,6 @@
{
freg_t fp;
- if (WARN_ON_ONCE((u32)idx >= PERF_REG_S390_MAX))
- return 0;
-
if (idx >= PERF_REG_S390_R0 && idx <= PERF_REG_S390_R15)
return regs->gprs[idx];
@@ -33,7 +30,8 @@
if (idx == PERF_REG_S390_PC)
return regs->psw.addr;
- return regs->gprs[idx];
+ WARN_ON_ONCE((u32)idx >= PERF_REG_S390_MAX);
+ return 0;
}
#define REG_RESERVED (~((1UL << PERF_REG_S390_MAX) - 1))
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index d82a9ec..c637c12 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -674,12 +674,12 @@
#ifdef CONFIG_DMA_API_DEBUG
/*
* DMA_API_DEBUG code stumbles over addresses from the
- * range [_ehead, _stext]. Mark the memory as reserved
+ * range [PARMAREA_END, _stext]. Mark the memory as reserved
* so it is not used for CONFIG_DMA_API_DEBUG=y.
*/
memblock_reserve(0, PFN_PHYS(start_pfn));
#else
- memblock_reserve(0, (unsigned long)_ehead);
+ memblock_reserve(0, PARMAREA_END);
memblock_reserve((unsigned long)_stext, PFN_PHYS(start_pfn)
- (unsigned long)_stext);
#endif
diff --git a/arch/s390/kernel/sysinfo.c b/arch/s390/kernel/sysinfo.c
index 54f5496..12f80d1 100644
--- a/arch/s390/kernel/sysinfo.c
+++ b/arch/s390/kernel/sysinfo.c
@@ -59,6 +59,8 @@
}
EXPORT_SYMBOL(stsi);
+#ifdef CONFIG_PROC_FS
+
static bool convert_ext_name(unsigned char encoding, char *name, size_t len)
{
switch (encoding) {
@@ -301,6 +303,8 @@
}
device_initcall(sysinfo_create_proc);
+#endif /* CONFIG_PROC_FS */
+
/*
* Service levels interface.
*/
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index cf56116..e8766be 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -221,17 +221,22 @@
ext_to_timespec64(clk, ts);
}
-void read_boot_clock64(struct timespec64 *ts)
+void __init read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
+ struct timespec64 *boot_offset)
{
unsigned char clk[STORE_CLOCK_EXT_SIZE];
+ struct timespec64 boot_time;
__u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
- memcpy(clk, tod_clock_base, 16);
- *(__u64 *) &clk[1] -= delta;
- if (*(__u64 *) &clk[1] > delta)
+ memcpy(clk, tod_clock_base, STORE_CLOCK_EXT_SIZE);
+ *(__u64 *)&clk[1] -= delta;
+ if (*(__u64 *)&clk[1] > delta)
clk[0]--;
- ext_to_timespec64(clk, ts);
+ ext_to_timespec64(clk, &boot_time);
+
+ read_persistent_clock64(wall_time);
+ *boot_offset = timespec64_sub(*wall_time, boot_time);
}
static u64 read_tod_clock(struct clocksource *cs)
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 4b6e039..e8184a1 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -579,41 +579,33 @@
static int topology_ctl_handler(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- unsigned int len;
+ int enabled = topology_is_enabled();
int new_mode;
- char buf[2];
+ int zero = 0;
+ int one = 1;
+ int rc;
+ struct ctl_table ctl_entry = {
+ .procname = ctl->procname,
+ .data = &enabled,
+ .maxlen = sizeof(int),
+ .extra1 = &zero,
+ .extra2 = &one,
+ };
- if (!*lenp || *ppos) {
- *lenp = 0;
- return 0;
- }
- if (!write) {
- strncpy(buf, topology_is_enabled() ? "1\n" : "0\n",
- ARRAY_SIZE(buf));
- len = strnlen(buf, ARRAY_SIZE(buf));
- if (len > *lenp)
- len = *lenp;
- if (copy_to_user(buffer, buf, len))
- return -EFAULT;
- goto out;
- }
- len = *lenp;
- if (copy_from_user(buf, buffer, len > sizeof(buf) ? sizeof(buf) : len))
- return -EFAULT;
- if (buf[0] != '0' && buf[0] != '1')
- return -EINVAL;
+ rc = proc_douintvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
+ if (rc < 0 || !write)
+ return rc;
+
mutex_lock(&smp_cpu_state_mutex);
- new_mode = topology_get_mode(buf[0] == '1');
+ new_mode = topology_get_mode(enabled);
if (topology_mode != new_mode) {
topology_mode = new_mode;
topology_schedule_update();
}
mutex_unlock(&smp_cpu_state_mutex);
topology_flush_work();
-out:
- *lenp = len;
- *ppos += len;
- return 0;
+
+ return rc;
}
static struct ctl_table topology_ctl_table[] = {
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index 09abae4..3031cc6 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -47,7 +47,7 @@
*/
unsigned int __read_mostly vdso_enabled = 1;
-static int vdso_fault(const struct vm_special_mapping *sm,
+static vm_fault_t vdso_fault(const struct vm_special_mapping *sm,
struct vm_area_struct *vma, struct vm_fault *vmf)
{
struct page **vdso_pagelist;
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index f0414f5..b43f8d3 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -19,7 +19,7 @@
OUTPUT_FORMAT("elf64-s390", "elf64-s390", "elf64-s390")
OUTPUT_ARCH(s390:64-bit)
-ENTRY(startup)
+ENTRY(startup_continue)
jiffies = jiffies_64;
PHDRS {
@@ -30,16 +30,12 @@
SECTIONS
{
- . = 0x00000000;
+ . = 0x100000;
+ _stext = .; /* Start of text section */
.text : {
/* Text and read-only data */
+ _text = .;
HEAD_TEXT
- /*
- * E.g. perf doesn't like symbols starting at address zero,
- * therefore skip the initial PSW and channel program located
- * at address zero and let _text start at 0x200.
- */
- _text = 0x200;
TEXT_TEXT
SCHED_TEXT
CPUIDLE_TEXT
@@ -47,6 +43,7 @@
KPROBES_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
+ *(.text.*_indirect_*)
*(.fixup)
*(.gnu.warning)
} :text = 0x0700
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index daa09f8..fcb55b0 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -1145,7 +1145,7 @@
* yield-candidate.
*/
vcpu->preempted = true;
- swake_up(&vcpu->wq);
+ swake_up_one(&vcpu->wq);
vcpu->stat.halt_wakeup++;
}
/*
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3b7a515..f9d9033 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -172,6 +172,10 @@
module_param(nested, int, S_IRUGO);
MODULE_PARM_DESC(nested, "Nested virtualization support");
+/* allow 1m huge page guest backing, if !nested */
+static int hpage;
+module_param(hpage, int, 0444);
+MODULE_PARM_DESC(hpage, "1m huge page backing support");
/*
* For now we handle at most 16 double words as this is what the s390 base
@@ -475,6 +479,11 @@
case KVM_CAP_S390_AIS_MIGRATION:
r = 1;
break;
+ case KVM_CAP_S390_HPAGE_1M:
+ r = 0;
+ if (hpage)
+ r = 1;
+ break;
case KVM_CAP_S390_MEM_OP:
r = MEM_OP_MAX_SIZE;
break;
@@ -511,19 +520,30 @@
}
static void kvm_s390_sync_dirty_log(struct kvm *kvm,
- struct kvm_memory_slot *memslot)
+ struct kvm_memory_slot *memslot)
{
+ int i;
gfn_t cur_gfn, last_gfn;
- unsigned long address;
+ unsigned long gaddr, vmaddr;
struct gmap *gmap = kvm->arch.gmap;
+ DECLARE_BITMAP(bitmap, _PAGE_ENTRIES);
- /* Loop over all guest pages */
+ /* Loop over all guest segments */
+ cur_gfn = memslot->base_gfn;
last_gfn = memslot->base_gfn + memslot->npages;
- for (cur_gfn = memslot->base_gfn; cur_gfn <= last_gfn; cur_gfn++) {
- address = gfn_to_hva_memslot(memslot, cur_gfn);
+ for (; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES) {
+ gaddr = gfn_to_gpa(cur_gfn);
+ vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
+ if (kvm_is_error_hva(vmaddr))
+ continue;
- if (test_and_clear_guest_dirty(gmap->mm, address))
- mark_page_dirty(kvm, cur_gfn);
+ bitmap_zero(bitmap, _PAGE_ENTRIES);
+ gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr);
+ for (i = 0; i < _PAGE_ENTRIES; i++) {
+ if (test_bit(i, bitmap))
+ mark_page_dirty(kvm, cur_gfn + i);
+ }
+
if (fatal_signal_pending(current))
return;
cond_resched();
@@ -667,6 +687,27 @@
VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s",
r ? "(not available)" : "(success)");
break;
+ case KVM_CAP_S390_HPAGE_1M:
+ mutex_lock(&kvm->lock);
+ if (kvm->created_vcpus)
+ r = -EBUSY;
+ else if (!hpage || kvm->arch.use_cmma)
+ r = -EINVAL;
+ else {
+ r = 0;
+ kvm->mm->context.allow_gmap_hpage_1m = 1;
+ /*
+ * We might have to create fake 4k page
+ * tables. To avoid that the hardware works on
+ * stale PGSTEs, we emulate these instructions.
+ */
+ kvm->arch.use_skf = 0;
+ kvm->arch.use_pfmfi = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s",
+ r ? "(not available)" : "(success)");
+ break;
case KVM_CAP_S390_USER_STSI:
VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI");
kvm->arch.user_stsi = 1;
@@ -714,10 +755,13 @@
if (!sclp.has_cmma)
break;
- ret = -EBUSY;
VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
mutex_lock(&kvm->lock);
- if (!kvm->created_vcpus) {
+ if (kvm->created_vcpus)
+ ret = -EBUSY;
+ else if (kvm->mm->context.allow_gmap_hpage_1m)
+ ret = -EINVAL;
+ else {
kvm->arch.use_cmma = 1;
/* Not compatible with cmma. */
kvm->arch.use_pfmfi = 0;
@@ -1540,6 +1584,7 @@
uint8_t *keys;
uint64_t hva;
int srcu_idx, i, r = 0;
+ bool unlocked;
if (args->flags != 0)
return -EINVAL;
@@ -1564,9 +1609,11 @@
if (r)
goto out;
+ i = 0;
down_read(¤t->mm->mmap_sem);
srcu_idx = srcu_read_lock(&kvm->srcu);
- for (i = 0; i < args->count; i++) {
+ while (i < args->count) {
+ unlocked = false;
hva = gfn_to_hva(kvm, args->start_gfn + i);
if (kvm_is_error_hva(hva)) {
r = -EFAULT;
@@ -1580,8 +1627,14 @@
}
r = set_guest_storage_key(current->mm, hva, keys[i], 0);
- if (r)
- break;
+ if (r) {
+ r = fixup_user_fault(current, current->mm, hva,
+ FAULT_FLAG_WRITE, &unlocked);
+ if (r)
+ break;
+ }
+ if (!r)
+ i++;
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
up_read(¤t->mm->mmap_sem);
@@ -4082,6 +4135,11 @@
return -ENODEV;
}
+ if (nested && hpage) {
+ pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently");
+ return -EINVAL;
+ }
+
for (i = 0; i < 16; i++)
kvm_s390_fac_base[i] |=
S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index eb0eb60..cfc5a62 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -246,9 +246,10 @@
static int handle_iske(struct kvm_vcpu *vcpu)
{
- unsigned long addr;
+ unsigned long gaddr, vmaddr;
unsigned char key;
int reg1, reg2;
+ bool unlocked;
int rc;
vcpu->stat.instruction_iske++;
@@ -262,18 +263,28 @@
kvm_s390_get_regs_rre(vcpu, ®1, ®2);
- addr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
- addr = kvm_s390_logical_to_effective(vcpu, addr);
- addr = kvm_s390_real_to_abs(vcpu, addr);
- addr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(addr));
- if (kvm_is_error_hva(addr))
+ gaddr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
+ gaddr = kvm_s390_logical_to_effective(vcpu, gaddr);
+ gaddr = kvm_s390_real_to_abs(vcpu, gaddr);
+ vmaddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr));
+ if (kvm_is_error_hva(vmaddr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
-
+retry:
+ unlocked = false;
down_read(¤t->mm->mmap_sem);
- rc = get_guest_storage_key(current->mm, addr, &key);
- up_read(¤t->mm->mmap_sem);
+ rc = get_guest_storage_key(current->mm, vmaddr, &key);
+
+ if (rc) {
+ rc = fixup_user_fault(current, current->mm, vmaddr,
+ FAULT_FLAG_WRITE, &unlocked);
+ if (!rc) {
+ up_read(¤t->mm->mmap_sem);
+ goto retry;
+ }
+ }
if (rc)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+ up_read(¤t->mm->mmap_sem);
vcpu->run->s.regs.gprs[reg1] &= ~0xff;
vcpu->run->s.regs.gprs[reg1] |= key;
return 0;
@@ -281,8 +292,9 @@
static int handle_rrbe(struct kvm_vcpu *vcpu)
{
- unsigned long addr;
+ unsigned long vmaddr, gaddr;
int reg1, reg2;
+ bool unlocked;
int rc;
vcpu->stat.instruction_rrbe++;
@@ -296,19 +308,27 @@
kvm_s390_get_regs_rre(vcpu, ®1, ®2);
- addr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
- addr = kvm_s390_logical_to_effective(vcpu, addr);
- addr = kvm_s390_real_to_abs(vcpu, addr);
- addr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(addr));
- if (kvm_is_error_hva(addr))
+ gaddr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
+ gaddr = kvm_s390_logical_to_effective(vcpu, gaddr);
+ gaddr = kvm_s390_real_to_abs(vcpu, gaddr);
+ vmaddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(gaddr));
+ if (kvm_is_error_hva(vmaddr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
-
+retry:
+ unlocked = false;
down_read(¤t->mm->mmap_sem);
- rc = reset_guest_reference_bit(current->mm, addr);
- up_read(¤t->mm->mmap_sem);
+ rc = reset_guest_reference_bit(current->mm, vmaddr);
+ if (rc < 0) {
+ rc = fixup_user_fault(current, current->mm, vmaddr,
+ FAULT_FLAG_WRITE, &unlocked);
+ if (!rc) {
+ up_read(¤t->mm->mmap_sem);
+ goto retry;
+ }
+ }
if (rc < 0)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
-
+ up_read(¤t->mm->mmap_sem);
kvm_s390_set_psw_cc(vcpu, rc);
return 0;
}
@@ -323,6 +343,7 @@
unsigned long start, end;
unsigned char key, oldkey;
int reg1, reg2;
+ bool unlocked;
int rc;
vcpu->stat.instruction_sske++;
@@ -355,19 +376,28 @@
}
while (start != end) {
- unsigned long addr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(start));
+ unsigned long vmaddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(start));
+ unlocked = false;
- if (kvm_is_error_hva(addr))
+ if (kvm_is_error_hva(vmaddr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
down_read(¤t->mm->mmap_sem);
- rc = cond_set_guest_storage_key(current->mm, addr, key, &oldkey,
+ rc = cond_set_guest_storage_key(current->mm, vmaddr, key, &oldkey,
m3 & SSKE_NQ, m3 & SSKE_MR,
m3 & SSKE_MC);
- up_read(¤t->mm->mmap_sem);
- if (rc < 0)
+
+ if (rc < 0) {
+ rc = fixup_user_fault(current, current->mm, vmaddr,
+ FAULT_FLAG_WRITE, &unlocked);
+ rc = !rc ? -EAGAIN : rc;
+ }
+ if (rc == -EFAULT)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
- start += PAGE_SIZE;
+
+ up_read(¤t->mm->mmap_sem);
+ if (rc >= 0)
+ start += PAGE_SIZE;
}
if (m3 & (SSKE_MC | SSKE_MR)) {
@@ -948,15 +978,16 @@
}
while (start != end) {
- unsigned long useraddr;
+ unsigned long vmaddr;
+ bool unlocked = false;
/* Translate guest address to host address */
- useraddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(start));
- if (kvm_is_error_hva(useraddr))
+ vmaddr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(start));
+ if (kvm_is_error_hva(vmaddr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
if (vcpu->run->s.regs.gprs[reg1] & PFMF_CF) {
- if (clear_user((void __user *)useraddr, PAGE_SIZE))
+ if (clear_user((void __user *)vmaddr, PAGE_SIZE))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
}
@@ -966,14 +997,20 @@
if (rc)
return rc;
down_read(¤t->mm->mmap_sem);
- rc = cond_set_guest_storage_key(current->mm, useraddr,
+ rc = cond_set_guest_storage_key(current->mm, vmaddr,
key, NULL, nq, mr, mc);
- up_read(¤t->mm->mmap_sem);
- if (rc < 0)
+ if (rc < 0) {
+ rc = fixup_user_fault(current, current->mm, vmaddr,
+ FAULT_FLAG_WRITE, &unlocked);
+ rc = !rc ? -EAGAIN : rc;
+ }
+ if (rc == -EFAULT)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
- }
- start += PAGE_SIZE;
+ up_read(¤t->mm->mmap_sem);
+ if (rc >= 0)
+ start += PAGE_SIZE;
+ }
}
if (vcpu->run->s.regs.gprs[reg1] & PFMF_FSC) {
if (psw_bits(vcpu->arch.sie_block->gpsw).eaba == PSW_BITS_AMODE_64BIT) {
diff --git a/arch/s390/lib/mem.S b/arch/s390/lib/mem.S
index 2311f15..40c4d59 100644
--- a/arch/s390/lib/mem.S
+++ b/arch/s390/lib/mem.S
@@ -17,7 +17,7 @@
ENTRY(memmove)
ltgr %r4,%r4
lgr %r1,%r2
- bzr %r14
+ jz .Lmemmove_exit
aghi %r4,-1
clgr %r2,%r3
jnh .Lmemmove_forward
@@ -36,6 +36,7 @@
.Lmemmove_forward_remainder:
larl %r5,.Lmemmove_mvc
ex %r4,0(%r5)
+.Lmemmove_exit:
BR_EX %r14
.Lmemmove_reverse:
ic %r0,0(%r4,%r3)
@@ -65,7 +66,7 @@
*/
ENTRY(memset)
ltgr %r4,%r4
- bzr %r14
+ jz .Lmemset_exit
ltgr %r3,%r3
jnz .Lmemset_fill
aghi %r4,-1
@@ -80,6 +81,7 @@
.Lmemset_clear_remainder:
larl %r3,.Lmemset_xc
ex %r4,0(%r3)
+.Lmemset_exit:
BR_EX %r14
.Lmemset_fill:
cghi %r4,1
@@ -115,7 +117,7 @@
*/
ENTRY(memcpy)
ltgr %r4,%r4
- bzr %r14
+ jz .Lmemcpy_exit
aghi %r4,-1
srlg %r5,%r4,8
ltgr %r5,%r5
@@ -124,6 +126,7 @@
.Lmemcpy_remainder:
larl %r5,.Lmemcpy_mvc
ex %r4,0(%r5)
+.Lmemcpy_exit:
BR_EX %r14
.Lmemcpy_loop:
mvc 0(256,%r1),0(%r3)
@@ -145,9 +148,9 @@
.macro __MEMSET bits,bytes,insn
ENTRY(__memset\bits)
ltgr %r4,%r4
- bzr %r14
+ jz .L__memset_exit\bits
cghi %r4,\bytes
- je .L__memset_exit\bits
+ je .L__memset_store\bits
aghi %r4,-(\bytes+1)
srlg %r5,%r4,8
ltgr %r5,%r5
@@ -163,8 +166,9 @@
larl %r5,.L__memset_mvc\bits
ex %r4,0(%r5)
BR_EX %r14
-.L__memset_exit\bits:
+.L__memset_store\bits:
\insn %r3,0(%r2)
+.L__memset_exit\bits:
BR_EX %r14
.L__memset_mvc\bits:
mvc \bytes(1,%r1),0(%r1)
diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
index 6cf024e..510a182 100644
--- a/arch/s390/mm/cmm.c
+++ b/arch/s390/mm/cmm.c
@@ -191,12 +191,7 @@
del_timer(&cmm_timer);
return;
}
- if (timer_pending(&cmm_timer)) {
- if (mod_timer(&cmm_timer, jiffies + cmm_timeout_seconds*HZ))
- return;
- }
- cmm_timer.expires = jiffies + cmm_timeout_seconds*HZ;
- add_timer(&cmm_timer);
+ mod_timer(&cmm_timer, jiffies + cmm_timeout_seconds * HZ);
}
static void cmm_timer_fn(struct timer_list *unused)
@@ -251,45 +246,42 @@
return str != cp;
}
-static struct ctl_table cmm_table[];
-
static int cmm_pages_handler(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- char buf[16], *p;
- unsigned int len;
- long nr;
+ long nr = cmm_get_pages();
+ struct ctl_table ctl_entry = {
+ .procname = ctl->procname,
+ .data = &nr,
+ .maxlen = sizeof(long),
+ };
+ int rc;
- if (!*lenp || (*ppos && !write)) {
- *lenp = 0;
- return 0;
- }
+ rc = proc_doulongvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
+ if (rc < 0 || !write)
+ return rc;
- if (write) {
- len = *lenp;
- if (copy_from_user(buf, buffer,
- len > sizeof(buf) ? sizeof(buf) : len))
- return -EFAULT;
- buf[sizeof(buf) - 1] = '\0';
- cmm_skip_blanks(buf, &p);
- nr = simple_strtoul(p, &p, 0);
- if (ctl == &cmm_table[0])
- cmm_set_pages(nr);
- else
- cmm_add_timed_pages(nr);
- } else {
- if (ctl == &cmm_table[0])
- nr = cmm_get_pages();
- else
- nr = cmm_get_timed_pages();
- len = sprintf(buf, "%ld\n", nr);
- if (len > *lenp)
- len = *lenp;
- if (copy_to_user(buffer, buf, len))
- return -EFAULT;
- }
- *lenp = len;
- *ppos += len;
+ cmm_set_pages(nr);
+ return 0;
+}
+
+static int cmm_timed_pages_handler(struct ctl_table *ctl, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ long nr = cmm_get_timed_pages();
+ struct ctl_table ctl_entry = {
+ .procname = ctl->procname,
+ .data = &nr,
+ .maxlen = sizeof(long),
+ };
+ int rc;
+
+ rc = proc_doulongvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
+ if (rc < 0 || !write)
+ return rc;
+
+ cmm_add_timed_pages(nr);
return 0;
}
@@ -338,7 +330,7 @@
{
.procname = "cmm_timed_pages",
.mode = 0644,
- .proc_handler = cmm_pages_handler,
+ .proc_handler = cmm_timed_pages_handler,
},
{
.procname = "cmm_timeout",
diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
index 6ad15d3..84111a4 100644
--- a/arch/s390/mm/extmem.c
+++ b/arch/s390/mm/extmem.c
@@ -80,7 +80,7 @@
struct dcss_segment {
struct list_head list;
char dcss_name[8];
- char res_name[15];
+ char res_name[16];
unsigned long start_addr;
unsigned long end;
atomic_t ref_count;
@@ -433,7 +433,7 @@
memcpy(&seg->res_name, seg->dcss_name, 8);
EBCASC(seg->res_name, 8);
seg->res_name[8] = '\0';
- strncat(seg->res_name, " (DCSS)", 7);
+ strlcat(seg->res_name, " (DCSS)", sizeof(seg->res_name));
seg->res->name = seg->res_name;
rc = seg->vm_segtype;
if (rc == SEG_TYPE_SC ||
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e074480..4cc3f06 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -502,6 +502,8 @@
/* No reason to continue if interrupted by SIGKILL. */
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
fault = VM_FAULT_SIGNAL;
+ if (flags & FAULT_FLAG_RETRY_NOWAIT)
+ goto out_up;
goto out;
}
if (unlikely(fault & VM_FAULT_ERROR))
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index bc56ec8..bb44990 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2,8 +2,10 @@
/*
* KVM guest address space mapping code
*
- * Copyright IBM Corp. 2007, 2016
+ * Copyright IBM Corp. 2007, 2016, 2018
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
+ * David Hildenbrand <david@redhat.com>
+ * Janosch Frank <frankja@linux.vnet.ibm.com>
*/
#include <linux/kernel.h>
@@ -521,6 +523,9 @@
rcu_read_unlock();
}
+static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *old, pmd_t new,
+ unsigned long gaddr);
+
/**
* gmap_link - set up shadow page tables to connect a host to a guest address
* @gmap: pointer to guest mapping meta data structure
@@ -541,6 +546,7 @@
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
+ u64 unprot;
int rc;
BUG_ON(gmap_is_shadow(gmap));
@@ -584,8 +590,8 @@
return -EFAULT;
pmd = pmd_offset(pud, vmaddr);
VM_BUG_ON(pmd_none(*pmd));
- /* large pmds cannot yet be handled */
- if (pmd_large(*pmd))
+ /* Are we allowed to use huge pages? */
+ if (pmd_large(*pmd) && !gmap->mm->context.allow_gmap_hpage_1m)
return -EFAULT;
/* Link gmap segment table entry location to page table. */
rc = radix_tree_preload(GFP_KERNEL);
@@ -596,10 +602,22 @@
if (*table == _SEGMENT_ENTRY_EMPTY) {
rc = radix_tree_insert(&gmap->host_to_guest,
vmaddr >> PMD_SHIFT, table);
- if (!rc)
- *table = pmd_val(*pmd);
- } else
- rc = 0;
+ if (!rc) {
+ if (pmd_large(*pmd)) {
+ *table = (pmd_val(*pmd) &
+ _SEGMENT_ENTRY_HARDWARE_BITS_LARGE)
+ | _SEGMENT_ENTRY_GMAP_UC;
+ } else
+ *table = pmd_val(*pmd) &
+ _SEGMENT_ENTRY_HARDWARE_BITS;
+ }
+ } else if (*table & _SEGMENT_ENTRY_PROTECT &&
+ !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
+ unprot = (u64)*table;
+ unprot &= ~_SEGMENT_ENTRY_PROTECT;
+ unprot |= _SEGMENT_ENTRY_GMAP_UC;
+ gmap_pmdp_xchg(gmap, (pmd_t *)table, __pmd(unprot), gaddr);
+ }
spin_unlock(&gmap->guest_table_lock);
spin_unlock(ptl);
radix_tree_preload_end();
@@ -690,6 +708,12 @@
vmaddr |= gaddr & ~PMD_MASK;
/* Find vma in the parent mm */
vma = find_vma(gmap->mm, vmaddr);
+ /*
+ * We do not discard pages that are backed by
+ * hugetlbfs, so we don't have to refault them.
+ */
+ if (vma && is_vm_hugetlb_page(vma))
+ continue;
size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
zap_page_range(vma, vmaddr, size);
}
@@ -864,7 +888,128 @@
*/
static void gmap_pte_op_end(spinlock_t *ptl)
{
- spin_unlock(ptl);
+ if (ptl)
+ spin_unlock(ptl);
+}
+
+/**
+ * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock
+ * and return the pmd pointer
+ * @gmap: pointer to guest mapping meta data structure
+ * @gaddr: virtual address in the guest address space
+ *
+ * Returns a pointer to the pmd for a guest address, or NULL
+ */
+static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
+{
+ pmd_t *pmdp;
+
+ BUG_ON(gmap_is_shadow(gmap));
+ spin_lock(&gmap->guest_table_lock);
+ pmdp = (pmd_t *) gmap_table_walk(gmap, gaddr, 1);
+
+ if (!pmdp || pmd_none(*pmdp)) {
+ spin_unlock(&gmap->guest_table_lock);
+ return NULL;
+ }
+
+ /* 4k page table entries are locked via the pte (pte_alloc_map_lock). */
+ if (!pmd_large(*pmdp))
+ spin_unlock(&gmap->guest_table_lock);
+ return pmdp;
+}
+
+/**
+ * gmap_pmd_op_end - release the guest_table_lock if needed
+ * @gmap: pointer to the guest mapping meta data structure
+ * @pmdp: pointer to the pmd
+ */
+static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
+{
+ if (pmd_large(*pmdp))
+ spin_unlock(&gmap->guest_table_lock);
+}
+
+/*
+ * gmap_protect_pmd - remove access rights to memory and set pmd notification bits
+ * @pmdp: pointer to the pmd to be protected
+ * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
+ * @bits: notification bits to set
+ *
+ * Returns:
+ * 0 if successfully protected
+ * -EAGAIN if a fixup is needed
+ * -EINVAL if unsupported notifier bits have been specified
+ *
+ * Expected to be called with sg->mm->mmap_sem in read and
+ * guest_table_lock held.
+ */
+static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
+ pmd_t *pmdp, int prot, unsigned long bits)
+{
+ int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
+ int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
+ pmd_t new = *pmdp;
+
+ /* Fixup needed */
+ if ((pmd_i && (prot != PROT_NONE)) || (pmd_p && (prot == PROT_WRITE)))
+ return -EAGAIN;
+
+ if (prot == PROT_NONE && !pmd_i) {
+ pmd_val(new) |= _SEGMENT_ENTRY_INVALID;
+ gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
+ }
+
+ if (prot == PROT_READ && !pmd_p) {
+ pmd_val(new) &= ~_SEGMENT_ENTRY_INVALID;
+ pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
+ gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
+ }
+
+ if (bits & GMAP_NOTIFY_MPROT)
+ pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
+
+ /* Shadow GMAP protection needs split PMDs */
+ if (bits & GMAP_NOTIFY_SHADOW)
+ return -EINVAL;
+
+ return 0;
+}
+
+/*
+ * gmap_protect_pte - remove access rights to memory and set pgste bits
+ * @gmap: pointer to guest mapping meta data structure
+ * @gaddr: virtual address in the guest address space
+ * @pmdp: pointer to the pmd associated with the pte
+ * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
+ * @bits: notification bits to set
+ *
+ * Returns 0 if successfully protected, -ENOMEM if out of memory and
+ * -EAGAIN if a fixup is needed.
+ *
+ * Expected to be called with sg->mm->mmap_sem in read
+ */
+static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
+ pmd_t *pmdp, int prot, unsigned long bits)
+{
+ int rc;
+ pte_t *ptep;
+ spinlock_t *ptl = NULL;
+ unsigned long pbits = 0;
+
+ if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
+ return -EAGAIN;
+
+ ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
+ if (!ptep)
+ return -ENOMEM;
+
+ pbits |= (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0;
+ pbits |= (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0;
+ /* Protect and unlock. */
+ rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits);
+ gmap_pte_op_end(ptl);
+ return rc;
}
/*
@@ -883,30 +1028,45 @@
static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
unsigned long len, int prot, unsigned long bits)
{
- unsigned long vmaddr;
- spinlock_t *ptl;
- pte_t *ptep;
+ unsigned long vmaddr, dist;
+ pmd_t *pmdp;
int rc;
BUG_ON(gmap_is_shadow(gmap));
while (len) {
rc = -EAGAIN;
- ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
- if (ptep) {
- rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
- gmap_pte_op_end(ptl);
+ pmdp = gmap_pmd_op_walk(gmap, gaddr);
+ if (pmdp) {
+ if (!pmd_large(*pmdp)) {
+ rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
+ bits);
+ if (!rc) {
+ len -= PAGE_SIZE;
+ gaddr += PAGE_SIZE;
+ }
+ } else {
+ rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot,
+ bits);
+ if (!rc) {
+ dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);
+ len = len < dist ? 0 : len - dist;
+ gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;
+ }
+ }
+ gmap_pmd_op_end(gmap, pmdp);
}
if (rc) {
+ if (rc == -EINVAL)
+ return rc;
+
+ /* -EAGAIN, fixup of userspace mm and gmap */
vmaddr = __gmap_translate(gmap, gaddr);
if (IS_ERR_VALUE(vmaddr))
return vmaddr;
rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
if (rc)
return rc;
- continue;
}
- gaddr += PAGE_SIZE;
- len -= PAGE_SIZE;
}
return 0;
}
@@ -935,7 +1095,7 @@
if (!MACHINE_HAS_ESOP && prot == PROT_READ)
return -EINVAL;
down_read(&gmap->mm->mmap_sem);
- rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT);
+ rc = gmap_protect_range(gmap, gaddr, len, prot, GMAP_NOTIFY_MPROT);
up_read(&gmap->mm->mmap_sem);
return rc;
}
@@ -1474,6 +1634,7 @@
unsigned long limit;
int rc;
+ BUG_ON(parent->mm->context.allow_gmap_hpage_1m);
BUG_ON(gmap_is_shadow(parent));
spin_lock(&parent->shadow_lock);
sg = gmap_find_shadow(parent, asce, edat_level);
@@ -1526,7 +1687,7 @@
down_read(&parent->mm->mmap_sem);
rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
- PROT_READ, PGSTE_VSIE_BIT);
+ PROT_READ, GMAP_NOTIFY_SHADOW);
up_read(&parent->mm->mmap_sem);
spin_lock(&parent->shadow_lock);
new->initialized = true;
@@ -2092,6 +2253,225 @@
}
EXPORT_SYMBOL_GPL(ptep_notify);
+static void pmdp_notify_gmap(struct gmap *gmap, pmd_t *pmdp,
+ unsigned long gaddr)
+{
+ pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_IN;
+ gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
+}
+
+/**
+ * gmap_pmdp_xchg - exchange a gmap pmd with another
+ * @gmap: pointer to the guest address space structure
+ * @pmdp: pointer to the pmd entry
+ * @new: replacement entry
+ * @gaddr: the affected guest address
+ *
+ * This function is assumed to be called with the guest_table_lock
+ * held.
+ */
+static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new,
+ unsigned long gaddr)
+{
+ gaddr &= HPAGE_MASK;
+ pmdp_notify_gmap(gmap, pmdp, gaddr);
+ pmd_val(new) &= ~_SEGMENT_ENTRY_GMAP_IN;
+ if (MACHINE_HAS_TLB_GUEST)
+ __pmdp_idte(gaddr, (pmd_t *)pmdp, IDTE_GUEST_ASCE, gmap->asce,
+ IDTE_GLOBAL);
+ else if (MACHINE_HAS_IDTE)
+ __pmdp_idte(gaddr, (pmd_t *)pmdp, 0, 0, IDTE_GLOBAL);
+ else
+ __pmdp_csp(pmdp);
+ *pmdp = new;
+}
+
+static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr,
+ int purge)
+{
+ pmd_t *pmdp;
+ struct gmap *gmap;
+ unsigned long gaddr;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+ spin_lock(&gmap->guest_table_lock);
+ pmdp = (pmd_t *)radix_tree_delete(&gmap->host_to_guest,
+ vmaddr >> PMD_SHIFT);
+ if (pmdp) {
+ gaddr = __gmap_segment_gaddr((unsigned long *)pmdp);
+ pmdp_notify_gmap(gmap, pmdp, gaddr);
+ WARN_ON(pmd_val(*pmdp) & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |
+ _SEGMENT_ENTRY_GMAP_UC));
+ if (purge)
+ __pmdp_csp(pmdp);
+ pmd_val(*pmdp) = _SEGMENT_ENTRY_EMPTY;
+ }
+ spin_unlock(&gmap->guest_table_lock);
+ }
+ rcu_read_unlock();
+}
+
+/**
+ * gmap_pmdp_invalidate - invalidate all affected guest pmd entries without
+ * flushing
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr)
+{
+ gmap_pmdp_clear(mm, vmaddr, 0);
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_invalidate);
+
+/**
+ * gmap_pmdp_csp - csp all affected guest pmd entries
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_csp(struct mm_struct *mm, unsigned long vmaddr)
+{
+ gmap_pmdp_clear(mm, vmaddr, 1);
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_csp);
+
+/**
+ * gmap_pmdp_idte_local - invalidate and clear a guest pmd entry
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)
+{
+ unsigned long *entry, gaddr;
+ struct gmap *gmap;
+ pmd_t *pmdp;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+ spin_lock(&gmap->guest_table_lock);
+ entry = radix_tree_delete(&gmap->host_to_guest,
+ vmaddr >> PMD_SHIFT);
+ if (entry) {
+ pmdp = (pmd_t *)entry;
+ gaddr = __gmap_segment_gaddr(entry);
+ pmdp_notify_gmap(gmap, pmdp, gaddr);
+ WARN_ON(*entry & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |
+ _SEGMENT_ENTRY_GMAP_UC));
+ if (MACHINE_HAS_TLB_GUEST)
+ __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,
+ gmap->asce, IDTE_LOCAL);
+ else if (MACHINE_HAS_IDTE)
+ __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_LOCAL);
+ *entry = _SEGMENT_ENTRY_EMPTY;
+ }
+ spin_unlock(&gmap->guest_table_lock);
+ }
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_idte_local);
+
+/**
+ * gmap_pmdp_idte_global - invalidate and clear a guest pmd entry
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
+{
+ unsigned long *entry, gaddr;
+ struct gmap *gmap;
+ pmd_t *pmdp;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+ spin_lock(&gmap->guest_table_lock);
+ entry = radix_tree_delete(&gmap->host_to_guest,
+ vmaddr >> PMD_SHIFT);
+ if (entry) {
+ pmdp = (pmd_t *)entry;
+ gaddr = __gmap_segment_gaddr(entry);
+ pmdp_notify_gmap(gmap, pmdp, gaddr);
+ WARN_ON(*entry & ~(_SEGMENT_ENTRY_HARDWARE_BITS_LARGE |
+ _SEGMENT_ENTRY_GMAP_UC));
+ if (MACHINE_HAS_TLB_GUEST)
+ __pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE,
+ gmap->asce, IDTE_GLOBAL);
+ else if (MACHINE_HAS_IDTE)
+ __pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL);
+ else
+ __pmdp_csp(pmdp);
+ *entry = _SEGMENT_ENTRY_EMPTY;
+ }
+ spin_unlock(&gmap->guest_table_lock);
+ }
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_idte_global);
+
+/**
+ * gmap_test_and_clear_dirty_pmd - test and reset segment dirty status
+ * @gmap: pointer to guest address space
+ * @pmdp: pointer to the pmd to be tested
+ * @gaddr: virtual address in the guest address space
+ *
+ * This function is assumed to be called with the guest_table_lock
+ * held.
+ */
+bool gmap_test_and_clear_dirty_pmd(struct gmap *gmap, pmd_t *pmdp,
+ unsigned long gaddr)
+{
+ if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
+ return false;
+
+ /* Already protected memory, which did not change is clean */
+ if (pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT &&
+ !(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_UC))
+ return false;
+
+ /* Clear UC indication and reset protection */
+ pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_UC;
+ gmap_protect_pmd(gmap, gaddr, pmdp, PROT_READ, 0);
+ return true;
+}
+
+/**
+ * gmap_sync_dirty_log_pmd - set bitmap based on dirty status of segment
+ * @gmap: pointer to guest address space
+ * @bitmap: dirty bitmap for this pmd
+ * @gaddr: virtual address in the guest address space
+ * @vmaddr: virtual address in the host address space
+ *
+ * This function is assumed to be called with the guest_table_lock
+ * held.
+ */
+void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
+ unsigned long gaddr, unsigned long vmaddr)
+{
+ int i;
+ pmd_t *pmdp;
+ pte_t *ptep;
+ spinlock_t *ptl;
+
+ pmdp = gmap_pmd_op_walk(gmap, gaddr);
+ if (!pmdp)
+ return;
+
+ if (pmd_large(*pmdp)) {
+ if (gmap_test_and_clear_dirty_pmd(gmap, pmdp, gaddr))
+ bitmap_fill(bitmap, _PAGE_ENTRIES);
+ } else {
+ for (i = 0; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
+ ptep = pte_alloc_map_lock(gmap->mm, pmdp, vmaddr, &ptl);
+ if (!ptep)
+ continue;
+ if (ptep_test_and_clear_uc(gmap->mm, vmaddr, ptep))
+ set_bit(i, bitmap);
+ spin_unlock(ptl);
+ }
+ }
+ gmap_pmd_op_end(gmap, pmdp);
+}
+EXPORT_SYMBOL_GPL(gmap_sync_dirty_log_pmd);
+
static inline void thp_split_mm(struct mm_struct *mm)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -2168,17 +2548,45 @@
* Enable storage key handling from now on and initialize the storage
* keys with the default key.
*/
-static int __s390_enable_skey(pte_t *pte, unsigned long addr,
- unsigned long next, struct mm_walk *walk)
+static int __s390_enable_skey_pte(pte_t *pte, unsigned long addr,
+ unsigned long next, struct mm_walk *walk)
{
/* Clear storage key */
ptep_zap_key(walk->mm, addr, pte);
return 0;
}
+static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
+ unsigned long hmask, unsigned long next,
+ struct mm_walk *walk)
+{
+ pmd_t *pmd = (pmd_t *)pte;
+ unsigned long start, end;
+ struct page *page = pmd_page(*pmd);
+
+ /*
+ * The write check makes sure we do not set a key on shared
+ * memory. This is needed as the walker does not differentiate
+ * between actual guest memory and the process executable or
+ * shared libraries.
+ */
+ if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID ||
+ !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE))
+ return 0;
+
+ start = pmd_val(*pmd) & HPAGE_MASK;
+ end = start + HPAGE_SIZE - 1;
+ __storage_key_init_range(start, end);
+ set_bit(PG_arch_1, &page->flags);
+ return 0;
+}
+
int s390_enable_skey(void)
{
- struct mm_walk walk = { .pte_entry = __s390_enable_skey };
+ struct mm_walk walk = {
+ .hugetlb_entry = __s390_enable_skey_hugetlb,
+ .pte_entry = __s390_enable_skey_pte,
+ };
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int rc = 0;
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index e804090..b0246c7 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -123,6 +123,29 @@
return pte;
}
+static void clear_huge_pte_skeys(struct mm_struct *mm, unsigned long rste)
+{
+ struct page *page;
+ unsigned long size, paddr;
+
+ if (!mm_uses_skeys(mm) ||
+ rste & _SEGMENT_ENTRY_INVALID)
+ return;
+
+ if ((rste & _REGION_ENTRY_TYPE_MASK) == _REGION_ENTRY_TYPE_R3) {
+ page = pud_page(__pud(rste));
+ size = PUD_SIZE;
+ paddr = rste & PUD_MASK;
+ } else {
+ page = pmd_page(__pmd(rste));
+ size = PMD_SIZE;
+ paddr = rste & PMD_MASK;
+ }
+
+ if (!test_and_set_bit(PG_arch_1, &page->flags))
+ __storage_key_init_range(paddr, paddr + size - 1);
+}
+
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
@@ -137,6 +160,7 @@
rste |= _REGION_ENTRY_TYPE_R3 | _REGION3_ENTRY_LARGE;
else
rste |= _SEGMENT_ENTRY_LARGE;
+ clear_huge_pte_skeys(mm, rste);
pte_val(*ptep) = rste;
}
diff --git a/arch/s390/mm/page-states.c b/arch/s390/mm/page-states.c
index 382153f..dc3cede 100644
--- a/arch/s390/mm/page-states.c
+++ b/arch/s390/mm/page-states.c
@@ -271,7 +271,7 @@
list_for_each(l, &zone->free_area[order].free_list[t]) {
page = list_entry(l, struct page, lru);
if (make_stable)
- set_page_stable_dat(page, 0);
+ set_page_stable_dat(page, order);
else
set_page_unused(page, order);
}
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index c441715..f8c6faa 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -14,7 +14,7 @@
static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
{
- asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],9,0"
+ asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],1,0"
: [addr] "+a" (addr) : [skey] "d" (skey));
return addr;
}
@@ -23,8 +23,6 @@
{
unsigned long boundary, size;
- if (!PAGE_DEFAULT_KEY)
- return;
while (start < end) {
if (MACHINE_HAS_EDAT1) {
/* set storage keys for a 1MB frame */
@@ -37,7 +35,7 @@
continue;
}
}
- page_set_storage_key(start, PAGE_DEFAULT_KEY, 0);
+ page_set_storage_key(start, PAGE_DEFAULT_KEY, 1);
start += PAGE_SIZE;
}
}
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index e3bd562..76d89ee 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -28,7 +28,7 @@
.data = &page_table_allocate_pgste,
.maxlen = sizeof(int),
.mode = S_IRUGO | S_IWUSR,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
.extra1 = &page_table_allocate_pgste_min,
.extra2 = &page_table_allocate_pgste_max,
},
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 301e466..f2cc7da 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -347,18 +347,27 @@
mm->context.asce, IDTE_LOCAL);
else
__pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL);
+ if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
+ gmap_pmdp_idte_local(mm, addr);
}
static inline void pmdp_idte_global(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
{
- if (MACHINE_HAS_TLB_GUEST)
+ if (MACHINE_HAS_TLB_GUEST) {
__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
mm->context.asce, IDTE_GLOBAL);
- else if (MACHINE_HAS_IDTE)
+ if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
+ gmap_pmdp_idte_global(mm, addr);
+ } else if (MACHINE_HAS_IDTE) {
__pmdp_idte(addr, pmdp, 0, 0, IDTE_GLOBAL);
- else
+ if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
+ gmap_pmdp_idte_global(mm, addr);
+ } else {
__pmdp_csp(pmdp);
+ if (mm_has_pgste(mm) && mm->context.allow_gmap_hpage_1m)
+ gmap_pmdp_csp(mm, addr);
+ }
}
static inline pmd_t pmdp_flush_direct(struct mm_struct *mm,
@@ -392,6 +401,8 @@
cpumask_of(smp_processor_id()))) {
pmd_val(*pmdp) |= _SEGMENT_ENTRY_INVALID;
mm->context.flush_mm = 1;
+ if (mm_has_pgste(mm))
+ gmap_pmdp_invalidate(mm, addr);
} else {
pmdp_idte_global(mm, addr, pmdp);
}
@@ -399,6 +410,24 @@
return old;
}
+static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd = pgd_offset(mm, addr);
+ p4d = p4d_alloc(mm, pgd, addr);
+ if (!p4d)
+ return NULL;
+ pud = pud_alloc(mm, p4d, addr);
+ if (!pud)
+ return NULL;
+ pmd = pmd_alloc(mm, pud, addr);
+ return pmd;
+}
+
pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t new)
{
@@ -693,40 +722,14 @@
/*
* Test and reset if a guest page is dirty
*/
-bool test_and_clear_guest_dirty(struct mm_struct *mm, unsigned long addr)
+bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep)
{
- spinlock_t *ptl;
- pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
pgste_t pgste;
- pte_t *ptep;
pte_t pte;
bool dirty;
int nodat;
- pgd = pgd_offset(mm, addr);
- p4d = p4d_alloc(mm, pgd, addr);
- if (!p4d)
- return false;
- pud = pud_alloc(mm, p4d, addr);
- if (!pud)
- return false;
- pmd = pmd_alloc(mm, pud, addr);
- if (!pmd)
- return false;
- /* We can't run guests backed by huge pages, but userspace can
- * still set them up and then try to migrate them without any
- * migration support.
- */
- if (pmd_large(*pmd))
- return true;
-
- ptep = pte_alloc_map_lock(mm, pmd, addr, &ptl);
- if (unlikely(!ptep))
- return false;
-
pgste = pgste_get_lock(ptep);
dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
pgste_val(pgste) &= ~PGSTE_UC_BIT;
@@ -742,21 +745,43 @@
*ptep = pte;
}
pgste_set_unlock(ptep, pgste);
-
- spin_unlock(ptl);
return dirty;
}
-EXPORT_SYMBOL_GPL(test_and_clear_guest_dirty);
+EXPORT_SYMBOL_GPL(ptep_test_and_clear_uc);
int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
unsigned char key, bool nq)
{
- unsigned long keyul;
+ unsigned long keyul, paddr;
spinlock_t *ptl;
pgste_t old, new;
+ pmd_t *pmdp;
pte_t *ptep;
- ptep = get_locked_pte(mm, addr, &ptl);
+ pmdp = pmd_alloc_map(mm, addr);
+ if (unlikely(!pmdp))
+ return -EFAULT;
+
+ ptl = pmd_lock(mm, pmdp);
+ if (!pmd_present(*pmdp)) {
+ spin_unlock(ptl);
+ return -EFAULT;
+ }
+
+ if (pmd_large(*pmdp)) {
+ paddr = pmd_val(*pmdp) & HPAGE_MASK;
+ paddr |= addr & ~HPAGE_MASK;
+ /*
+ * Huge pmds need quiescing operations, they are
+ * always mapped.
+ */
+ page_set_storage_key(paddr, key, 1);
+ spin_unlock(ptl);
+ return 0;
+ }
+ spin_unlock(ptl);
+
+ ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
if (unlikely(!ptep))
return -EFAULT;
@@ -767,14 +792,14 @@
pgste_val(new) |= (keyul & (_PAGE_CHANGED | _PAGE_REFERENCED)) << 48;
pgste_val(new) |= (keyul & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56;
if (!(pte_val(*ptep) & _PAGE_INVALID)) {
- unsigned long address, bits, skey;
+ unsigned long bits, skey;
- address = pte_val(*ptep) & PAGE_MASK;
- skey = (unsigned long) page_get_storage_key(address);
+ paddr = pte_val(*ptep) & PAGE_MASK;
+ skey = (unsigned long) page_get_storage_key(paddr);
bits = skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
skey = key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
/* Set storage key ACC and FP */
- page_set_storage_key(address, skey, !nq);
+ page_set_storage_key(paddr, skey, !nq);
/* Merge host changed & referenced into pgste */
pgste_val(new) |= bits << 52;
}
@@ -830,11 +855,32 @@
int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
{
spinlock_t *ptl;
+ unsigned long paddr;
pgste_t old, new;
+ pmd_t *pmdp;
pte_t *ptep;
int cc = 0;
- ptep = get_locked_pte(mm, addr, &ptl);
+ pmdp = pmd_alloc_map(mm, addr);
+ if (unlikely(!pmdp))
+ return -EFAULT;
+
+ ptl = pmd_lock(mm, pmdp);
+ if (!pmd_present(*pmdp)) {
+ spin_unlock(ptl);
+ return -EFAULT;
+ }
+
+ if (pmd_large(*pmdp)) {
+ paddr = pmd_val(*pmdp) & HPAGE_MASK;
+ paddr |= addr & ~HPAGE_MASK;
+ cc = page_reset_referenced(paddr);
+ spin_unlock(ptl);
+ return cc;
+ }
+ spin_unlock(ptl);
+
+ ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
if (unlikely(!ptep))
return -EFAULT;
@@ -843,7 +889,8 @@
pgste_val(new) &= ~PGSTE_GR_BIT;
if (!(pte_val(*ptep) & _PAGE_INVALID)) {
- cc = page_reset_referenced(pte_val(*ptep) & PAGE_MASK);
+ paddr = pte_val(*ptep) & PAGE_MASK;
+ cc = page_reset_referenced(paddr);
/* Merge real referenced bit into host-set */
pgste_val(new) |= ((unsigned long) cc << 53) & PGSTE_HR_BIT;
}
@@ -862,18 +909,42 @@
int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
unsigned char *key)
{
+ unsigned long paddr;
spinlock_t *ptl;
pgste_t pgste;
+ pmd_t *pmdp;
pte_t *ptep;
- ptep = get_locked_pte(mm, addr, &ptl);
+ pmdp = pmd_alloc_map(mm, addr);
+ if (unlikely(!pmdp))
+ return -EFAULT;
+
+ ptl = pmd_lock(mm, pmdp);
+ if (!pmd_present(*pmdp)) {
+ /* Not yet mapped memory has a zero key */
+ spin_unlock(ptl);
+ *key = 0;
+ return 0;
+ }
+
+ if (pmd_large(*pmdp)) {
+ paddr = pmd_val(*pmdp) & HPAGE_MASK;
+ paddr |= addr & ~HPAGE_MASK;
+ *key = page_get_storage_key(paddr);
+ spin_unlock(ptl);
+ return 0;
+ }
+ spin_unlock(ptl);
+
+ ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
if (unlikely(!ptep))
return -EFAULT;
pgste = pgste_get_lock(ptep);
*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
+ paddr = pte_val(*ptep) & PAGE_MASK;
if (!(pte_val(*ptep) & _PAGE_INVALID))
- *key = page_get_storage_key(pte_val(*ptep) & PAGE_MASK);
+ *key = page_get_storage_key(paddr);
/* Reflect guest's logical view, not physical */
*key |= (pgste_val(pgste) & (PGSTE_GR_BIT | PGSTE_GC_BIT)) >> 48;
pgste_set_unlock(ptep, pgste);
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 5f0234e..d7052cb 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -485,8 +485,6 @@
/* br %r1 */
_EMIT2(0x07f1);
} else {
- /* larl %r1,.+14 */
- EMIT6_PCREL_RILB(0xc0000000, REG_1, jit->prg + 14);
/* ex 0,S390_lowcore.br_r1_tampoline */
EMIT4_DISP(0x44000000, REG_0, REG_0,
offsetof(struct lowcore, br_r1_trampoline));
diff --git a/arch/s390/numa/numa.c b/arch/s390/numa/numa.c
index 06a8043..5bd3744 100644
--- a/arch/s390/numa/numa.c
+++ b/arch/s390/numa/numa.c
@@ -134,6 +134,8 @@
{
pr_info("NUMA mode: %s\n", mode->name);
nodes_clear(node_possible_map);
+ /* Initially attach all possible CPUs to node 0. */
+ cpumask_copy(&node_to_cpumask_map[0], cpu_possible_mask);
if (mode->setup)
mode->setup();
numa_setup_memory();
@@ -141,20 +143,6 @@
}
/*
- * numa_init_early() - Initialization initcall
- *
- * This runs when only one CPU is online and before the first
- * topology update is called for by the scheduler.
- */
-static int __init numa_init_early(void)
-{
- /* Attach all possible CPUs to node 0 for now. */
- cpumask_copy(&node_to_cpumask_map[0], cpu_possible_mask);
- return 0;
-}
-early_initcall(numa_init_early);
-
-/*
* numa_init_late() - Initialization initcall
*
* Register NUMA nodes.
diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
index b482e95..57f7cda 100644
--- a/arch/s390/pci/pci_debug.c
+++ b/arch/s390/pci/pci_debug.c
@@ -48,6 +48,10 @@
"Maximum work units",
};
+static char *pci_fmt3_names[] = {
+ "Transmitted bytes",
+};
+
static char *pci_sw_names[] = {
"Allocated pages",
"Mapped pages",
@@ -112,6 +116,10 @@
pci_fmb_show(m, pci_fmt2_names, ARRAY_SIZE(pci_fmt2_names),
&zdev->fmb->fmt2.consumed_work_units);
break;
+ case 3:
+ pci_fmb_show(m, pci_fmt3_names, ARRAY_SIZE(pci_fmt3_names),
+ &zdev->fmb->fmt3.tx_bytes);
+ break;
default:
seq_puts(m, "Unknown format\n");
}
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index 1ace023..8d61218 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -7,13 +7,13 @@
targets += $(purgatory-y) purgatory.ro kexec-purgatory.c
PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
-$(obj)/sha256.o: $(srctree)/lib/sha256.c
+$(obj)/sha256.o: $(srctree)/lib/sha256.c FORCE
$(call if_changed_rule,cc_o_c)
-$(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S
+$(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
$(call if_changed_rule,as_o_S)
-$(obj)/string.o: $(srctree)/arch/s390/lib/string.c
+$(obj)/string.o: $(srctree)/arch/s390/lib/string.c FORCE
$(call if_changed_rule,cc_o_c)
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib
@@ -21,8 +21,9 @@
KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes
KBUILD_CFLAGS += -Wno-pointer-sign -Wno-sign-compare
KBUILD_CFLAGS += -fno-zero-initialized-in-bss -fno-builtin -ffreestanding
-KBUILD_CFLAGS += -c -MD -Os -m64 -msoft-float
+KBUILD_CFLAGS += -c -MD -Os -m64 -msoft-float -fno-common
KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
+KBUILD_AFLAGS := $(filter-out -DCC_USING_EXPOLINE,$(KBUILD_AFLAGS))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/s390/purgatory/head.S b/arch/s390/purgatory/head.S
index 660c96a..2e3707b 100644
--- a/arch/s390/purgatory/head.S
+++ b/arch/s390/purgatory/head.S
@@ -243,33 +243,26 @@
.quad 0
.endr
-purgatory_sha256_digest:
- .global purgatory_sha256_digest
- .rept 32 /* SHA256_DIGEST_SIZE */
- .byte 0
- .endr
+/* Macro to define a global variable with name and size (in bytes) to be
+ * shared with C code.
+ *
+ * Add the .size and .type attribute to satisfy checks on the Elf_Sym during
+ * purgatory load.
+ */
+.macro GLOBAL_VARIABLE name,size
+\name:
+ .global \name
+ .size \name,\size
+ .type \name,object
+ .skip \size,0
+.endm
-purgatory_sha_regions:
- .global purgatory_sha_regions
- .rept 16 * __KEXEC_SHA_REGION_SIZE /* KEXEC_SEGMENTS_MAX */
- .byte 0
- .endr
-
-kernel_entry:
- .global kernel_entry
- .quad 0
-
-kernel_type:
- .global kernel_type
- .quad 0
-
-crash_start:
- .global crash_start
- .quad 0
-
-crash_size:
- .global crash_size
- .quad 0
+GLOBAL_VARIABLE purgatory_sha256_digest,32
+GLOBAL_VARIABLE purgatory_sha_regions,16*__KEXEC_SHA_REGION_SIZE
+GLOBAL_VARIABLE kernel_entry,8
+GLOBAL_VARIABLE kernel_type,8
+GLOBAL_VARIABLE crash_start,8
+GLOBAL_VARIABLE crash_size,8
.align PAGE_SIZE
stack:
diff --git a/arch/s390/purgatory/purgatory.c b/arch/s390/purgatory/purgatory.c
index 4e2beb3..3528e6d 100644
--- a/arch/s390/purgatory/purgatory.c
+++ b/arch/s390/purgatory/purgatory.c
@@ -12,15 +12,6 @@
#include <linux/string.h>
#include <asm/purgatory.h>
-struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX];
-u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE];
-
-u64 kernel_entry;
-u64 kernel_type;
-
-u64 crash_start;
-u64 crash_size;
-
int verify_sha256_digest(void)
{
struct kexec_sha_region *ptr, *end;
diff --git a/arch/s390/scripts/Makefile.chkbss b/arch/s390/scripts/Makefile.chkbss
index d92f2d9..9bba2c1 100644
--- a/arch/s390/scripts/Makefile.chkbss
+++ b/arch/s390/scripts/Makefile.chkbss
@@ -2,13 +2,22 @@
quiet_cmd_chkbss = CHKBSS $<
define cmd_chkbss
+ rm -f $@; \
if ! $(OBJDUMP) -j .bss -w -h $< | awk 'END { if ($$3) exit 1 }'; then \
echo "error: $< .bss section is not empty" >&2; exit 1; \
fi; \
touch $@;
endef
-$(obj)/built-in.a: $(patsubst %, $(obj)/%.chkbss, $(chkbss))
+chkbss-target ?= $(obj)/built-in.a
+ifneq (,$(findstring /,$(chkbss)))
+chkbss-files := $(patsubst %, %.chkbss, $(chkbss))
+else
+chkbss-files := $(patsubst %, $(obj)/%.chkbss, $(chkbss))
+endif
+
+$(chkbss-target): $(chkbss-files)
+targets += $(notdir $(chkbss-files))
%.o.chkbss: %.o
$(call cmd,chkbss)
diff --git a/arch/s390/tools/gen_opcode_table.c b/arch/s390/tools/gen_opcode_table.c
index 259aa06..a1bc02b 100644
--- a/arch/s390/tools/gen_opcode_table.c
+++ b/arch/s390/tools/gen_opcode_table.c
@@ -257,7 +257,7 @@
if (!desc->group)
exit(EXIT_FAILURE);
group = &desc->group[desc->nr_groups - 1];
- strncpy(group->opcode, insn->opcode, 2);
+ memcpy(group->opcode, insn->opcode, 2);
group->type = insn->type;
group->offset = offset;
group->count = 1;
@@ -283,7 +283,7 @@
continue;
add_to_group(desc, insn, offset);
if (strncmp(opcode, insn->opcode, 2)) {
- strncpy(opcode, insn->opcode, 2);
+ memcpy(opcode, insn->opcode, 2);
printf("\t/* %.2s */ \\\n", opcode);
}
print_opcode(insn, offset);
diff --git a/arch/sh/include/asm/atomic.h b/arch/sh/include/asm/atomic.h
index 0fd0099..f37b95a8 100644
--- a/arch/sh/include/asm/atomic.h
+++ b/arch/sh/include/asm/atomic.h
@@ -32,44 +32,9 @@
#include <asm/atomic-irq.h>
#endif
-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-#define atomic_sub_and_test(i,v) (atomic_sub_return((i), (v)) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
-
-#define atomic_inc(v) atomic_add(1, (v))
-#define atomic_dec(v) atomic_sub(1, (v))
-
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
-/**
- * __atomic_add_unless - add unless the number is a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
-
- return c;
-}
-
#endif /* CONFIG_CPU_J2 */
#endif /* __ASM_SH_ATOMIC_H */
diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/arch/sh/include/asm/cmpxchg-xchg.h
index 1e881f5..593a970 100644
--- a/arch/sh/include/asm/cmpxchg-xchg.h
+++ b/arch/sh/include/asm/cmpxchg-xchg.h
@@ -8,7 +8,8 @@
* This work is licensed under the terms of the GNU GPL, version 2. See the
* file "COPYING" in the main directory of this archive for more details.
*/
-#include <linux/bitops.h>
+#include <linux/bits.h>
+#include <linux/compiler.h>
#include <asm/byteorder.h>
/*
diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h
index 7431c17..199d17b 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -10,7 +10,6 @@
#include <linux/types.h>
struct arch_hw_breakpoint {
- char *name; /* Contains name of the symbol to set bkpt */
unsigned long address;
u16 len;
u16 type;
@@ -41,6 +40,7 @@
struct clk *clk; /* optional interface clock / MSTP bit */
};
+struct perf_event_attr;
struct perf_event;
struct task_struct;
struct pmu;
@@ -54,8 +54,10 @@
}
/* arch/sh/kernel/hw_breakpoint.c */
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
diff --git a/arch/sh/include/asm/kprobes.h b/arch/sh/include/asm/kprobes.h
index 85d8bca..6171682 100644
--- a/arch/sh/include/asm/kprobes.h
+++ b/arch/sh/include/asm/kprobes.h
@@ -27,7 +27,6 @@
void arch_remove_kprobe(struct kprobe *);
void kretprobe_trampoline(void);
-void jprobe_return_end(void);
/* Architecture specific copy of original instruction*/
struct arch_specific_insn {
@@ -43,9 +42,6 @@
/* per-cpu kprobe control block */
struct kprobe_ctlblk {
unsigned long kprobe_status;
- unsigned long jprobe_saved_r15;
- struct pt_regs jprobe_saved_regs;
- kprobe_opcode_t jprobes_stack[MAX_STACK_SIZE];
struct prev_kprobe prev_kprobe;
};
diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c
index 8648ed0..d9ff3b4 100644
--- a/arch/sh/kernel/hw_breakpoint.c
+++ b/arch/sh/kernel/hw_breakpoint.c
@@ -124,14 +124,13 @@
/*
* Check for virtual address in kernel space.
*/
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
{
unsigned int len;
unsigned long va;
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
- va = info->address;
- len = get_hbp_len(info->len);
+ va = hw->address;
+ len = get_hbp_len(hw->len);
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
}
@@ -174,40 +173,40 @@
return 0;
}
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
- info->address = bp->attr.bp_addr;
+ hw->address = attr->bp_addr;
/* Len */
- switch (bp->attr.bp_len) {
+ switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
- info->len = SH_BREAKPOINT_LEN_1;
+ hw->len = SH_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
- info->len = SH_BREAKPOINT_LEN_2;
+ hw->len = SH_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
- info->len = SH_BREAKPOINT_LEN_4;
+ hw->len = SH_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_8:
- info->len = SH_BREAKPOINT_LEN_8;
+ hw->len = SH_BREAKPOINT_LEN_8;
break;
default:
return -EINVAL;
}
/* Type */
- switch (bp->attr.bp_type) {
+ switch (attr->bp_type) {
case HW_BREAKPOINT_R:
- info->type = SH_BREAKPOINT_READ;
+ hw->type = SH_BREAKPOINT_READ;
break;
case HW_BREAKPOINT_W:
- info->type = SH_BREAKPOINT_WRITE;
+ hw->type = SH_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
- info->type = SH_BREAKPOINT_RW;
+ hw->type = SH_BREAKPOINT_RW;
break;
default:
return -EINVAL;
@@ -219,19 +218,20 @@
/*
* Validate the arch-specific HW Breakpoint register settings
*/
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
unsigned int align;
int ret;
- ret = arch_build_bp_info(bp);
+ ret = arch_build_bp_info(bp, attr, hw);
if (ret)
return ret;
ret = -EINVAL;
- switch (info->len) {
+ switch (hw->len) {
case SH_BREAKPOINT_LEN_1:
align = 0;
break;
@@ -249,17 +249,10 @@
}
/*
- * For kernel-addresses, either the address or symbol name can be
- * specified.
- */
- if (info->name)
- info->address = (unsigned long)kallsyms_lookup_name(info->name);
-
- /*
* Check that the low-order bits of the address are appropriate
* for the alignment implied by len.
*/
- if (info->address & align)
+ if (hw->address & align)
return -EINVAL;
return 0;
@@ -346,7 +339,7 @@
perf_bp_event(bp, args->regs);
/* Deliver the signal to userspace */
- if (!arch_check_bp_in_kernelspace(bp)) {
+ if (!arch_check_bp_in_kernelspace(&bp->hw.info)) {
force_sig_fault(SIGTRAP, TRAP_HWBKPT,
(void __user *)NULL, current);
}
diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 52a5e11..241e903 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -248,11 +248,6 @@
prepare_singlestep(p, regs);
kcb->kprobe_status = KPROBE_REENTER;
return 1;
- } else {
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- goto ss_probe;
- }
}
goto no_kprobe;
}
@@ -277,11 +272,13 @@
set_current_kprobe(p, regs, kcb);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- if (p->pre_handler && p->pre_handler(p, regs))
+ if (p->pre_handler && p->pre_handler(p, regs)) {
/* handler has already set things up, so skip ss setup */
+ reset_current_kprobe();
+ preempt_enable_no_resched();
return 1;
+ }
-ss_probe:
prepare_singlestep(p, regs);
kcb->kprobe_status = KPROBE_HIT_SS;
return 1;
@@ -358,8 +355,6 @@
regs->pc = orig_ret_address;
kretprobe_hash_unlock(current, &flags);
- preempt_enable_no_resched();
-
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
@@ -508,14 +503,8 @@
if (post_kprobe_handler(args->regs))
ret = NOTIFY_STOP;
} else {
- if (kprobe_handler(args->regs)) {
+ if (kprobe_handler(args->regs))
ret = NOTIFY_STOP;
- } else {
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler &&
- p->break_handler(p, args->regs))
- ret = NOTIFY_STOP;
- }
}
}
}
@@ -523,57 +512,6 @@
return ret;
}
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- kcb->jprobe_saved_regs = *regs;
- kcb->jprobe_saved_r15 = regs->regs[15];
- addr = kcb->jprobe_saved_r15;
-
- /*
- * TBD: As Linus pointed out, gcc assumes that the callee
- * owns the argument space and could overwrite it, e.g.
- * tailcall optimization. So, to be absolutely safe
- * we also save and restore enough stack bytes to cover
- * the argument area.
- */
- memcpy(kcb->jprobes_stack, (kprobe_opcode_t *) addr,
- MIN_STACK_SIZE(addr));
-
- regs->pc = (unsigned long)(jp->entry);
-
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- asm volatile ("trapa #0x3a\n\t" "jprobe_return_end:\n\t" "nop\n\t");
-}
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- unsigned long stack_addr = kcb->jprobe_saved_r15;
- u8 *addr = (u8 *)regs->pc;
-
- if ((addr >= (u8 *)jprobe_return) &&
- (addr <= (u8 *)jprobe_return_end)) {
- *regs = kcb->jprobe_saved_regs;
-
- memcpy((kprobe_opcode_t *)stack_addr, kcb->jprobes_stack,
- MIN_STACK_SIZE(stack_addr));
-
- kcb->kprobe_status = KPROBE_HIT_SS;
- preempt_enable_no_resched();
- return 1;
- }
-
- return 0;
-}
-
static struct kprobe trampoline_p = {
.addr = (kprobe_opcode_t *)&kretprobe_trampoline,
.pre_handler = trampoline_probe_handler
diff --git a/arch/sparc/include/asm/atomic_32.h b/arch/sparc/include/asm/atomic_32.h
index d13ce51..94c930f 100644
--- a/arch/sparc/include/asm/atomic_32.h
+++ b/arch/sparc/include/asm/atomic_32.h
@@ -27,17 +27,17 @@
int atomic_fetch_xor(int, atomic_t *);
int atomic_cmpxchg(atomic_t *, int, int);
int atomic_xchg(atomic_t *, int);
-int __atomic_add_unless(atomic_t *, int, int);
+int atomic_fetch_add_unless(atomic_t *, int, int);
void atomic_set(atomic_t *, int);
+#define atomic_fetch_add_unless atomic_fetch_add_unless
+
#define atomic_set_release(v, i) atomic_set((v), (i))
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_add(i, v) ((void)atomic_add_return( (int)(i), (v)))
#define atomic_sub(i, v) ((void)atomic_add_return(-(int)(i), (v)))
-#define atomic_inc(v) ((void)atomic_add_return( 1, (v)))
-#define atomic_dec(v) ((void)atomic_add_return( -1, (v)))
#define atomic_and(i, v) ((void)atomic_fetch_and((i), (v)))
#define atomic_or(i, v) ((void)atomic_fetch_or((i), (v)))
@@ -46,22 +46,4 @@
#define atomic_sub_return(i, v) (atomic_add_return(-(int)(i), (v)))
#define atomic_fetch_sub(i, v) (atomic_fetch_add (-(int)(i), (v)))
-#define atomic_inc_return(v) (atomic_add_return( 1, (v)))
-#define atomic_dec_return(v) (atomic_add_return( -1, (v)))
-
-#define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0)
-
-/*
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-
-#define atomic_dec_and_test(v) (atomic_dec_return(v) == 0)
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
-
#endif /* !(__ARCH_SPARC_ATOMIC__) */
diff --git a/arch/sparc/include/asm/atomic_64.h b/arch/sparc/include/asm/atomic_64.h
index 28db058..6963482 100644
--- a/arch/sparc/include/asm/atomic_64.h
+++ b/arch/sparc/include/asm/atomic_64.h
@@ -50,38 +50,6 @@
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-#define atomic_dec_return(v) atomic_sub_return(1, v)
-#define atomic64_dec_return(v) atomic64_sub_return(1, v)
-
-#define atomic_inc_return(v) atomic_add_return(1, v)
-#define atomic64_inc_return(v) atomic64_add_return(1, v)
-
-/*
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
-#define atomic64_sub_and_test(i, v) (atomic64_sub_return(i, v) == 0)
-
-#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
-#define atomic64_dec_and_test(v) (atomic64_sub_return(1, v) == 0)
-
-#define atomic_inc(v) atomic_add(1, v)
-#define atomic64_inc(v) atomic64_add(1, v)
-
-#define atomic_dec(v) atomic_sub(1, v)
-#define atomic64_dec(v) atomic64_sub(1, v)
-
-#define atomic_add_negative(i, v) (atomic_add_return(i, v) < 0)
-#define atomic64_add_negative(i, v) (atomic64_add_return(i, v) < 0)
-
#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
static inline int atomic_xchg(atomic_t *v, int new)
@@ -89,42 +57,11 @@
return xchg(&v->counter, new);
}
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
-
#define atomic64_cmpxchg(v, o, n) \
((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
-static inline long atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- long c, old;
- c = atomic64_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic64_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c != (u);
-}
-
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
-
long atomic64_dec_if_positive(atomic64_t *v);
+#define atomic64_dec_if_positive atomic64_dec_if_positive
#endif /* !(__ARCH_SPARC64_ATOMIC__) */
diff --git a/arch/sparc/include/asm/kprobes.h b/arch/sparc/include/asm/kprobes.h
index 3704490..bfcaa63 100644
--- a/arch/sparc/include/asm/kprobes.h
+++ b/arch/sparc/include/asm/kprobes.h
@@ -44,7 +44,6 @@
unsigned long kprobe_status;
unsigned long kprobe_orig_tnpc;
unsigned long kprobe_orig_tstate_pil;
- struct pt_regs jprobe_saved_regs;
struct prev_kprobe prev_kprobe;
};
diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index ab4ba43..dfbca24 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -147,18 +147,12 @@
kcb->kprobe_status = KPROBE_REENTER;
prepare_singlestep(p, regs, kcb);
return 1;
- } else {
- if (*(u32 *)addr != BREAKPOINT_INSTRUCTION) {
+ } else if (*(u32 *)addr != BREAKPOINT_INSTRUCTION) {
/* The breakpoint instruction was removed by
* another cpu right after we hit, no further
* handling of this interrupt is appropriate
*/
- ret = 1;
- goto no_kprobe;
- }
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs))
- goto ss_probe;
+ ret = 1;
}
goto no_kprobe;
}
@@ -181,10 +175,12 @@
set_current_kprobe(p, regs, kcb);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- if (p->pre_handler && p->pre_handler(p, regs))
+ if (p->pre_handler && p->pre_handler(p, regs)) {
+ reset_current_kprobe();
+ preempt_enable_no_resched();
return 1;
+ }
-ss_probe:
prepare_singlestep(p, regs, kcb);
kcb->kprobe_status = KPROBE_HIT_SS;
return 1;
@@ -441,53 +437,6 @@
exception_exit(prev_state);
}
-/* Jprobes support. */
-int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- memcpy(&(kcb->jprobe_saved_regs), regs, sizeof(*regs));
-
- regs->tpc = (unsigned long) jp->entry;
- regs->tnpc = ((unsigned long) jp->entry) + 0x4UL;
- regs->tstate |= TSTATE_PIL;
-
- return 1;
-}
-
-void __kprobes jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- register unsigned long orig_fp asm("g1");
-
- orig_fp = kcb->jprobe_saved_regs.u_regs[UREG_FP];
- __asm__ __volatile__("\n"
-"1: cmp %%sp, %0\n\t"
- "blu,a,pt %%xcc, 1b\n\t"
- " restore\n\t"
- ".globl jprobe_return_trap_instruction\n"
-"jprobe_return_trap_instruction:\n\t"
- "ta 0x70"
- : /* no outputs */
- : "r" (orig_fp));
-}
-
-extern void jprobe_return_trap_instruction(void);
-
-int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- u32 *addr = (u32 *) regs->tpc;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- if (addr == (u32 *) jprobe_return_trap_instruction) {
- memcpy(regs, &(kcb->jprobe_saved_regs), sizeof(*regs));
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-
/* The value stored in the return address register is actually 2
* instructions before where the callee will return to.
* Sequences usually look something like this
@@ -562,9 +511,7 @@
regs->tpc = orig_ret_address;
regs->tnpc = orig_ret_address + 4;
- reset_current_kprobe();
kretprobe_hash_unlock(current, &flags);
- preempt_enable_no_resched();
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
diff --git a/arch/sparc/lib/atomic32.c b/arch/sparc/lib/atomic32.c
index 465a901..281fa63 100644
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -95,7 +95,7 @@
}
EXPORT_SYMBOL(atomic_cmpxchg);
-int __atomic_add_unless(atomic_t *v, int a, int u)
+int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
int ret;
unsigned long flags;
@@ -107,7 +107,7 @@
spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
return ret;
}
-EXPORT_SYMBOL(__atomic_add_unless);
+EXPORT_SYMBOL(atomic_fetch_add_unless);
/* Atomic operations are already serializing */
void atomic_set(atomic_t *v, int i)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 887d3a7..6d4774f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -180,7 +180,7 @@
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
- select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
+ select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
select HAVE_STACK_VALIDATION if X86_64
select HAVE_RSEQ
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index a08e828..7e3c07d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -80,11 +80,6 @@
# alignment instructions.
KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4))
- # Disable unit-at-a-time mode on pre-gcc-4.0 compilers, it makes gcc use
- # a lot more stack due to the lack of sharing of stacklots:
- KBUILD_CFLAGS += $(call cc-ifversion, -lt, 0400, \
- $(call cc-option,-fno-unit-at-a-time))
-
# CPU-specific tuning. Anything which can be shared with UML should go here.
include arch/x86/Makefile_32.cpu
KBUILD_CFLAGS += $(cflags-y)
diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
index 0d41d68..2e13824 100644
--- a/arch/x86/boot/bitops.h
+++ b/arch/x86/boot/bitops.h
@@ -17,6 +17,7 @@
#define _LINUX_BITOPS_H /* Inhibit inclusion of <linux/bitops.h> */
#include <linux/types.h>
+#include <asm/asm.h>
static inline bool constant_test_bit(int nr, const void *addr)
{
@@ -28,7 +29,7 @@
bool v;
const u32 *p = (const u32 *)addr;
- asm("btl %2,%1; setc %0" : "=qm" (v) : "m" (*p), "Ir" (nr));
+ asm("btl %2,%1" CC_SET(c) : CC_OUT(c) (v) : "m" (*p), "Ir" (nr));
return v;
}
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index e98522e..1458b17 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -34,74 +34,13 @@
\
table = (typeof(table))sys_table; \
\
- c->runtime_services = table->runtime; \
- c->boot_services = table->boottime; \
- c->text_output = table->con_out; \
+ c->runtime_services = table->runtime; \
+ c->boot_services = table->boottime; \
+ c->text_output = table->con_out; \
}
BOOT_SERVICES(32);
BOOT_SERVICES(64);
-static inline efi_status_t __open_volume32(void *__image, void **__fh)
-{
- efi_file_io_interface_t *io;
- efi_loaded_image_32_t *image = __image;
- efi_file_handle_32_t *fh;
- efi_guid_t fs_proto = EFI_FILE_SYSTEM_GUID;
- efi_status_t status;
- void *handle = (void *)(unsigned long)image->device_handle;
- unsigned long func;
-
- status = efi_call_early(handle_protocol, handle,
- &fs_proto, (void **)&io);
- if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to handle fs_proto\n");
- return status;
- }
-
- func = (unsigned long)io->open_volume;
- status = efi_early->call(func, io, &fh);
- if (status != EFI_SUCCESS)
- efi_printk(sys_table, "Failed to open volume\n");
-
- *__fh = fh;
- return status;
-}
-
-static inline efi_status_t __open_volume64(void *__image, void **__fh)
-{
- efi_file_io_interface_t *io;
- efi_loaded_image_64_t *image = __image;
- efi_file_handle_64_t *fh;
- efi_guid_t fs_proto = EFI_FILE_SYSTEM_GUID;
- efi_status_t status;
- void *handle = (void *)(unsigned long)image->device_handle;
- unsigned long func;
-
- status = efi_call_early(handle_protocol, handle,
- &fs_proto, (void **)&io);
- if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to handle fs_proto\n");
- return status;
- }
-
- func = (unsigned long)io->open_volume;
- status = efi_early->call(func, io, &fh);
- if (status != EFI_SUCCESS)
- efi_printk(sys_table, "Failed to open volume\n");
-
- *__fh = fh;
- return status;
-}
-
-efi_status_t
-efi_open_volume(efi_system_table_t *sys_table, void *__image, void **__fh)
-{
- if (efi_early->is64)
- return __open_volume64(__image, __fh);
-
- return __open_volume32(__image, __fh);
-}
-
void efi_char16_printk(efi_system_table_t *table, efi_char16_t *str)
{
efi_call_proto(efi_simple_text_output_protocol, output_string,
@@ -109,7 +48,7 @@
}
static efi_status_t
-__setup_efi_pci(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
+preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
{
struct pci_setup_rom *rom = NULL;
efi_status_t status;
@@ -134,16 +73,16 @@
status = efi_call_early(allocate_pool, EFI_LOADER_DATA, size, &rom);
if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to alloc mem for rom\n");
+ efi_printk(sys_table, "Failed to allocate memory for 'rom'\n");
return status;
}
memset(rom, 0, sizeof(*rom));
- rom->data.type = SETUP_PCI;
- rom->data.len = size - sizeof(struct setup_data);
- rom->data.next = 0;
- rom->pcilen = pci->romsize;
+ rom->data.type = SETUP_PCI;
+ rom->data.len = size - sizeof(struct setup_data);
+ rom->data.next = 0;
+ rom->pcilen = pci->romsize;
*__rom = rom;
status = efi_call_proto(efi_pci_io_protocol, pci.read, pci,
@@ -179,96 +118,6 @@
return status;
}
-static void
-setup_efi_pci32(struct boot_params *params, void **pci_handle,
- unsigned long size)
-{
- efi_pci_io_protocol_t *pci = NULL;
- efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
- u32 *handles = (u32 *)(unsigned long)pci_handle;
- efi_status_t status;
- unsigned long nr_pci;
- struct setup_data *data;
- int i;
-
- data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
- while (data && data->next)
- data = (struct setup_data *)(unsigned long)data->next;
-
- nr_pci = size / sizeof(u32);
- for (i = 0; i < nr_pci; i++) {
- struct pci_setup_rom *rom = NULL;
- u32 h = handles[i];
-
- status = efi_call_early(handle_protocol, h,
- &pci_proto, (void **)&pci);
-
- if (status != EFI_SUCCESS)
- continue;
-
- if (!pci)
- continue;
-
- status = __setup_efi_pci(pci, &rom);
- if (status != EFI_SUCCESS)
- continue;
-
- if (data)
- data->next = (unsigned long)rom;
- else
- params->hdr.setup_data = (unsigned long)rom;
-
- data = (struct setup_data *)rom;
-
- }
-}
-
-static void
-setup_efi_pci64(struct boot_params *params, void **pci_handle,
- unsigned long size)
-{
- efi_pci_io_protocol_t *pci = NULL;
- efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
- u64 *handles = (u64 *)(unsigned long)pci_handle;
- efi_status_t status;
- unsigned long nr_pci;
- struct setup_data *data;
- int i;
-
- data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
- while (data && data->next)
- data = (struct setup_data *)(unsigned long)data->next;
-
- nr_pci = size / sizeof(u64);
- for (i = 0; i < nr_pci; i++) {
- struct pci_setup_rom *rom = NULL;
- u64 h = handles[i];
-
- status = efi_call_early(handle_protocol, h,
- &pci_proto, (void **)&pci);
-
- if (status != EFI_SUCCESS)
- continue;
-
- if (!pci)
- continue;
-
- status = __setup_efi_pci(pci, &rom);
- if (status != EFI_SUCCESS)
- continue;
-
- if (data)
- data->next = (unsigned long)rom;
- else
- params->hdr.setup_data = (unsigned long)rom;
-
- data = (struct setup_data *)rom;
-
- }
-}
-
/*
* There's no way to return an informative status from this function,
* because any analysis (and printing of error messages) needs to be
@@ -284,6 +133,9 @@
void **pci_handle = NULL;
efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
unsigned long size = 0;
+ unsigned long nr_pci;
+ struct setup_data *data;
+ int i;
status = efi_call_early(locate_handle,
EFI_LOCATE_BY_PROTOCOL,
@@ -295,7 +147,7 @@
size, (void **)&pci_handle);
if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to alloc mem for pci_handle\n");
+ efi_printk(sys_table, "Failed to allocate memory for 'pci_handle'\n");
return;
}
@@ -307,10 +159,34 @@
if (status != EFI_SUCCESS)
goto free_handle;
- if (efi_early->is64)
- setup_efi_pci64(params, pci_handle, size);
- else
- setup_efi_pci32(params, pci_handle, size);
+ data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
+
+ while (data && data->next)
+ data = (struct setup_data *)(unsigned long)data->next;
+
+ nr_pci = size / (efi_is_64bit() ? sizeof(u64) : sizeof(u32));
+ for (i = 0; i < nr_pci; i++) {
+ efi_pci_io_protocol_t *pci = NULL;
+ struct pci_setup_rom *rom;
+
+ status = efi_call_early(handle_protocol,
+ efi_is_64bit() ? ((u64 *)pci_handle)[i]
+ : ((u32 *)pci_handle)[i],
+ &pci_proto, (void **)&pci);
+ if (status != EFI_SUCCESS || !pci)
+ continue;
+
+ status = preserve_pci_rom_image(pci, &rom);
+ if (status != EFI_SUCCESS)
+ continue;
+
+ if (data)
+ data->next = (unsigned long)rom;
+ else
+ params->hdr.setup_data = (unsigned long)rom;
+
+ data = (struct setup_data *)rom;
+ }
free_handle:
efi_call_early(free_pool, pci_handle);
@@ -341,8 +217,7 @@
status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
size + sizeof(struct setup_data), &new);
if (status != EFI_SUCCESS) {
- efi_printk(sys_table,
- "Failed to alloc mem for properties\n");
+ efi_printk(sys_table, "Failed to allocate memory for 'properties'\n");
return;
}
@@ -358,9 +233,9 @@
new->next = 0;
data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
- if (!data)
+ if (!data) {
boot_params->hdr.setup_data = (unsigned long)new;
- else {
+ } else {
while (data->next)
data = (struct setup_data *)(unsigned long)data->next;
data->next = (unsigned long)new;
@@ -380,105 +255,18 @@
}
}
-static efi_status_t
-setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
-{
- struct efi_uga_draw_protocol *uga = NULL, *first_uga;
- efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
- unsigned long nr_ugas;
- u32 *handles = (u32 *)uga_handle;
- efi_status_t status = EFI_INVALID_PARAMETER;
- int i;
-
- first_uga = NULL;
- nr_ugas = size / sizeof(u32);
- for (i = 0; i < nr_ugas; i++) {
- efi_guid_t pciio_proto = EFI_PCI_IO_PROTOCOL_GUID;
- u32 w, h, depth, refresh;
- void *pciio;
- u32 handle = handles[i];
-
- status = efi_call_early(handle_protocol, handle,
- &uga_proto, (void **)&uga);
- if (status != EFI_SUCCESS)
- continue;
-
- efi_call_early(handle_protocol, handle, &pciio_proto, &pciio);
-
- status = efi_early->call((unsigned long)uga->get_mode, uga,
- &w, &h, &depth, &refresh);
- if (status == EFI_SUCCESS && (!first_uga || pciio)) {
- *width = w;
- *height = h;
-
- /*
- * Once we've found a UGA supporting PCIIO,
- * don't bother looking any further.
- */
- if (pciio)
- break;
-
- first_uga = uga;
- }
- }
-
- return status;
-}
-
-static efi_status_t
-setup_uga64(void **uga_handle, unsigned long size, u32 *width, u32 *height)
-{
- struct efi_uga_draw_protocol *uga = NULL, *first_uga;
- efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
- unsigned long nr_ugas;
- u64 *handles = (u64 *)uga_handle;
- efi_status_t status = EFI_INVALID_PARAMETER;
- int i;
-
- first_uga = NULL;
- nr_ugas = size / sizeof(u64);
- for (i = 0; i < nr_ugas; i++) {
- efi_guid_t pciio_proto = EFI_PCI_IO_PROTOCOL_GUID;
- u32 w, h, depth, refresh;
- void *pciio;
- u64 handle = handles[i];
-
- status = efi_call_early(handle_protocol, handle,
- &uga_proto, (void **)&uga);
- if (status != EFI_SUCCESS)
- continue;
-
- efi_call_early(handle_protocol, handle, &pciio_proto, &pciio);
-
- status = efi_early->call((unsigned long)uga->get_mode, uga,
- &w, &h, &depth, &refresh);
- if (status == EFI_SUCCESS && (!first_uga || pciio)) {
- *width = w;
- *height = h;
-
- /*
- * Once we've found a UGA supporting PCIIO,
- * don't bother looking any further.
- */
- if (pciio)
- break;
-
- first_uga = uga;
- }
- }
-
- return status;
-}
-
/*
* See if we have Universal Graphics Adapter (UGA) protocol
*/
-static efi_status_t setup_uga(struct screen_info *si, efi_guid_t *uga_proto,
- unsigned long size)
+static efi_status_t
+setup_uga(struct screen_info *si, efi_guid_t *uga_proto, unsigned long size)
{
efi_status_t status;
u32 width, height;
void **uga_handle = NULL;
+ efi_uga_draw_protocol_t *uga = NULL, *first_uga;
+ unsigned long nr_ugas;
+ int i;
status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
size, (void **)&uga_handle);
@@ -494,32 +282,62 @@
height = 0;
width = 0;
- if (efi_early->is64)
- status = setup_uga64(uga_handle, size, &width, &height);
- else
- status = setup_uga32(uga_handle, size, &width, &height);
+ first_uga = NULL;
+ nr_ugas = size / (efi_is_64bit() ? sizeof(u64) : sizeof(u32));
+ for (i = 0; i < nr_ugas; i++) {
+ efi_guid_t pciio_proto = EFI_PCI_IO_PROTOCOL_GUID;
+ u32 w, h, depth, refresh;
+ void *pciio;
+ unsigned long handle = efi_is_64bit() ? ((u64 *)uga_handle)[i]
+ : ((u32 *)uga_handle)[i];
+
+ status = efi_call_early(handle_protocol, handle,
+ uga_proto, (void **)&uga);
+ if (status != EFI_SUCCESS)
+ continue;
+
+ pciio = NULL;
+ efi_call_early(handle_protocol, handle, &pciio_proto, &pciio);
+
+ status = efi_call_proto(efi_uga_draw_protocol, get_mode, uga,
+ &w, &h, &depth, &refresh);
+ if (status == EFI_SUCCESS && (!first_uga || pciio)) {
+ width = w;
+ height = h;
+
+ /*
+ * Once we've found a UGA supporting PCIIO,
+ * don't bother looking any further.
+ */
+ if (pciio)
+ break;
+
+ first_uga = uga;
+ }
+ }
if (!width && !height)
goto free_handle;
/* EFI framebuffer */
- si->orig_video_isVGA = VIDEO_TYPE_EFI;
+ si->orig_video_isVGA = VIDEO_TYPE_EFI;
- si->lfb_depth = 32;
- si->lfb_width = width;
- si->lfb_height = height;
+ si->lfb_depth = 32;
+ si->lfb_width = width;
+ si->lfb_height = height;
- si->red_size = 8;
- si->red_pos = 16;
- si->green_size = 8;
- si->green_pos = 8;
- si->blue_size = 8;
- si->blue_pos = 0;
- si->rsvd_size = 8;
- si->rsvd_pos = 24;
+ si->red_size = 8;
+ si->red_pos = 16;
+ si->green_size = 8;
+ si->green_pos = 8;
+ si->blue_size = 8;
+ si->blue_pos = 0;
+ si->rsvd_size = 8;
+ si->rsvd_pos = 24;
free_handle:
efi_call_early(free_pool, uga_handle);
+
return status;
}
@@ -586,7 +404,7 @@
if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
return NULL;
- if (efi_early->is64)
+ if (efi_is_64bit())
setup_boot_services64(efi_early);
else
setup_boot_services32(efi_early);
@@ -601,7 +419,7 @@
status = efi_low_alloc(sys_table, 0x4000, 1,
(unsigned long *)&boot_params);
if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to alloc lowmem for boot params\n");
+ efi_printk(sys_table, "Failed to allocate lowmem for boot params\n");
return NULL;
}
@@ -617,9 +435,9 @@
* Fill out some of the header fields ourselves because the
* EFI firmware loader doesn't load the first sector.
*/
- hdr->root_flags = 1;
- hdr->vid_mode = 0xffff;
- hdr->boot_flag = 0xAA55;
+ hdr->root_flags = 1;
+ hdr->vid_mode = 0xffff;
+ hdr->boot_flag = 0xAA55;
hdr->type_of_loader = 0x21;
@@ -627,6 +445,7 @@
cmdline_ptr = efi_convert_cmdline(sys_table, image, &options_size);
if (!cmdline_ptr)
goto fail;
+
hdr->cmd_line_ptr = (unsigned long)cmdline_ptr;
/* Fill in upper bits of command line address, NOP on 32 bit */
boot_params->ext_cmd_line_ptr = (u64)(unsigned long)cmdline_ptr >> 32;
@@ -663,10 +482,12 @@
boot_params->ext_ramdisk_size = (u64)ramdisk_size >> 32;
return boot_params;
+
fail2:
efi_free(sys_table, options_size, hdr->cmd_line_ptr);
fail:
efi_free(sys_table, 0x4000, (unsigned long)boot_params);
+
return NULL;
}
@@ -678,7 +499,7 @@
unsigned long size;
e820ext->type = SETUP_E820_EXT;
- e820ext->len = nr_entries * sizeof(struct boot_e820_entry);
+ e820ext->len = nr_entries * sizeof(struct boot_e820_entry);
e820ext->next = 0;
data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
@@ -692,8 +513,8 @@
params->hdr.setup_data = (unsigned long)e820ext;
}
-static efi_status_t setup_e820(struct boot_params *params,
- struct setup_data *e820ext, u32 e820ext_size)
+static efi_status_t
+setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_size)
{
struct boot_e820_entry *entry = params->e820_table;
struct efi_info *efi = ¶ms->efi_info;
@@ -814,11 +635,10 @@
}
struct exit_boot_struct {
- struct boot_params *boot_params;
- struct efi_info *efi;
- struct setup_data *e820ext;
- __u32 e820ext_size;
- bool is64;
+ struct boot_params *boot_params;
+ struct efi_info *efi;
+ struct setup_data *e820ext;
+ __u32 e820ext_size;
};
static efi_status_t exit_boot_func(efi_system_table_t *sys_table_arg,
@@ -845,25 +665,25 @@
first = false;
}
- signature = p->is64 ? EFI64_LOADER_SIGNATURE : EFI32_LOADER_SIGNATURE;
+ signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
+ : EFI32_LOADER_SIGNATURE;
memcpy(&p->efi->efi_loader_signature, signature, sizeof(__u32));
- p->efi->efi_systab = (unsigned long)sys_table_arg;
- p->efi->efi_memdesc_size = *map->desc_size;
- p->efi->efi_memdesc_version = *map->desc_ver;
- p->efi->efi_memmap = (unsigned long)*map->map;
- p->efi->efi_memmap_size = *map->map_size;
+ p->efi->efi_systab = (unsigned long)sys_table_arg;
+ p->efi->efi_memdesc_size = *map->desc_size;
+ p->efi->efi_memdesc_version = *map->desc_ver;
+ p->efi->efi_memmap = (unsigned long)*map->map;
+ p->efi->efi_memmap_size = *map->map_size;
#ifdef CONFIG_X86_64
- p->efi->efi_systab_hi = (unsigned long)sys_table_arg >> 32;
- p->efi->efi_memmap_hi = (unsigned long)*map->map >> 32;
+ p->efi->efi_systab_hi = (unsigned long)sys_table_arg >> 32;
+ p->efi->efi_memmap_hi = (unsigned long)*map->map >> 32;
#endif
return EFI_SUCCESS;
}
-static efi_status_t exit_boot(struct boot_params *boot_params,
- void *handle, bool is64)
+static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
{
unsigned long map_sz, key, desc_size, buff_size;
efi_memory_desc_t *mem_map;
@@ -874,17 +694,16 @@
struct efi_boot_memmap map;
struct exit_boot_struct priv;
- map.map = &mem_map;
- map.map_size = &map_sz;
- map.desc_size = &desc_size;
- map.desc_ver = &desc_version;
- map.key_ptr = &key;
- map.buff_size = &buff_size;
- priv.boot_params = boot_params;
- priv.efi = &boot_params->efi_info;
- priv.e820ext = NULL;
- priv.e820ext_size = 0;
- priv.is64 = is64;
+ map.map = &mem_map;
+ map.map_size = &map_sz;
+ map.desc_size = &desc_size;
+ map.desc_ver = &desc_version;
+ map.key_ptr = &key;
+ map.buff_size = &buff_size;
+ priv.boot_params = boot_params;
+ priv.efi = &boot_params->efi_info;
+ priv.e820ext = NULL;
+ priv.e820ext_size = 0;
/* Might as well exit boot services now */
status = efi_exit_boot_services(sys_table, handle, &map, &priv,
@@ -892,10 +711,11 @@
if (status != EFI_SUCCESS)
return status;
- e820ext = priv.e820ext;
- e820ext_size = priv.e820ext_size;
+ e820ext = priv.e820ext;
+ e820ext_size = priv.e820ext_size;
+
/* Historic? */
- boot_params->alt_mem_k = 32 * 1024;
+ boot_params->alt_mem_k = 32 * 1024;
status = setup_e820(boot_params, e820ext, e820ext_size);
if (status != EFI_SUCCESS)
@@ -908,8 +728,8 @@
* On success we return a pointer to a boot_params structure, and NULL
* on failure.
*/
-struct boot_params *efi_main(struct efi_config *c,
- struct boot_params *boot_params)
+struct boot_params *
+efi_main(struct efi_config *c, struct boot_params *boot_params)
{
struct desc_ptr *gdt = NULL;
efi_loaded_image_t *image;
@@ -918,13 +738,11 @@
struct desc_struct *desc;
void *handle;
efi_system_table_t *_table;
- bool is64;
efi_early = c;
_table = (efi_system_table_t *)(unsigned long)efi_early->table;
handle = (void *)(unsigned long)efi_early->image_handle;
- is64 = efi_early->is64;
sys_table = _table;
@@ -932,7 +750,7 @@
if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
goto fail;
- if (is64)
+ if (efi_is_64bit())
setup_boot_services64(efi_early);
else
setup_boot_services32(efi_early);
@@ -957,7 +775,7 @@
status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
sizeof(*gdt), (void **)&gdt);
if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to alloc mem for gdt structure\n");
+ efi_printk(sys_table, "Failed to allocate memory for 'gdt' structure\n");
goto fail;
}
@@ -965,7 +783,7 @@
status = efi_low_alloc(sys_table, gdt->size, 8,
(unsigned long *)&gdt->address);
if (status != EFI_SUCCESS) {
- efi_printk(sys_table, "Failed to alloc mem for gdt\n");
+ efi_printk(sys_table, "Failed to allocate memory for 'gdt'\n");
goto fail;
}
@@ -988,7 +806,7 @@
hdr->code32_start = bzimage_addr;
}
- status = exit_boot(boot_params, handle, is64);
+ status = exit_boot(boot_params, handle);
if (status != EFI_SUCCESS) {
efi_printk(sys_table, "exit_boot() failed!\n");
goto fail;
@@ -1002,19 +820,20 @@
if (IS_ENABLED(CONFIG_X86_64)) {
/* __KERNEL32_CS */
- desc->limit0 = 0xffff;
- desc->base0 = 0x0000;
- desc->base1 = 0x0000;
- desc->type = SEG_TYPE_CODE | SEG_TYPE_EXEC_READ;
- desc->s = DESC_TYPE_CODE_DATA;
- desc->dpl = 0;
- desc->p = 1;
- desc->limit1 = 0xf;
- desc->avl = 0;
- desc->l = 0;
- desc->d = SEG_OP_SIZE_32BIT;
- desc->g = SEG_GRANULARITY_4KB;
- desc->base2 = 0x00;
+ desc->limit0 = 0xffff;
+ desc->base0 = 0x0000;
+ desc->base1 = 0x0000;
+ desc->type = SEG_TYPE_CODE | SEG_TYPE_EXEC_READ;
+ desc->s = DESC_TYPE_CODE_DATA;
+ desc->dpl = 0;
+ desc->p = 1;
+ desc->limit1 = 0xf;
+ desc->avl = 0;
+ desc->l = 0;
+ desc->d = SEG_OP_SIZE_32BIT;
+ desc->g = SEG_GRANULARITY_4KB;
+ desc->base2 = 0x00;
+
desc++;
} else {
/* Second entry is unused on 32-bit */
@@ -1022,15 +841,16 @@
}
/* __KERNEL_CS */
- desc->limit0 = 0xffff;
- desc->base0 = 0x0000;
- desc->base1 = 0x0000;
- desc->type = SEG_TYPE_CODE | SEG_TYPE_EXEC_READ;
- desc->s = DESC_TYPE_CODE_DATA;
- desc->dpl = 0;
- desc->p = 1;
- desc->limit1 = 0xf;
- desc->avl = 0;
+ desc->limit0 = 0xffff;
+ desc->base0 = 0x0000;
+ desc->base1 = 0x0000;
+ desc->type = SEG_TYPE_CODE | SEG_TYPE_EXEC_READ;
+ desc->s = DESC_TYPE_CODE_DATA;
+ desc->dpl = 0;
+ desc->p = 1;
+ desc->limit1 = 0xf;
+ desc->avl = 0;
+
if (IS_ENABLED(CONFIG_X86_64)) {
desc->l = 1;
desc->d = 0;
@@ -1038,41 +858,41 @@
desc->l = 0;
desc->d = SEG_OP_SIZE_32BIT;
}
- desc->g = SEG_GRANULARITY_4KB;
- desc->base2 = 0x00;
+ desc->g = SEG_GRANULARITY_4KB;
+ desc->base2 = 0x00;
desc++;
/* __KERNEL_DS */
- desc->limit0 = 0xffff;
- desc->base0 = 0x0000;
- desc->base1 = 0x0000;
- desc->type = SEG_TYPE_DATA | SEG_TYPE_READ_WRITE;
- desc->s = DESC_TYPE_CODE_DATA;
- desc->dpl = 0;
- desc->p = 1;
- desc->limit1 = 0xf;
- desc->avl = 0;
- desc->l = 0;
- desc->d = SEG_OP_SIZE_32BIT;
- desc->g = SEG_GRANULARITY_4KB;
- desc->base2 = 0x00;
+ desc->limit0 = 0xffff;
+ desc->base0 = 0x0000;
+ desc->base1 = 0x0000;
+ desc->type = SEG_TYPE_DATA | SEG_TYPE_READ_WRITE;
+ desc->s = DESC_TYPE_CODE_DATA;
+ desc->dpl = 0;
+ desc->p = 1;
+ desc->limit1 = 0xf;
+ desc->avl = 0;
+ desc->l = 0;
+ desc->d = SEG_OP_SIZE_32BIT;
+ desc->g = SEG_GRANULARITY_4KB;
+ desc->base2 = 0x00;
desc++;
if (IS_ENABLED(CONFIG_X86_64)) {
/* Task segment value */
- desc->limit0 = 0x0000;
- desc->base0 = 0x0000;
- desc->base1 = 0x0000;
- desc->type = SEG_TYPE_TSS;
- desc->s = 0;
- desc->dpl = 0;
- desc->p = 1;
- desc->limit1 = 0x0;
- desc->avl = 0;
- desc->l = 0;
- desc->d = 0;
- desc->g = SEG_GRANULARITY_4KB;
- desc->base2 = 0x00;
+ desc->limit0 = 0x0000;
+ desc->base0 = 0x0000;
+ desc->base1 = 0x0000;
+ desc->type = SEG_TYPE_TSS;
+ desc->s = 0;
+ desc->dpl = 0;
+ desc->p = 1;
+ desc->limit1 = 0x0;
+ desc->avl = 0;
+ desc->l = 0;
+ desc->d = 0;
+ desc->g = SEG_GRANULARITY_4KB;
+ desc->base2 = 0x00;
desc++;
}
@@ -1082,5 +902,6 @@
return boot_params;
fail:
efi_printk(sys_table, "efi_main() failed!\n");
+
return NULL;
}
diff --git a/arch/x86/boot/compressed/eboot.h b/arch/x86/boot/compressed/eboot.h
index e799dc5..8297387 100644
--- a/arch/x86/boot/compressed/eboot.h
+++ b/arch/x86/boot/compressed/eboot.h
@@ -12,22 +12,22 @@
#define DESC_TYPE_CODE_DATA (1 << 0)
-struct efi_uga_draw_protocol_32 {
+typedef struct {
u32 get_mode;
u32 set_mode;
u32 blt;
-};
+} efi_uga_draw_protocol_32_t;
-struct efi_uga_draw_protocol_64 {
+typedef struct {
u64 get_mode;
u64 set_mode;
u64 blt;
-};
+} efi_uga_draw_protocol_64_t;
-struct efi_uga_draw_protocol {
+typedef struct {
void *get_mode;
void *set_mode;
void *blt;
-};
+} efi_uga_draw_protocol_t;
#endif /* BOOT_COMPRESSED_EBOOT_H */
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index b87a758..3025179 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -102,7 +102,7 @@
/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
-unsigned long long mem_limit = ULLONG_MAX;
+static unsigned long long mem_limit = ULLONG_MAX;
enum mem_avoid_index {
@@ -215,7 +215,36 @@
memmap_too_large = true;
}
-static int handle_mem_memmap(void)
+/* Store the number of 1GB huge pages which users specified: */
+static unsigned long max_gb_huge_pages;
+
+static void parse_gb_huge_pages(char *param, char *val)
+{
+ static bool gbpage_sz;
+ char *p;
+
+ if (!strcmp(param, "hugepagesz")) {
+ p = val;
+ if (memparse(p, &p) != PUD_SIZE) {
+ gbpage_sz = false;
+ return;
+ }
+
+ if (gbpage_sz)
+ warn("Repeatedly set hugeTLB page size of 1G!\n");
+ gbpage_sz = true;
+ return;
+ }
+
+ if (!strcmp(param, "hugepages") && gbpage_sz) {
+ p = val;
+ max_gb_huge_pages = simple_strtoull(p, &p, 0);
+ return;
+ }
+}
+
+
+static int handle_mem_options(void)
{
char *args = (char *)get_cmd_line_ptr();
size_t len = strlen((char *)args);
@@ -223,7 +252,8 @@
char *param, *val;
u64 mem_size;
- if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+ if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+ !strstr(args, "hugepages"))
return 0;
tmp_cmdline = malloc(len + 1);
@@ -248,6 +278,8 @@
if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+ } else if (strstr(param, "hugepages")) {
+ parse_gb_huge_pages(param, val);
} else if (!strcmp(param, "mem")) {
char *p = val;
@@ -387,7 +419,7 @@
/* We don't need to set a mapping for setup_data. */
/* Mark the memmap regions we need to avoid */
- handle_mem_memmap();
+ handle_mem_options();
#ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
@@ -466,6 +498,60 @@
}
}
+/*
+ * Skip as many 1GB huge pages as possible in the passed region
+ * according to the number which users specified:
+ */
+static void
+process_gb_huge_pages(struct mem_vector *region, unsigned long image_size)
+{
+ unsigned long addr, size = 0;
+ struct mem_vector tmp;
+ int i = 0;
+
+ if (!max_gb_huge_pages) {
+ store_slot_info(region, image_size);
+ return;
+ }
+
+ addr = ALIGN(region->start, PUD_SIZE);
+ /* Did we raise the address above the passed in memory entry? */
+ if (addr < region->start + region->size)
+ size = region->size - (addr - region->start);
+
+ /* Check how many 1GB huge pages can be filtered out: */
+ while (size > PUD_SIZE && max_gb_huge_pages) {
+ size -= PUD_SIZE;
+ max_gb_huge_pages--;
+ i++;
+ }
+
+ /* No good 1GB huge pages found: */
+ if (!i) {
+ store_slot_info(region, image_size);
+ return;
+ }
+
+ /*
+ * Skip those 'i'*1GB good huge pages, and continue checking and
+ * processing the remaining head or tail part of the passed region
+ * if available.
+ */
+
+ if (addr >= region->start + image_size) {
+ tmp.start = region->start;
+ tmp.size = addr - region->start;
+ store_slot_info(&tmp, image_size);
+ }
+
+ size = region->size - (addr - region->start) - i * PUD_SIZE;
+ if (size >= image_size) {
+ tmp.start = addr + i * PUD_SIZE;
+ tmp.size = size;
+ store_slot_info(&tmp, image_size);
+ }
+}
+
static unsigned long slots_fetch_random(void)
{
unsigned long slot;
@@ -546,7 +632,7 @@
/* If nothing overlaps, store the region and return. */
if (!mem_avoid_overlap(®ion, &overlap)) {
- store_slot_info(®ion, image_size);
+ process_gb_huge_pages(®ion, image_size);
return;
}
@@ -556,7 +642,7 @@
beginning.start = region.start;
beginning.size = overlap.start - region.start;
- store_slot_info(&beginning, image_size);
+ process_gb_huge_pages(&beginning, image_size);
}
/* Return if overlap extends to or past end of region. */
diff --git a/arch/x86/boot/string.c b/arch/x86/boot/string.c
index 16f4912..c4428a1 100644
--- a/arch/x86/boot/string.c
+++ b/arch/x86/boot/string.c
@@ -13,6 +13,7 @@
*/
#include <linux/types.h>
+#include <asm/asm.h>
#include "ctype.h"
#include "string.h"
@@ -28,8 +29,8 @@
int memcmp(const void *s1, const void *s2, size_t len)
{
bool diff;
- asm("repe; cmpsb; setnz %0"
- : "=qm" (diff), "+D" (s1), "+S" (s2), "+c" (len));
+ asm("repe; cmpsb" CC_SET(nz)
+ : CC_OUT(nz) (diff), "+D" (s1), "+S" (s2), "+c" (len));
return diff;
}
diff --git a/arch/x86/crypto/aegis128-aesni-asm.S b/arch/x86/crypto/aegis128-aesni-asm.S
index 717bf07..5f7e43d 100644
--- a/arch/x86/crypto/aegis128-aesni-asm.S
+++ b/arch/x86/crypto/aegis128-aesni-asm.S
@@ -75,7 +75,7 @@
* %r9
*/
__load_partial:
- xor %r9, %r9
+ xor %r9d, %r9d
pxor MSG, MSG
mov LEN, %r8
diff --git a/arch/x86/crypto/aegis128l-aesni-asm.S b/arch/x86/crypto/aegis128l-aesni-asm.S
index 4eda2b8..491dd61 100644
--- a/arch/x86/crypto/aegis128l-aesni-asm.S
+++ b/arch/x86/crypto/aegis128l-aesni-asm.S
@@ -66,7 +66,7 @@
* %r9
*/
__load_partial:
- xor %r9, %r9
+ xor %r9d, %r9d
pxor MSG0, MSG0
pxor MSG1, MSG1
diff --git a/arch/x86/crypto/aegis256-aesni-asm.S b/arch/x86/crypto/aegis256-aesni-asm.S
index 32aae83..8870c7c 100644
--- a/arch/x86/crypto/aegis256-aesni-asm.S
+++ b/arch/x86/crypto/aegis256-aesni-asm.S
@@ -59,7 +59,7 @@
* %r9
*/
__load_partial:
- xor %r9, %r9
+ xor %r9d, %r9d
pxor MSG, MSG
mov LEN, %r8
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index e762ef4..9bd1395 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -258,7 +258,7 @@
.macro GCM_INIT Iv SUBKEY AAD AADLEN
mov \AADLEN, %r11
mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
- xor %r11, %r11
+ xor %r11d, %r11d
mov %r11, InLen(%arg2) # ctx_data.in_length = 0
mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
@@ -286,7 +286,7 @@
movdqu HashKey(%arg2), %xmm13
add %arg5, InLen(%arg2)
- xor %r11, %r11 # initialise the data pointer offset as zero
+ xor %r11d, %r11d # initialise the data pointer offset as zero
PARTIAL_BLOCK %arg3 %arg4 %arg5 %r11 %xmm8 \operation
sub %r11, %arg5 # sub partial block data used
@@ -702,7 +702,7 @@
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
- xor %rax,%rax
+ xor %eax, %eax
mov %rax, PBlockLen(%arg2)
jmp _dec_done_\@
@@ -737,7 +737,7 @@
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
- xor %rax,%rax
+ xor %eax, %eax
mov %rax, PBlockLen(%arg2)
jmp _encode_done_\@
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb15..1985ea0 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -463,7 +463,7 @@
_get_AAD_done\@:
# initialize the data pointer offset as zero
- xor %r11, %r11
+ xor %r11d, %r11d
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
@@ -1770,7 +1770,7 @@
_get_AAD_done\@:
# initialize the data pointer offset as zero
- xor %r11, %r11
+ xor %r11d, %r11d
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
diff --git a/arch/x86/crypto/morus1280-avx2-asm.S b/arch/x86/crypto/morus1280-avx2-asm.S
index 07653d4..de182c4 100644
--- a/arch/x86/crypto/morus1280-avx2-asm.S
+++ b/arch/x86/crypto/morus1280-avx2-asm.S
@@ -113,7 +113,7 @@
* %r9
*/
__load_partial:
- xor %r9, %r9
+ xor %r9d, %r9d
vpxor MSG, MSG, MSG
mov %rcx, %r8
diff --git a/arch/x86/crypto/morus1280-sse2-asm.S b/arch/x86/crypto/morus1280-sse2-asm.S
index bd1aa1b..da5d290 100644
--- a/arch/x86/crypto/morus1280-sse2-asm.S
+++ b/arch/x86/crypto/morus1280-sse2-asm.S
@@ -235,7 +235,7 @@
* %r9
*/
__load_partial:
- xor %r9, %r9
+ xor %r9d, %r9d
pxor MSG_LO, MSG_LO
pxor MSG_HI, MSG_HI
diff --git a/arch/x86/crypto/morus640-sse2-asm.S b/arch/x86/crypto/morus640-sse2-asm.S
index efa0281..414db48 100644
--- a/arch/x86/crypto/morus640-sse2-asm.S
+++ b/arch/x86/crypto/morus640-sse2-asm.S
@@ -113,7 +113,7 @@
* %r9
*/
__load_partial:
- xor %r9, %r9
+ xor %r9d, %r9d
pxor MSG, MSG
mov %rcx, %r8
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
index 6204bd5..613d0bfc 100644
--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -96,7 +96,7 @@
# cleanup workspace
mov $8, %ecx
mov %rsp, %rdi
- xor %rax, %rax
+ xor %eax, %eax
rep stosq
mov %rbp, %rsp # deallocate workspace
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index c371bfe..2767c62 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -65,7 +65,7 @@
# define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
#else
# define preempt_stop(clobbers)
-# define resume_kernel restore_all
+# define resume_kernel restore_all_kernel
#endif
.macro TRACE_IRQS_IRET
@@ -77,6 +77,8 @@
#endif
.endm
+#define PTI_SWITCH_MASK (1 << PAGE_SHIFT)
+
/*
* User gs save/restore
*
@@ -154,7 +156,52 @@
#endif /* CONFIG_X86_32_LAZY_GS */
-.macro SAVE_ALL pt_regs_ax=%eax
+/* Unconditionally switch to user cr3 */
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+ movl %cr3, \scratch_reg
+ orl $PTI_SWITCH_MASK, \scratch_reg
+ movl \scratch_reg, %cr3
+.Lend_\@:
+.endm
+
+.macro BUG_IF_WRONG_CR3 no_user_check=0
+#ifdef CONFIG_DEBUG_ENTRY
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+ .if \no_user_check == 0
+ /* coming from usermode? */
+ testl $SEGMENT_RPL_MASK, PT_CS(%esp)
+ jz .Lend_\@
+ .endif
+ /* On user-cr3? */
+ movl %cr3, %eax
+ testl $PTI_SWITCH_MASK, %eax
+ jnz .Lend_\@
+ /* From userspace with kernel cr3 - BUG */
+ ud2
+.Lend_\@:
+#endif
+.endm
+
+/*
+ * Switch to kernel cr3 if not already loaded and return current cr3 in
+ * \scratch_reg
+ */
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+ movl %cr3, \scratch_reg
+ /* Test if we are already on kernel CR3 */
+ testl $PTI_SWITCH_MASK, \scratch_reg
+ jz .Lend_\@
+ andl $(~PTI_SWITCH_MASK), \scratch_reg
+ movl \scratch_reg, %cr3
+ /* Return original CR3 in \scratch_reg */
+ orl $PTI_SWITCH_MASK, \scratch_reg
+.Lend_\@:
+.endm
+
+.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
cld
PUSH_GS
pushl %fs
@@ -173,6 +220,29 @@
movl $(__KERNEL_PERCPU), %edx
movl %edx, %fs
SET_KERNEL_GS %edx
+
+ /* Switch to kernel stack if necessary */
+.if \switch_stacks > 0
+ SWITCH_TO_KERNEL_STACK
+.endif
+
+.endm
+
+.macro SAVE_ALL_NMI cr3_reg:req
+ SAVE_ALL
+
+ BUG_IF_WRONG_CR3
+
+ /*
+ * Now switch the CR3 when PTI is enabled.
+ *
+ * We can enter with either user or kernel cr3, the code will
+ * store the old cr3 in \cr3_reg and switches to the kernel cr3
+ * if necessary.
+ */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=\cr3_reg
+
+.Lend_\@:
.endm
/*
@@ -221,6 +291,349 @@
POP_GS_EX
.endm
+.macro RESTORE_ALL_NMI cr3_reg:req pop=0
+ /*
+ * Now switch the CR3 when PTI is enabled.
+ *
+ * We enter with kernel cr3 and switch the cr3 to the value
+ * stored on \cr3_reg, which is either a user or a kernel cr3.
+ */
+ ALTERNATIVE "jmp .Lswitched_\@", "", X86_FEATURE_PTI
+
+ testl $PTI_SWITCH_MASK, \cr3_reg
+ jz .Lswitched_\@
+
+ /* User cr3 in \cr3_reg - write it to hardware cr3 */
+ movl \cr3_reg, %cr3
+
+.Lswitched_\@:
+
+ BUG_IF_WRONG_CR3
+
+ RESTORE_REGS pop=\pop
+.endm
+
+.macro CHECK_AND_APPLY_ESPFIX
+#ifdef CONFIG_X86_ESPFIX32
+#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
+
+ ALTERNATIVE "jmp .Lend_\@", "", X86_BUG_ESPFIX
+
+ movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
+ /*
+ * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
+ * are returning to the kernel.
+ * See comments in process.c:copy_thread() for details.
+ */
+ movb PT_OLDSS(%esp), %ah
+ movb PT_CS(%esp), %al
+ andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
+ cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
+ jne .Lend_\@ # returning to user-space with LDT SS
+
+ /*
+ * Setup and switch to ESPFIX stack
+ *
+ * We're returning to userspace with a 16 bit stack. The CPU will not
+ * restore the high word of ESP for us on executing iret... This is an
+ * "official" bug of all the x86-compatible CPUs, which we can work
+ * around to make dosemu and wine happy. We do this by preloading the
+ * high word of ESP with the high word of the userspace ESP while
+ * compensating for the offset by changing to the ESPFIX segment with
+ * a base address that matches for the difference.
+ */
+ mov %esp, %edx /* load kernel esp */
+ mov PT_OLDESP(%esp), %eax /* load userspace esp */
+ mov %dx, %ax /* eax: new kernel esp */
+ sub %eax, %edx /* offset (low word is 0) */
+ shr $16, %edx
+ mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
+ mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
+ pushl $__ESPFIX_SS
+ pushl %eax /* new kernel esp */
+ /*
+ * Disable interrupts, but do not irqtrace this section: we
+ * will soon execute iret and the tracer was already set to
+ * the irqstate after the IRET:
+ */
+ DISABLE_INTERRUPTS(CLBR_ANY)
+ lss (%esp), %esp /* switch to espfix segment */
+.Lend_\@:
+#endif /* CONFIG_X86_ESPFIX32 */
+.endm
+
+/*
+ * Called with pt_regs fully populated and kernel segments loaded,
+ * so we can access PER_CPU and use the integer registers.
+ *
+ * We need to be very careful here with the %esp switch, because an NMI
+ * can happen everywhere. If the NMI handler finds itself on the
+ * entry-stack, it will overwrite the task-stack and everything we
+ * copied there. So allocate the stack-frame on the task-stack and
+ * switch to it before we do any copying.
+ */
+
+#define CS_FROM_ENTRY_STACK (1 << 31)
+#define CS_FROM_USER_CR3 (1 << 30)
+
+.macro SWITCH_TO_KERNEL_STACK
+
+ ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+ BUG_IF_WRONG_CR3
+
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+
+ /*
+ * %eax now contains the entry cr3 and we carry it forward in
+ * that register for the time this macro runs
+ */
+
+ /* Are we on the entry stack? Bail out if not! */
+ movl PER_CPU_VAR(cpu_entry_area), %ecx
+ addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
+ subl %esp, %ecx /* ecx = (end of entry_stack) - esp */
+ cmpl $SIZEOF_entry_stack, %ecx
+ jae .Lend_\@
+
+ /* Load stack pointer into %esi and %edi */
+ movl %esp, %esi
+ movl %esi, %edi
+
+ /* Move %edi to the top of the entry stack */
+ andl $(MASK_entry_stack), %edi
+ addl $(SIZEOF_entry_stack), %edi
+
+ /* Load top of task-stack into %edi */
+ movl TSS_entry2task_stack(%edi), %edi
+
+ /*
+ * Clear unused upper bits of the dword containing the word-sized CS
+ * slot in pt_regs in case hardware didn't clear it for us.
+ */
+ andl $(0x0000ffff), PT_CS(%esp)
+
+ /* Special case - entry from kernel mode via entry stack */
+#ifdef CONFIG_VM86
+ movl PT_EFLAGS(%esp), %ecx # mix EFLAGS and CS
+ movb PT_CS(%esp), %cl
+ andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %ecx
+#else
+ movl PT_CS(%esp), %ecx
+ andl $SEGMENT_RPL_MASK, %ecx
+#endif
+ cmpl $USER_RPL, %ecx
+ jb .Lentry_from_kernel_\@
+
+ /* Bytes to copy */
+ movl $PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+ testl $X86_EFLAGS_VM, PT_EFLAGS(%esi)
+ jz .Lcopy_pt_regs_\@
+
+ /*
+ * Stack-frame contains 4 additional segment registers when
+ * coming from VM86 mode
+ */
+ addl $(4 * 4), %ecx
+
+#endif
+.Lcopy_pt_regs_\@:
+
+ /* Allocate frame on task-stack */
+ subl %ecx, %edi
+
+ /* Switch to task-stack */
+ movl %edi, %esp
+
+ /*
+ * We are now on the task-stack and can safely copy over the
+ * stack-frame
+ */
+ shrl $2, %ecx
+ cld
+ rep movsl
+
+ jmp .Lend_\@
+
+.Lentry_from_kernel_\@:
+
+ /*
+ * This handles the case when we enter the kernel from
+ * kernel-mode and %esp points to the entry-stack. When this
+ * happens we need to switch to the task-stack to run C code,
+ * but switch back to the entry-stack again when we approach
+ * iret and return to the interrupted code-path. This usually
+ * happens when we hit an exception while restoring user-space
+ * segment registers on the way back to user-space or when the
+ * sysenter handler runs with eflags.tf set.
+ *
+ * When we switch to the task-stack here, we can't trust the
+ * contents of the entry-stack anymore, as the exception handler
+ * might be scheduled out or moved to another CPU. Therefore we
+ * copy the complete entry-stack to the task-stack and set a
+ * marker in the iret-frame (bit 31 of the CS dword) to detect
+ * what we've done on the iret path.
+ *
+ * On the iret path we copy everything back and switch to the
+ * entry-stack, so that the interrupted kernel code-path
+ * continues on the same stack it was interrupted with.
+ *
+ * Be aware that an NMI can happen anytime in this code.
+ *
+ * %esi: Entry-Stack pointer (same as %esp)
+ * %edi: Top of the task stack
+ * %eax: CR3 on kernel entry
+ */
+
+ /* Calculate number of bytes on the entry stack in %ecx */
+ movl %esi, %ecx
+
+ /* %ecx to the top of entry-stack */
+ andl $(MASK_entry_stack), %ecx
+ addl $(SIZEOF_entry_stack), %ecx
+
+ /* Number of bytes on the entry stack to %ecx */
+ sub %esi, %ecx
+
+ /* Mark stackframe as coming from entry stack */
+ orl $CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+ /*
+ * Test the cr3 used to enter the kernel and add a marker
+ * so that we can switch back to it before iret.
+ */
+ testl $PTI_SWITCH_MASK, %eax
+ jz .Lcopy_pt_regs_\@
+ orl $CS_FROM_USER_CR3, PT_CS(%esp)
+
+ /*
+ * %esi and %edi are unchanged, %ecx contains the number of
+ * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+ * the stack-frame on task-stack and copy everything over
+ */
+ jmp .Lcopy_pt_regs_\@
+
+.Lend_\@:
+.endm
+
+/*
+ * Switch back from the kernel stack to the entry stack.
+ *
+ * The %esp register must point to pt_regs on the task stack. It will
+ * first calculate the size of the stack-frame to copy, depending on
+ * whether we return to VM86 mode or not. With that it uses 'rep movsl'
+ * to copy the contents of the stack over to the entry stack.
+ *
+ * We must be very careful here, as we can't trust the contents of the
+ * task-stack once we switched to the entry-stack. When an NMI happens
+ * while on the entry-stack, the NMI handler will switch back to the top
+ * of the task stack, overwriting our stack-frame we are about to copy.
+ * Therefore we switch the stack only after everything is copied over.
+ */
+.macro SWITCH_TO_ENTRY_STACK
+
+ ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+ /* Bytes to copy */
+ movl $PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+ testl $(X86_EFLAGS_VM), PT_EFLAGS(%esp)
+ jz .Lcopy_pt_regs_\@
+
+ /* Additional 4 registers to copy when returning to VM86 mode */
+ addl $(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+ /* Initialize source and destination for movsl */
+ movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+ subl %ecx, %edi
+ movl %esp, %esi
+
+ /* Save future stack pointer in %ebx */
+ movl %edi, %ebx
+
+ /* Copy over the stack-frame */
+ shrl $2, %ecx
+ cld
+ rep movsl
+
+ /*
+ * Switch to entry-stack - needs to happen after everything is
+ * copied because the NMI handler will overwrite the task-stack
+ * when on entry-stack
+ */
+ movl %ebx, %esp
+
+.Lend_\@:
+.endm
+
+/*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack and/or user-cr3
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+ /*
+ * Test if we entered the kernel with the entry-stack. Most
+ * likely we did not, because this code only runs on the
+ * return-to-kernel path.
+ */
+ testl $CS_FROM_ENTRY_STACK, PT_CS(%esp)
+ jz .Lend_\@
+
+ /* Unlikely slow-path */
+
+ /* Clear marker from stack-frame */
+ andl $(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+ /* Copy the remaining task-stack contents to entry-stack */
+ movl %esp, %esi
+ movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+ /* Bytes on the task-stack to ecx */
+ movl PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %ecx
+ subl %esi, %ecx
+
+ /* Allocate stack-frame on entry-stack */
+ subl %ecx, %edi
+
+ /*
+ * Save future stack-pointer, we must not switch until the
+ * copy is done, otherwise the NMI handler could destroy the
+ * contents of the task-stack we are about to copy.
+ */
+ movl %edi, %ebx
+
+ /* Do the copy */
+ shrl $2, %ecx
+ cld
+ rep movsl
+
+ /* Safe to switch to entry-stack now */
+ movl %ebx, %esp
+
+ /*
+ * We came from entry-stack and need to check if we also need to
+ * switch back to user cr3.
+ */
+ testl $CS_FROM_USER_CR3, PT_CS(%esp)
+ jz .Lend_\@
+
+ /* Clear marker from stack-frame */
+ andl $(~CS_FROM_USER_CR3), PT_CS(%esp)
+
+ SWITCH_TO_USER_CR3 scratch_reg=%eax
+
+.Lend_\@:
+.endm
/*
* %eax: prev task
* %edx: next task
@@ -351,9 +764,9 @@
DISABLE_INTERRUPTS(CLBR_ANY)
.Lneed_resched:
cmpl $0, PER_CPU_VAR(__preempt_count)
- jnz restore_all
+ jnz restore_all_kernel
testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ?
- jz restore_all
+ jz restore_all_kernel
call preempt_schedule_irq
jmp .Lneed_resched
END(resume_kernel)
@@ -412,7 +825,21 @@
* 0(%ebp) arg6
*/
ENTRY(entry_SYSENTER_32)
- movl TSS_sysenter_sp0(%esp), %esp
+ /*
+ * On entry-stack with all userspace-regs live - save and
+ * restore eflags and %eax to use it as scratch-reg for the cr3
+ * switch.
+ */
+ pushfl
+ pushl %eax
+ BUG_IF_WRONG_CR3 no_user_check=1
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+ popl %eax
+ popfl
+
+ /* Stack empty again, switch to task stack */
+ movl TSS_entry2task_stack(%esp), %esp
+
.Lsysenter_past_esp:
pushl $__USER_DS /* pt_regs->ss */
pushl %ebp /* pt_regs->sp (stashed in bp) */
@@ -421,7 +848,7 @@
pushl $__USER_CS /* pt_regs->cs */
pushl $0 /* pt_regs->ip = 0 (placeholder) */
pushl %eax /* pt_regs->orig_ax */
- SAVE_ALL pt_regs_ax=$-ENOSYS /* save rest */
+ SAVE_ALL pt_regs_ax=$-ENOSYS /* save rest, stack already switched */
/*
* SYSENTER doesn't filter flags, so we need to clear NT, AC
@@ -460,25 +887,49 @@
/* Opportunistic SYSEXIT */
TRACE_IRQS_ON /* User mode traces as IRQs on. */
+
+ /*
+ * Setup entry stack - we keep the pointer in %eax and do the
+ * switch after almost all user-state is restored.
+ */
+
+ /* Load entry stack pointer and allocate frame for eflags/eax */
+ movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %eax
+ subl $(2*4), %eax
+
+ /* Copy eflags and eax to entry stack */
+ movl PT_EFLAGS(%esp), %edi
+ movl PT_EAX(%esp), %esi
+ movl %edi, (%eax)
+ movl %esi, 4(%eax)
+
+ /* Restore user registers and segments */
movl PT_EIP(%esp), %edx /* pt_regs->ip */
movl PT_OLDESP(%esp), %ecx /* pt_regs->sp */
1: mov PT_FS(%esp), %fs
PTGS_TO_GS
+
popl %ebx /* pt_regs->bx */
addl $2*4, %esp /* skip pt_regs->cx and pt_regs->dx */
popl %esi /* pt_regs->si */
popl %edi /* pt_regs->di */
popl %ebp /* pt_regs->bp */
- popl %eax /* pt_regs->ax */
+
+ /* Switch to entry stack */
+ movl %eax, %esp
+
+ /* Now ready to switch the cr3 */
+ SWITCH_TO_USER_CR3 scratch_reg=%eax
/*
* Restore all flags except IF. (We restore IF separately because
* STI gives a one-instruction window in which we won't be interrupted,
* whereas POPF does not.)
*/
- addl $PT_EFLAGS-PT_DS, %esp /* point esp at pt_regs->flags */
btrl $X86_EFLAGS_IF_BIT, (%esp)
+ BUG_IF_WRONG_CR3 no_user_check=1
popfl
+ popl %eax
/*
* Return back to the vDSO, which will pop ecx and edx.
@@ -532,7 +983,8 @@
ENTRY(entry_INT80_32)
ASM_CLAC
pushl %eax /* pt_regs->orig_ax */
- SAVE_ALL pt_regs_ax=$-ENOSYS /* save rest */
+
+ SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1 /* save rest */
/*
* User mode is traced as though IRQs are on, and the interrupt gate
@@ -546,24 +998,17 @@
restore_all:
TRACE_IRQS_IRET
+ SWITCH_TO_ENTRY_STACK
.Lrestore_all_notrace:
-#ifdef CONFIG_X86_ESPFIX32
- ALTERNATIVE "jmp .Lrestore_nocheck", "", X86_BUG_ESPFIX
-
- movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
- /*
- * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
- * are returning to the kernel.
- * See comments in process.c:copy_thread() for details.
- */
- movb PT_OLDSS(%esp), %ah
- movb PT_CS(%esp), %al
- andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
- cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
- je .Lldt_ss # returning to user-space with LDT SS
-#endif
+ CHECK_AND_APPLY_ESPFIX
.Lrestore_nocheck:
- RESTORE_REGS 4 # skip orig_eax/error_code
+ /* Switch back to user CR3 */
+ SWITCH_TO_USER_CR3 scratch_reg=%eax
+
+ BUG_IF_WRONG_CR3
+
+ /* Restore user state */
+ RESTORE_REGS pop=4 # skip orig_eax/error_code
.Lirq_return:
/*
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -572,46 +1017,33 @@
*/
INTERRUPT_RETURN
+restore_all_kernel:
+ TRACE_IRQS_IRET
+ PARANOID_EXIT_TO_KERNEL_MODE
+ BUG_IF_WRONG_CR3
+ RESTORE_REGS 4
+ jmp .Lirq_return
+
.section .fixup, "ax"
ENTRY(iret_exc )
pushl $0 # no error code
pushl $do_iret_error
+
+#ifdef CONFIG_DEBUG_ENTRY
+ /*
+ * The stack-frame here is the one that iret faulted on, so its a
+ * return-to-user frame. We are on kernel-cr3 because we come here from
+ * the fixup code. This confuses the CR3 checker, so switch to user-cr3
+ * as the checker expects it.
+ */
+ pushl %eax
+ SWITCH_TO_USER_CR3 scratch_reg=%eax
+ popl %eax
+#endif
+
jmp common_exception
.previous
_ASM_EXTABLE(.Lirq_return, iret_exc)
-
-#ifdef CONFIG_X86_ESPFIX32
-.Lldt_ss:
-/*
- * Setup and switch to ESPFIX stack
- *
- * We're returning to userspace with a 16 bit stack. The CPU will not
- * restore the high word of ESP for us on executing iret... This is an
- * "official" bug of all the x86-compatible CPUs, which we can work
- * around to make dosemu and wine happy. We do this by preloading the
- * high word of ESP with the high word of the userspace ESP while
- * compensating for the offset by changing to the ESPFIX segment with
- * a base address that matches for the difference.
- */
-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
- mov %esp, %edx /* load kernel esp */
- mov PT_OLDESP(%esp), %eax /* load userspace esp */
- mov %dx, %ax /* eax: new kernel esp */
- sub %eax, %edx /* offset (low word is 0) */
- shr $16, %edx
- mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
- mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
- pushl $__ESPFIX_SS
- pushl %eax /* new kernel esp */
- /*
- * Disable interrupts, but do not irqtrace this section: we
- * will soon execute iret and the tracer was already set to
- * the irqstate after the IRET:
- */
- DISABLE_INTERRUPTS(CLBR_ANY)
- lss (%esp), %esp /* switch to espfix segment */
- jmp .Lrestore_nocheck
-#endif
ENDPROC(entry_INT80_32)
.macro FIXUP_ESPFIX_STACK
@@ -671,7 +1103,8 @@
common_interrupt:
ASM_CLAC
addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */
- SAVE_ALL
+
+ SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
TRACE_IRQS_OFF
movl %esp, %eax
@@ -679,16 +1112,16 @@
jmp ret_from_intr
ENDPROC(common_interrupt)
-#define BUILD_INTERRUPT3(name, nr, fn) \
-ENTRY(name) \
- ASM_CLAC; \
- pushl $~(nr); \
- SAVE_ALL; \
- ENCODE_FRAME_POINTER; \
- TRACE_IRQS_OFF \
- movl %esp, %eax; \
- call fn; \
- jmp ret_from_intr; \
+#define BUILD_INTERRUPT3(name, nr, fn) \
+ENTRY(name) \
+ ASM_CLAC; \
+ pushl $~(nr); \
+ SAVE_ALL switch_stacks=1; \
+ ENCODE_FRAME_POINTER; \
+ TRACE_IRQS_OFF \
+ movl %esp, %eax; \
+ call fn; \
+ jmp ret_from_intr; \
ENDPROC(name)
#define BUILD_INTERRUPT(name, nr) \
@@ -920,16 +1353,20 @@
pushl %es
pushl %ds
pushl %eax
+ movl $(__USER_DS), %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl $(__KERNEL_PERCPU), %eax
+ movl %eax, %fs
pushl %ebp
pushl %edi
pushl %esi
pushl %edx
pushl %ecx
pushl %ebx
+ SWITCH_TO_KERNEL_STACK
ENCODE_FRAME_POINTER
cld
- movl $(__KERNEL_PERCPU), %ecx
- movl %ecx, %fs
UNWIND_ESPFIX_STACK
GS_TO_REG %ecx
movl PT_GS(%esp), %edi # get the function address
@@ -937,9 +1374,6 @@
movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
REG_TO_PTGS %ecx
SET_KERNEL_GS %ecx
- movl $(__USER_DS), %ecx
- movl %ecx, %ds
- movl %ecx, %es
TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
CALL_NOSPEC %edi
@@ -948,40 +1382,12 @@
ENTRY(debug)
/*
- * #DB can happen at the first instruction of
- * entry_SYSENTER_32 or in Xen's SYSENTER prologue. If this
- * happens, then we will be running on a very small stack. We
- * need to detect this condition and switch to the thread
- * stack before calling any C code at all.
- *
- * If you edit this code, keep in mind that NMIs can happen in here.
+ * Entry from sysenter is now handled in common_exception
*/
ASM_CLAC
pushl $-1 # mark this as an int
- SAVE_ALL
- ENCODE_FRAME_POINTER
- xorl %edx, %edx # error code 0
- movl %esp, %eax # pt_regs pointer
-
- /* Are we currently on the SYSENTER stack? */
- movl PER_CPU_VAR(cpu_entry_area), %ecx
- addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
- subl %eax, %ecx /* ecx = (end of entry_stack) - esp */
- cmpl $SIZEOF_entry_stack, %ecx
- jb .Ldebug_from_sysenter_stack
-
- TRACE_IRQS_OFF
- call do_debug
- jmp ret_from_exception
-
-.Ldebug_from_sysenter_stack:
- /* We're on the SYSENTER stack. Switch off. */
- movl %esp, %ebx
- movl PER_CPU_VAR(cpu_current_top_of_stack), %esp
- TRACE_IRQS_OFF
- call do_debug
- movl %ebx, %esp
- jmp ret_from_exception
+ pushl $do_debug
+ jmp common_exception
END(debug)
/*
@@ -993,6 +1399,7 @@
*/
ENTRY(nmi)
ASM_CLAC
+
#ifdef CONFIG_X86_ESPFIX32
pushl %eax
movl %ss, %eax
@@ -1002,7 +1409,7 @@
#endif
pushl %eax # pt_regs->orig_ax
- SAVE_ALL
+ SAVE_ALL_NMI cr3_reg=%edi
ENCODE_FRAME_POINTER
xorl %edx, %edx # zero error code
movl %esp, %eax # pt_regs pointer
@@ -1016,7 +1423,7 @@
/* Not on SYSENTER stack. */
call do_nmi
- jmp .Lrestore_all_notrace
+ jmp .Lnmi_return
.Lnmi_from_sysenter_stack:
/*
@@ -1027,7 +1434,11 @@
movl PER_CPU_VAR(cpu_current_top_of_stack), %esp
call do_nmi
movl %ebx, %esp
- jmp .Lrestore_all_notrace
+
+.Lnmi_return:
+ CHECK_AND_APPLY_ESPFIX
+ RESTORE_ALL_NMI cr3_reg=%edi pop=4
+ jmp .Lirq_return
#ifdef CONFIG_X86_ESPFIX32
.Lnmi_espfix_stack:
@@ -1042,12 +1453,12 @@
pushl 16(%esp)
.endr
pushl %eax
- SAVE_ALL
+ SAVE_ALL_NMI cr3_reg=%edi
ENCODE_FRAME_POINTER
FIXUP_ESPFIX_STACK # %eax == %esp
xorl %edx, %edx # zero error code
call do_nmi
- RESTORE_REGS
+ RESTORE_ALL_NMI cr3_reg=%edi
lss 12+4(%esp), %esp # back to espfix stack
jmp .Lirq_return
#endif
@@ -1056,7 +1467,8 @@
ENTRY(int3)
ASM_CLAC
pushl $-1 # mark this as an int
- SAVE_ALL
+
+ SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
TRACE_IRQS_OFF
xorl %edx, %edx # zero error code
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 8ae7ffd..957dfb6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -92,7 +92,7 @@
.endm
.macro TRACE_IRQS_IRETQ_DEBUG
- bt $9, EFLAGS(%rsp) /* interrupts off? */
+ btl $9, EFLAGS(%rsp) /* interrupts off? */
jnc 1f
TRACE_IRQS_ON_DEBUG
1:
@@ -408,6 +408,7 @@
1:
/* kernel thread */
+ UNWIND_HINT_EMPTY
movq %r12, %rdi
CALL_NOSPEC %rbx
/*
@@ -701,7 +702,7 @@
#ifdef CONFIG_PREEMPT
/* Interrupts are off */
/* Check if we need preemption */
- bt $9, EFLAGS(%rsp) /* were interrupts off? */
+ btl $9, EFLAGS(%rsp) /* were interrupts off? */
jnc 1f
0: cmpl $0, PER_CPU_VAR(__preempt_count)
jnz 1f
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b..9f695f5 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -46,10 +46,8 @@
CPPFLAGS_vdso.lds += -P -C
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
- -Wl,--no-undefined \
- -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
- $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+ -z max-page-size=4096 -z common-page-size=4096
$(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE
$(call if_changed,vdso)
@@ -58,9 +56,7 @@
hostprogs-y += vdso2c
quiet_cmd_vdso2c = VDSO2C $@
-define cmd_vdso2c
- $(obj)/vdso2c $< $(<:%.dbg=%) $@
-endef
+ cmd_vdso2c = $(obj)/vdso2c $< $(<:%.dbg=%) $@
$(obj)/vdso-image-%.c: $(obj)/vdso%.so.dbg $(obj)/vdso%.so $(obj)/vdso2c FORCE
$(call if_changed,vdso2c)
@@ -95,10 +91,8 @@
#
CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
- -Wl,-soname=linux-vdso.so.1 \
- -Wl,-z,max-page-size=4096 \
- -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+ -z max-page-size=4096 -z common-page-size=4096
# x32-rebranded versions
vobjx32s-y := $(vobjs-y:.o=-x32.o)
@@ -123,7 +117,7 @@
$(call if_changed,vdso)
CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
targets += vdso32/vdso32.lds
targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
@@ -157,13 +151,13 @@
# The DSO images are built using a special linker script.
#
quiet_cmd_vdso = VDSO $@
- cmd_vdso = $(CC) -nostdlib -o $@ \
+ cmd_vdso = $(LD) -nostdlib -o $@ \
$(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
- -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+ -T $(filter %.lds,$^) $(filter %.o,$^) && \
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \
- $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+ $(call ld-option, --build-id) -Bsymbolic
GCOV_PROFILE := n
#
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 86f0c15..035c374 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2041,15 +2041,15 @@
cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
cpuc->intel_cp_status &= ~(1ull << hwc->idx);
+ if (unlikely(event->attr.precise_ip))
+ intel_pmu_pebs_disable(event);
+
if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
intel_pmu_disable_fixed(hwc);
return;
}
x86_pmu_disable_event(event);
-
- if (unlikely(event->attr.precise_ip))
- intel_pmu_pebs_disable(event);
}
static void intel_pmu_del_event(struct perf_event *event)
@@ -2068,17 +2068,19 @@
x86_perf_event_update(event);
}
-static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
+static void intel_pmu_enable_fixed(struct perf_event *event)
{
+ struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
- u64 ctrl_val, bits, mask;
+ u64 ctrl_val, mask, bits = 0;
/*
- * Enable IRQ generation (0x8),
+ * Enable IRQ generation (0x8), if not PEBS,
* and enable ring-3 counting (0x2) and ring-0 counting (0x1)
* if requested:
*/
- bits = 0x8ULL;
+ if (!event->attr.precise_ip)
+ bits |= 0x8;
if (hwc->config & ARCH_PERFMON_EVENTSEL_USR)
bits |= 0x2;
if (hwc->config & ARCH_PERFMON_EVENTSEL_OS)
@@ -2120,14 +2122,14 @@
if (unlikely(event_is_checkpointed(event)))
cpuc->intel_cp_status |= (1ull << hwc->idx);
- if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
- intel_pmu_enable_fixed(hwc);
- return;
- }
-
if (unlikely(event->attr.precise_ip))
intel_pmu_pebs_enable(event);
+ if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
+ intel_pmu_enable_fixed(event);
+ return;
+ }
+
__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
}
@@ -2280,7 +2282,10 @@
* counters from the GLOBAL_STATUS mask and we always process PEBS
* events via drain_pebs().
*/
- status &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+ if (x86_pmu.flags & PMU_FL_PEBS_ALL)
+ status &= ~cpuc->pebs_enabled;
+ else
+ status &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
/*
* PEBS overflow sets bit 62 in the global status register
@@ -4072,7 +4077,6 @@
intel_pmu_lbr_init_skl();
x86_pmu.event_constraints = intel_slm_event_constraints;
- x86_pmu.pebs_constraints = intel_glp_pebs_event_constraints;
x86_pmu.extra_regs = intel_glm_extra_regs;
/*
* It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS
@@ -4082,6 +4086,7 @@
x86_pmu.pebs_prec_dist = true;
x86_pmu.lbr_pt_coexist = true;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+ x86_pmu.flags |= PMU_FL_PEBS_ALL;
x86_pmu.get_event_constraints = glp_get_event_constraints;
x86_pmu.cpu_events = glm_events_attrs;
/* Goldmont Plus has 4-wide pipeline */
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8dbba77..b7b01d7 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -713,12 +713,6 @@
EVENT_CONSTRAINT_END
};
-struct event_constraint intel_glp_pebs_event_constraints[] = {
- /* Allow all events as PEBS with no flags */
- INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
- EVENT_CONSTRAINT_END
-};
-
struct event_constraint intel_nehalem_pebs_event_constraints[] = {
INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
@@ -871,6 +865,13 @@
}
}
+ /*
+ * Extended PEBS support
+ * Makes the PEBS code search the normal constraints.
+ */
+ if (x86_pmu.flags & PMU_FL_PEBS_ALL)
+ return NULL;
+
return &emptyconstraint;
}
@@ -896,10 +897,16 @@
{
struct debug_store *ds = cpuc->ds;
u64 threshold;
+ int reserved;
+
+ if (x86_pmu.flags & PMU_FL_PEBS_ALL)
+ reserved = x86_pmu.max_pebs_events + x86_pmu.num_counters_fixed;
+ else
+ reserved = x86_pmu.max_pebs_events;
if (cpuc->n_pebs == cpuc->n_large_pebs) {
threshold = ds->pebs_absolute_maximum -
- x86_pmu.max_pebs_events * x86_pmu.pebs_record_size;
+ reserved * x86_pmu.pebs_record_size;
} else {
threshold = ds->pebs_buffer_base + x86_pmu.pebs_record_size;
}
@@ -963,7 +970,11 @@
* This must be done in pmu::start(), because PERF_EVENT_IOC_PERIOD.
*/
if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
- ds->pebs_event_reset[hwc->idx] =
+ unsigned int idx = hwc->idx;
+
+ if (idx >= INTEL_PMC_IDX_FIXED)
+ idx = MAX_PEBS_EVENTS + (idx - INTEL_PMC_IDX_FIXED);
+ ds->pebs_event_reset[idx] =
(u64)(-hwc->sample_period) & x86_pmu.cntval_mask;
} else {
ds->pebs_event_reset[hwc->idx] = 0;
@@ -1481,9 +1492,10 @@
struct debug_store *ds = cpuc->ds;
struct perf_event *event;
void *base, *at, *top;
- short counts[MAX_PEBS_EVENTS] = {};
- short error[MAX_PEBS_EVENTS] = {};
- int bit, i;
+ short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {};
+ short error[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {};
+ int bit, i, size;
+ u64 mask;
if (!x86_pmu.pebs_active)
return;
@@ -1493,6 +1505,13 @@
ds->pebs_index = ds->pebs_buffer_base;
+ mask = (1ULL << x86_pmu.max_pebs_events) - 1;
+ size = x86_pmu.max_pebs_events;
+ if (x86_pmu.flags & PMU_FL_PEBS_ALL) {
+ mask |= ((1ULL << x86_pmu.num_counters_fixed) - 1) << INTEL_PMC_IDX_FIXED;
+ size = INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed;
+ }
+
if (unlikely(base >= top)) {
/*
* The drain_pebs() could be called twice in a short period
@@ -1502,7 +1521,7 @@
* update the event->count for this case.
*/
for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled,
- x86_pmu.max_pebs_events) {
+ size) {
event = cpuc->events[bit];
if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
intel_pmu_save_and_restart_reload(event, 0);
@@ -1515,12 +1534,12 @@
u64 pebs_status;
pebs_status = p->status & cpuc->pebs_enabled;
- pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
+ pebs_status &= mask;
/* PEBS v3 has more accurate status bits */
if (x86_pmu.intel_cap.pebs_format >= 3) {
for_each_set_bit(bit, (unsigned long *)&pebs_status,
- x86_pmu.max_pebs_events)
+ size)
counts[bit]++;
continue;
@@ -1568,7 +1587,7 @@
counts[bit]++;
}
- for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
+ for (bit = 0; bit < size; bit++) {
if ((counts[bit] == 0) && (error[bit] == 0))
continue;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index cf372b9..f3e006b 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -216,6 +216,8 @@
void intel_pmu_lbr_reset(void)
{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
if (!x86_pmu.lbr_nr)
return;
@@ -223,6 +225,9 @@
intel_pmu_lbr_reset_32();
else
intel_pmu_lbr_reset_64();
+
+ cpuc->last_task_ctx = NULL;
+ cpuc->last_log_id = 0;
}
/*
@@ -334,6 +339,7 @@
static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int i;
unsigned lbr_idx, mask;
u64 tos;
@@ -344,9 +350,21 @@
return;
}
- mask = x86_pmu.lbr_nr - 1;
tos = task_ctx->tos;
- for (i = 0; i < tos; i++) {
+ /*
+ * Does not restore the LBR registers, if
+ * - No one else touched them, and
+ * - Did not enter C6
+ */
+ if ((task_ctx == cpuc->last_task_ctx) &&
+ (task_ctx->log_id == cpuc->last_log_id) &&
+ rdlbr_from(tos)) {
+ task_ctx->lbr_stack_state = LBR_NONE;
+ return;
+ }
+
+ mask = x86_pmu.lbr_nr - 1;
+ for (i = 0; i < task_ctx->valid_lbrs; i++) {
lbr_idx = (tos - i) & mask;
wrlbr_from(lbr_idx, task_ctx->lbr_from[i]);
wrlbr_to (lbr_idx, task_ctx->lbr_to[i]);
@@ -354,14 +372,24 @@
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
wrmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
+
+ for (; i < x86_pmu.lbr_nr; i++) {
+ lbr_idx = (tos - i) & mask;
+ wrlbr_from(lbr_idx, 0);
+ wrlbr_to(lbr_idx, 0);
+ if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
+ wrmsrl(MSR_LBR_INFO_0 + lbr_idx, 0);
+ }
+
wrmsrl(x86_pmu.lbr_tos, tos);
task_ctx->lbr_stack_state = LBR_NONE;
}
static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
unsigned lbr_idx, mask;
- u64 tos;
+ u64 tos, from;
int i;
if (task_ctx->lbr_callstack_users == 0) {
@@ -371,15 +399,22 @@
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
- for (i = 0; i < tos; i++) {
+ for (i = 0; i < x86_pmu.lbr_nr; i++) {
lbr_idx = (tos - i) & mask;
- task_ctx->lbr_from[i] = rdlbr_from(lbr_idx);
+ from = rdlbr_from(lbr_idx);
+ if (!from)
+ break;
+ task_ctx->lbr_from[i] = from;
task_ctx->lbr_to[i] = rdlbr_to(lbr_idx);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
+ task_ctx->valid_lbrs = i;
task_ctx->tos = tos;
task_ctx->lbr_stack_state = LBR_VALID;
+
+ cpuc->last_task_ctx = task_ctx;
+ cpuc->last_log_id = ++task_ctx->log_id;
}
void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
@@ -531,7 +566,7 @@
*/
static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
{
- bool need_info = false;
+ bool need_info = false, call_stack = false;
unsigned long mask = x86_pmu.lbr_nr - 1;
int lbr_format = x86_pmu.intel_cap.lbr_format;
u64 tos = intel_pmu_lbr_tos();
@@ -542,7 +577,7 @@
if (cpuc->lbr_sel) {
need_info = !(cpuc->lbr_sel->config & LBR_NO_INFO);
if (cpuc->lbr_sel->config & LBR_CALL_STACK)
- num = tos;
+ call_stack = true;
}
for (i = 0; i < num; i++) {
@@ -555,6 +590,13 @@
from = rdlbr_from(lbr_idx);
to = rdlbr_to(lbr_idx);
+ /*
+ * Read LBR call stack entries
+ * until invalid entry (0s) is detected.
+ */
+ if (call_stack && !from)
+ break;
+
if (lbr_format == LBR_FORMAT_INFO && need_info) {
u64 info;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 9f37114..1562863 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -163,6 +163,7 @@
unsigned core_id; /* per-core: core id */
};
+struct x86_perf_task_context;
#define MAX_LBR_ENTRIES 32
enum {
@@ -214,6 +215,8 @@
struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
struct er_account *lbr_sel;
u64 br_sel;
+ struct x86_perf_task_context *last_task_ctx;
+ int last_log_id;
/*
* Intel host/guest exclude bits
@@ -648,8 +651,10 @@
u64 lbr_to[MAX_LBR_ENTRIES];
u64 lbr_info[MAX_LBR_ENTRIES];
int tos;
+ int valid_lbrs;
int lbr_callstack_users;
int lbr_stack_state;
+ int log_id;
};
#define x86_add_quirk(func_) \
@@ -668,6 +673,7 @@
#define PMU_FL_HAS_RSP_1 0x2 /* has 2 equivalent offcore_rsp regs */
#define PMU_FL_EXCL_CNTRS 0x4 /* has exclusive counter requirements */
#define PMU_FL_EXCL_ENABLED 0x8 /* exclusive counter active */
+#define PMU_FL_PEBS_ALL 0x10 /* all events are valid PEBS events */
#define EVENT_VAR(_id) event_attr_##_id
#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 4023383..5b0f613 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -31,6 +31,8 @@
#include <asm/mshyperv.h>
#include <asm/apic.h>
+#include <asm/trace/hyperv.h>
+
static struct apic orig_apic;
static u64 hv_apic_icr_read(void)
@@ -99,6 +101,9 @@
int nr_bank = 0;
int ret = 1;
+ if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
+ return false;
+
local_irq_save(flags);
arg = (struct ipi_arg_ex **)this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -130,10 +135,10 @@
static bool __send_ipi_mask(const struct cpumask *mask, int vector)
{
int cur_cpu, vcpu;
- struct ipi_arg_non_ex **arg;
- struct ipi_arg_non_ex *ipi_arg;
+ struct ipi_arg_non_ex ipi_arg;
int ret = 1;
- unsigned long flags;
+
+ trace_hyperv_send_ipi_mask(mask, vector);
if (cpumask_empty(mask))
return true;
@@ -144,40 +149,43 @@
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return false;
- if ((ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
- return __send_ipi_mask_ex(mask, vector);
+ /*
+ * From the supplied CPU set we need to figure out if we can get away
+ * with cheaper HVCALL_SEND_IPI hypercall. This is possible when the
+ * highest VP number in the set is < 64. As VP numbers are usually in
+ * ascending order and match Linux CPU ids, here is an optimization:
+ * we check the VP number for the highest bit in the supplied set first
+ * so we can quickly find out if using HVCALL_SEND_IPI_EX hypercall is
+ * a must. We will also check all VP numbers when walking the supplied
+ * CPU set to remain correct in all cases.
+ */
+ if (hv_cpu_number_to_vp_number(cpumask_last(mask)) >= 64)
+ goto do_ex_hypercall;
- local_irq_save(flags);
- arg = (struct ipi_arg_non_ex **)this_cpu_ptr(hyperv_pcpu_input_arg);
-
- ipi_arg = *arg;
- if (unlikely(!ipi_arg))
- goto ipi_mask_done;
-
- ipi_arg->vector = vector;
- ipi_arg->reserved = 0;
- ipi_arg->cpu_mask = 0;
+ ipi_arg.vector = vector;
+ ipi_arg.cpu_mask = 0;
for_each_cpu(cur_cpu, mask) {
vcpu = hv_cpu_number_to_vp_number(cur_cpu);
if (vcpu == VP_INVAL)
- goto ipi_mask_done;
+ return false;
/*
* This particular version of the IPI hypercall can
* only target upto 64 CPUs.
*/
if (vcpu >= 64)
- goto ipi_mask_done;
+ goto do_ex_hypercall;
- __set_bit(vcpu, (unsigned long *)&ipi_arg->cpu_mask);
+ __set_bit(vcpu, (unsigned long *)&ipi_arg.cpu_mask);
}
- ret = hv_do_hypercall(HVCALL_SEND_IPI, ipi_arg, NULL);
-
-ipi_mask_done:
- local_irq_restore(flags);
+ ret = hv_do_fast_hypercall16(HVCALL_SEND_IPI, ipi_arg.vector,
+ ipi_arg.cpu_mask);
return ((ret == 0) ? true : false);
+
+do_ex_hypercall:
+ return __send_ipi_mask_ex(mask, vector);
}
static bool __send_ipi_one(int cpu, int vector)
@@ -233,10 +241,7 @@
void __init hv_apic_init(void)
{
if (ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) {
- if ((ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
- pr_info("Hyper-V: Using ext hypercalls for IPI\n");
- else
- pr_info("Hyper-V: Using IPI hypercalls\n");
+ pr_info("Hyper-V: Using IPI hypercalls\n");
/*
* Set the IPI entry points.
*/
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index de27615..1147e1f 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -16,6 +16,8 @@
/* Each gva in gva_list encodes up to 4096 pages to flush */
#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
+ const struct flush_tlb_info *info);
/*
* Fills in gva_list starting from offset. Returns the number of items added.
@@ -93,10 +95,29 @@
if (cpumask_equal(cpus, cpu_present_mask)) {
flush->flags |= HV_FLUSH_ALL_PROCESSORS;
} else {
+ /*
+ * From the supplied CPU set we need to figure out if we can get
+ * away with cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}
+ * hypercalls. This is possible when the highest VP number in
+ * the set is < 64. As VP numbers are usually in ascending order
+ * and match Linux CPU ids, here is an optimization: we check
+ * the VP number for the highest bit in the supplied set first
+ * so we can quickly find out if using *_EX hypercalls is a
+ * must. We will also check all VP numbers when walking the
+ * supplied CPU set to remain correct in all cases.
+ */
+ if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64)
+ goto do_ex_hypercall;
+
for_each_cpu(cpu, cpus) {
vcpu = hv_cpu_number_to_vp_number(cpu);
- if (vcpu >= 64)
+ if (vcpu == VP_INVAL) {
+ local_irq_restore(flags);
goto do_native;
+ }
+
+ if (vcpu >= 64)
+ goto do_ex_hypercall;
__set_bit(vcpu, (unsigned long *)
&flush->processor_mask);
@@ -123,7 +144,12 @@
status = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
gva_n, 0, flush, NULL);
}
+ goto check_status;
+do_ex_hypercall:
+ status = hyperv_flush_tlb_others_ex(cpus, info);
+
+check_status:
local_irq_restore(flags);
if (!(status & HV_HYPERCALL_RESULT_MASK))
@@ -132,35 +158,22 @@
native_flush_tlb_others(cpus, info);
}
-static void hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
- const struct flush_tlb_info *info)
+static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
+ const struct flush_tlb_info *info)
{
int nr_bank = 0, max_gvas, gva_n;
struct hv_tlb_flush_ex **flush_pcpu;
struct hv_tlb_flush_ex *flush;
- u64 status = U64_MAX;
- unsigned long flags;
+ u64 status;
- trace_hyperv_mmu_flush_tlb_others(cpus, info);
-
- if (!hv_hypercall_pg)
- goto do_native;
-
- if (cpumask_empty(cpus))
- return;
-
- local_irq_save(flags);
+ if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
+ return U64_MAX;
flush_pcpu = (struct hv_tlb_flush_ex **)
this_cpu_ptr(hyperv_pcpu_input_arg);
flush = *flush_pcpu;
- if (unlikely(!flush)) {
- local_irq_restore(flags);
- goto do_native;
- }
-
if (info->mm) {
/*
* AddressSpace argument must match the CR3 with PCID bits
@@ -176,15 +189,10 @@
flush->hv_vp_set.valid_bank_mask = 0;
- if (!cpumask_equal(cpus, cpu_present_mask)) {
- flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
- nr_bank = cpumask_to_vpset(&(flush->hv_vp_set), cpus);
- }
-
- if (!nr_bank) {
- flush->hv_vp_set.format = HV_GENERIC_SET_ALL;
- flush->flags |= HV_FLUSH_ALL_PROCESSORS;
- }
+ flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ nr_bank = cpumask_to_vpset(&(flush->hv_vp_set), cpus);
+ if (nr_bank < 0)
+ return U64_MAX;
/*
* We can flush not more than max_gvas with one hypercall. Flush the
@@ -213,12 +221,7 @@
gva_n, nr_bank, flush, NULL);
}
- local_irq_restore(flags);
-
- if (!(status & HV_HYPERCALL_RESULT_MASK))
- return;
-do_native:
- native_flush_tlb_others(cpus, info);
+ return status;
}
void hyperv_setup_mmu_ops(void)
@@ -226,11 +229,6 @@
if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
return;
- if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED)) {
- pr_info("Using hypercall for remote TLB flush\n");
- pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
- } else {
- pr_info("Using ext hypercall for remote TLB flush\n");
- pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others_ex;
- }
+ pr_info("Using hypercall for remote TLB flush\n");
+ pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
}
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 0db6bec..b143717 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -80,6 +80,7 @@
* true if the result is zero, or false for all
* other cases.
*/
+#define arch_atomic_sub_and_test arch_atomic_sub_and_test
static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", e);
@@ -91,6 +92,7 @@
*
* Atomically increments @v by 1.
*/
+#define arch_atomic_inc arch_atomic_inc
static __always_inline void arch_atomic_inc(atomic_t *v)
{
asm volatile(LOCK_PREFIX "incl %0"
@@ -103,6 +105,7 @@
*
* Atomically decrements @v by 1.
*/
+#define arch_atomic_dec arch_atomic_dec
static __always_inline void arch_atomic_dec(atomic_t *v)
{
asm volatile(LOCK_PREFIX "decl %0"
@@ -117,6 +120,7 @@
* returns true if the result is 0, or false for all other
* cases.
*/
+#define arch_atomic_dec_and_test arch_atomic_dec_and_test
static __always_inline bool arch_atomic_dec_and_test(atomic_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", e);
@@ -130,6 +134,7 @@
* and returns true if the result is zero, or false for all
* other cases.
*/
+#define arch_atomic_inc_and_test arch_atomic_inc_and_test
static __always_inline bool arch_atomic_inc_and_test(atomic_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", e);
@@ -144,6 +149,7 @@
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
+#define arch_atomic_add_negative arch_atomic_add_negative
static __always_inline bool arch_atomic_add_negative(int i, atomic_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", s);
@@ -173,9 +179,6 @@
return arch_atomic_add_return(-i, v);
}
-#define arch_atomic_inc_return(v) (arch_atomic_add_return(1, v))
-#define arch_atomic_dec_return(v) (arch_atomic_sub_return(1, v))
-
static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
{
return xadd(&v->counter, i);
@@ -199,7 +202,7 @@
static inline int arch_atomic_xchg(atomic_t *v, int new)
{
- return xchg(&v->counter, new);
+ return arch_xchg(&v->counter, new);
}
static inline void arch_atomic_and(int i, atomic_t *v)
@@ -253,27 +256,6 @@
return val;
}
-/**
- * __arch_atomic_add_unless - add unless the number is already a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns the old value of @v.
- */
-static __always_inline int __arch_atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c = arch_atomic_read(v);
-
- do {
- if (unlikely(c == u))
- break;
- } while (!arch_atomic_try_cmpxchg(v, &c, c + a));
-
- return c;
-}
-
#ifdef CONFIG_X86_32
# include <asm/atomic64_32.h>
#else
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 92212bf..ef959f0 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -158,6 +158,7 @@
"S" (v) : "memory", "ecx");
return a;
}
+#define arch_atomic64_inc_return arch_atomic64_inc_return
static inline long long arch_atomic64_dec_return(atomic64_t *v)
{
@@ -166,6 +167,7 @@
"S" (v) : "memory", "ecx");
return a;
}
+#define arch_atomic64_dec_return arch_atomic64_dec_return
/**
* arch_atomic64_add - add integer to atomic64 variable
@@ -198,25 +200,12 @@
}
/**
- * arch_atomic64_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer to type atomic64_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-static inline int arch_atomic64_sub_and_test(long long i, atomic64_t *v)
-{
- return arch_atomic64_sub_return(i, v) == 0;
-}
-
-/**
* arch_atomic64_inc - increment atomic64 variable
* @v: pointer to type atomic64_t
*
* Atomically increments @v by 1.
*/
+#define arch_atomic64_inc arch_atomic64_inc
static inline void arch_atomic64_inc(atomic64_t *v)
{
__alternative_atomic64(inc, inc_return, /* no output */,
@@ -229,6 +218,7 @@
*
* Atomically decrements @v by 1.
*/
+#define arch_atomic64_dec arch_atomic64_dec
static inline void arch_atomic64_dec(atomic64_t *v)
{
__alternative_atomic64(dec, dec_return, /* no output */,
@@ -236,46 +226,6 @@
}
/**
- * arch_atomic64_dec_and_test - decrement and test
- * @v: pointer to type atomic64_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-static inline int arch_atomic64_dec_and_test(atomic64_t *v)
-{
- return arch_atomic64_dec_return(v) == 0;
-}
-
-/**
- * atomic64_inc_and_test - increment and test
- * @v: pointer to type atomic64_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-static inline int arch_atomic64_inc_and_test(atomic64_t *v)
-{
- return arch_atomic64_inc_return(v) == 0;
-}
-
-/**
- * arch_atomic64_add_negative - add and test if negative
- * @i: integer value to add
- * @v: pointer to type atomic64_t
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
-static inline int arch_atomic64_add_negative(long long i, atomic64_t *v)
-{
- return arch_atomic64_add_return(i, v) < 0;
-}
-
-/**
* arch_atomic64_add_unless - add unless the number is a given value
* @v: pointer of type atomic64_t
* @a: the amount to add to v...
@@ -295,7 +245,7 @@
return (int)a;
}
-
+#define arch_atomic64_inc_not_zero arch_atomic64_inc_not_zero
static inline int arch_atomic64_inc_not_zero(atomic64_t *v)
{
int r;
@@ -304,6 +254,7 @@
return r;
}
+#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
static inline long long arch_atomic64_dec_if_positive(atomic64_t *v)
{
long long r;
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 6106b59..4343d9b 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -71,6 +71,7 @@
* true if the result is zero, or false for all
* other cases.
*/
+#define arch_atomic64_sub_and_test arch_atomic64_sub_and_test
static inline bool arch_atomic64_sub_and_test(long i, atomic64_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, "er", i, "%0", e);
@@ -82,6 +83,7 @@
*
* Atomically increments @v by 1.
*/
+#define arch_atomic64_inc arch_atomic64_inc
static __always_inline void arch_atomic64_inc(atomic64_t *v)
{
asm volatile(LOCK_PREFIX "incq %0"
@@ -95,6 +97,7 @@
*
* Atomically decrements @v by 1.
*/
+#define arch_atomic64_dec arch_atomic64_dec
static __always_inline void arch_atomic64_dec(atomic64_t *v)
{
asm volatile(LOCK_PREFIX "decq %0"
@@ -110,6 +113,7 @@
* returns true if the result is 0, or false for all other
* cases.
*/
+#define arch_atomic64_dec_and_test arch_atomic64_dec_and_test
static inline bool arch_atomic64_dec_and_test(atomic64_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", e);
@@ -123,6 +127,7 @@
* and returns true if the result is zero, or false for all
* other cases.
*/
+#define arch_atomic64_inc_and_test arch_atomic64_inc_and_test
static inline bool arch_atomic64_inc_and_test(atomic64_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", e);
@@ -137,6 +142,7 @@
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
+#define arch_atomic64_add_negative arch_atomic64_add_negative
static inline bool arch_atomic64_add_negative(long i, atomic64_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, "er", i, "%0", s);
@@ -169,9 +175,6 @@
return xadd(&v->counter, -i);
}
-#define arch_atomic64_inc_return(v) (arch_atomic64_add_return(1, (v)))
-#define arch_atomic64_dec_return(v) (arch_atomic64_sub_return(1, (v)))
-
static inline long arch_atomic64_cmpxchg(atomic64_t *v, long old, long new)
{
return arch_cmpxchg(&v->counter, old, new);
@@ -185,46 +188,7 @@
static inline long arch_atomic64_xchg(atomic64_t *v, long new)
{
- return xchg(&v->counter, new);
-}
-
-/**
- * arch_atomic64_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
-static inline bool arch_atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- s64 c = arch_atomic64_read(v);
- do {
- if (unlikely(c == u))
- return false;
- } while (!arch_atomic64_try_cmpxchg(v, &c, c + a));
- return true;
-}
-
-#define arch_atomic64_inc_not_zero(v) arch_atomic64_add_unless((v), 1, 0)
-
-/*
- * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
- *
- * The function returns the old value of *v minus 1, even if
- * the atomic variable, v, was not decremented.
- */
-static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
-{
- s64 dec, c = arch_atomic64_read(v);
- do {
- dec = c - 1;
- if (unlikely(dec < 0))
- break;
- } while (!arch_atomic64_try_cmpxchg(v, &c, dec));
- return dec;
+ return arch_xchg(&v->counter, new);
}
static inline void arch_atomic64_and(long i, atomic64_t *v)
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index e3efd8a..a55d79b 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -75,7 +75,7 @@
* use "asm volatile" and "memory" clobbers to prevent gcc from moving
* information around.
*/
-#define xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
+#define arch_xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index bfca3b3..072e545 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -10,13 +10,13 @@
#define arch_cmpxchg64(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
- cmpxchg((ptr), (o), (n)); \
+ arch_cmpxchg((ptr), (o), (n)); \
})
#define arch_cmpxchg64_local(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
- cmpxchg_local((ptr), (o), (n)); \
+ arch_cmpxchg_local((ptr), (o), (n)); \
})
#define system_has_cmpxchg_double() boot_cpu_has(X86_FEATURE_CX16)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5c..b5c60fa 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_IBRS_ENHANCED ( 7*32+29) /* Enhanced IBRS */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -229,7 +230,7 @@
#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */
#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
-
+#define X86_FEATURE_EPT_AD ( 8*32+17) /* Intel Extended Page Table access-dirty bit */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h
index f59c398..a1f0e90 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -49,11 +49,14 @@
return HBP_NUM;
}
+struct perf_event_attr;
struct perf_event;
struct pmu;
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
index cf090e5..7ed08a7 100644
--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -76,4 +76,17 @@
#define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
#define INTEL_FAM6_XEON_PHI_KNM 0x85 /* Knights Mill */
+/* Useful macros */
+#define INTEL_CPU_FAM_ANY(_family, _model, _driver_data) \
+{ \
+ .vendor = X86_VENDOR_INTEL, \
+ .family = _family, \
+ .model = _model, \
+ .feature = X86_FEATURE_ANY, \
+ .driver_data = (kernel_ulong_t)&_driver_data \
+}
+
+#define INTEL_CPU_FAM6(_model, _driver_data) \
+ INTEL_CPU_FAM_ANY(6, INTEL_FAM6_##_model, _driver_data)
+
#endif /* _ASM_X86_INTEL_FAMILY_H */
diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h
index fe04491..52f815a 100644
--- a/arch/x86/include/asm/intel-mid.h
+++ b/arch/x86/include/asm/intel-mid.h
@@ -80,35 +80,6 @@
extern enum intel_mid_cpu_type __intel_mid_cpu_chip;
-/**
- * struct intel_mid_ops - Interface between intel-mid & sub archs
- * @arch_setup: arch_setup function to re-initialize platform
- * structures (x86_init, x86_platform_init)
- *
- * This structure can be extended if any new interface is required
- * between intel-mid & its sub arch files.
- */
-struct intel_mid_ops {
- void (*arch_setup)(void);
-};
-
-/* Helper API's for INTEL_MID_OPS_INIT */
-#define DECLARE_INTEL_MID_OPS_INIT(cpuname, cpuid) \
- [cpuid] = get_##cpuname##_ops
-
-/* Maximum number of CPU ops */
-#define MAX_CPU_OPS(a) (sizeof(a)/sizeof(void *))
-
-/*
- * For every new cpu addition, a weak get_<cpuname>_ops() function needs be
- * declared in arch/x86/platform/intel_mid/intel_mid_weak_decls.h.
- */
-#define INTEL_MID_OPS_INIT { \
- DECLARE_INTEL_MID_OPS_INIT(penwell, INTEL_MID_CPU_CHIP_PENWELL), \
- DECLARE_INTEL_MID_OPS_INIT(cloverview, INTEL_MID_CPU_CHIP_CLOVERVIEW), \
- DECLARE_INTEL_MID_OPS_INIT(tangier, INTEL_MID_CPU_CHIP_TANGIER) \
-};
-
#ifdef CONFIG_X86_INTEL_MID
static inline enum intel_mid_cpu_type intel_mid_identify_cpu(void)
@@ -136,20 +107,6 @@
extern enum intel_mid_timer_options intel_mid_timer_options;
-/*
- * Penwell uses spread spectrum clock, so the freq number is not exactly
- * the same as reported by MSR based on SDM.
- */
-#define FSB_FREQ_83SKU 83200
-#define FSB_FREQ_100SKU 99840
-#define FSB_FREQ_133SKU 133000
-
-#define FSB_FREQ_167SKU 167000
-#define FSB_FREQ_200SKU 200000
-#define FSB_FREQ_267SKU 267000
-#define FSB_FREQ_333SKU 333000
-#define FSB_FREQ_400SKU 400000
-
/* Bus Select SoC Fuse value */
#define BSEL_SOC_FUSE_MASK 0x7
/* FSB 133MHz */
diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_ds.h
index 62a9f49..ae26df1 100644
--- a/arch/x86/include/asm/intel_ds.h
+++ b/arch/x86/include/asm/intel_ds.h
@@ -8,6 +8,7 @@
/* The maximal number of PEBS events: */
#define MAX_PEBS_EVENTS 8
+#define MAX_FIXED_PEBS_EVENTS 3
/*
* A debug store configuration.
@@ -23,7 +24,7 @@
u64 pebs_index;
u64 pebs_absolute_maximum;
u64 pebs_interrupt_threshold;
- u64 pebs_event_reset[MAX_PEBS_EVENTS];
+ u64 pebs_event_reset[MAX_PEBS_EVENTS + MAX_FIXED_PEBS_EVENTS];
} __aligned(PAGE_SIZE);
DECLARE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c4fc172..c14f2a7 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -13,6 +13,8 @@
* Interrupt control:
*/
+/* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
+extern inline unsigned long native_save_fl(void);
extern inline unsigned long native_save_fl(void)
{
unsigned long flags;
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 367d99c..c8cec1b 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -78,7 +78,7 @@
* boostable = true: This instruction has been boosted: we have
* added a relative jump after the instruction copy in insn,
* so no single-step and fixup are needed (unless there's
- * a post_handler or break_handler).
+ * a post_handler).
*/
bool boostable;
bool if_modifier;
@@ -111,9 +111,6 @@
unsigned long kprobe_status;
unsigned long kprobe_old_flags;
unsigned long kprobe_saved_flags;
- unsigned long *jprobe_saved_sp;
- struct pt_regs jprobe_saved_regs;
- kprobe_opcode_t jprobes_stack[MAX_STACK_SIZE];
struct prev_kprobe prev_kprobe;
};
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
deleted file mode 100644
index 4618526..0000000
--- a/arch/x86/include/asm/kvm_guest.h
+++ /dev/null
@@ -1,7 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_KVM_GUEST_H
-#define _ASM_X86_KVM_GUEST_H
-
-int kvm_setup_vsyscall_timeinfo(void);
-
-#endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 3aea265..4c72363 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -7,7 +7,6 @@
#include <uapi/asm/kvm_para.h>
extern void kvmclock_init(void);
-extern int kvm_register_clock(char *txt);
#ifdef CONFIG_KVM_GUEST
bool kvm_check_and_clear_guest_paused(void);
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index bbc796e..eeeb928 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -71,12 +71,7 @@
static inline void *ldt_slot_va(int slot)
{
-#ifdef CONFIG_X86_64
return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
-#else
- BUG();
- return (void *)fix_to_virt(FIX_HOLE);
-#endif
}
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 5a7375e..19886fe 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -194,6 +194,40 @@
return hv_status;
}
+/* Fast hypercall with 16 bytes of input */
+static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
+{
+ u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
+
+#ifdef CONFIG_X86_64
+ {
+ __asm__ __volatile__("mov %4, %%r8\n"
+ CALL_NOSPEC
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input1)
+ : "r" (input2),
+ THUNK_TARGET(hv_hypercall_pg)
+ : "cc", "r8", "r9", "r10", "r11");
+ }
+#else
+ {
+ u32 input1_hi = upper_32_bits(input1);
+ u32 input1_lo = lower_32_bits(input1);
+ u32 input2_hi = upper_32_bits(input2);
+ u32 input2_lo = lower_32_bits(input2);
+
+ __asm__ __volatile__ (CALL_NOSPEC
+ : "=A"(hv_status),
+ "+c"(input1_lo), ASM_CALL_CONSTRAINT
+ : "A" (control), "b" (input1_hi),
+ "D"(input2_hi), "S"(input2_lo),
+ THUNK_TARGET(hv_hypercall_pg)
+ : "cc");
+ }
+#endif
+ return hv_status;
+}
+
/*
* Rep hypercalls. Callers of this functions are supposed to ensure that
* rep_count and varhead_size comply with Hyper-V hypercall definition.
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f6f6c63..fd2a8c1 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,7 +214,7 @@
SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
SPECTRE_V2_RETPOLINE_GENERIC,
SPECTRE_V2_RETPOLINE_AMD,
- SPECTRE_V2_IBRS,
+ SPECTRE_V2_IBRS_ENHANCED,
};
/* The Speculative Store Bypass disable variants */
diff --git a/arch/x86/include/asm/orc_types.h b/arch/x86/include/asm/orc_types.h
index 9c9dc57..46f516d 100644
--- a/arch/x86/include/asm/orc_types.h
+++ b/arch/x86/include/asm/orc_types.h
@@ -88,6 +88,7 @@
unsigned sp_reg:4;
unsigned bp_reg:4;
unsigned type:2;
+ unsigned end:1;
} __packed;
/*
@@ -101,6 +102,7 @@
s16 sp_offset;
u8 sp_reg;
u8 type;
+ u8 end;
};
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a06b073..e9202a0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -450,9 +450,10 @@
bool __ret; \
typeof(pcp1) __o1 = (o1), __n1 = (n1); \
typeof(pcp2) __o2 = (o2), __n2 = (n2); \
- asm volatile("cmpxchg8b "__percpu_arg(1)"\n\tsetz %0\n\t" \
- : "=a" (__ret), "+m" (pcp1), "+m" (pcp2), "+d" (__o2) \
- : "b" (__n1), "c" (__n2), "a" (__o1)); \
+ asm volatile("cmpxchg8b "__percpu_arg(1) \
+ CC_SET(z) \
+ : CC_OUT(z) (__ret), "+m" (pcp1), "+m" (pcp2), "+a" (__o1), "+d" (__o2) \
+ : "b" (__n1), "c" (__n2)); \
__ret; \
})
diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 685ffe8..c399ea5 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -19,6 +19,9 @@
static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pmd.pud.p4d.pgd = pti_set_user_pgtbl(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd);
+#endif
*pmdp = pmd;
}
@@ -58,6 +61,9 @@
#ifdef CONFIG_SMP
static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pti_set_user_pgtbl(&xp->pud.p4d.pgd, __pgd(0));
+#endif
return __pmd(xchg((pmdval_t *)xp, 0));
}
#else
@@ -67,6 +73,9 @@
#ifdef CONFIG_SMP
static inline pud_t native_pudp_get_and_clear(pud_t *xp)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pti_set_user_pgtbl(&xp->p4d.pgd, __pgd(0));
+#endif
return __pud(xchg((pudval_t *)xp, 0));
}
#else
diff --git a/arch/x86/include/asm/pgtable-2level_types.h b/arch/x86/include/asm/pgtable-2level_types.h
index f982ef8..6deb6cd 100644
--- a/arch/x86/include/asm/pgtable-2level_types.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -35,4 +35,7 @@
#define PTRS_PER_PTE 1024
+/* This covers all VMSPLIT_* and VMSPLIT_*_OPT variants */
+#define PGD_KERNEL_START (CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
+
#endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index f24df59..f2ca313 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -98,6 +98,9 @@
static inline void native_set_pud(pud_t *pudp, pud_t pud)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pud.p4d.pgd = pti_set_user_pgtbl(&pudp->p4d.pgd, pud.p4d.pgd);
+#endif
set_64bit((unsigned long long *)(pudp), native_pud_val(pud));
}
@@ -229,6 +232,10 @@
{
union split_pud res, *orig = (union split_pud *)pudp;
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pti_set_user_pgtbl(&pudp->p4d.pgd, __pgd(0));
+#endif
+
/* xchg acts as a barrier before setting of the high bits */
res.pud_low = xchg(&orig->pud_low, 0);
res.pud_high = orig->pud_high;
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 6a59a6d..858358a 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -21,9 +21,10 @@
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_PARAVIRT
-#define SHARED_KERNEL_PMD (pv_info.shared_kernel_pmd)
+#define SHARED_KERNEL_PMD ((!static_cpu_has(X86_FEATURE_PTI) && \
+ (pv_info.shared_kernel_pmd)))
#else
-#define SHARED_KERNEL_PMD 1
+#define SHARED_KERNEL_PMD (!static_cpu_has(X86_FEATURE_PTI))
#endif
/*
@@ -45,5 +46,6 @@
#define PTRS_PER_PTE 512
#define MAX_POSSIBLE_PHYSMEM_BITS 36
+#define PGD_KERNEL_START (CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
#endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5715647..a1cb333 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -30,11 +30,14 @@
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
void ptdump_walk_pgd_level_checkwx(void);
+void ptdump_walk_user_pgd_level_checkwx(void);
#ifdef CONFIG_DEBUG_WX
-#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#define debug_checkwx() ptdump_walk_pgd_level_checkwx()
+#define debug_checkwx_user() ptdump_walk_user_pgd_level_checkwx()
#else
-#define debug_checkwx() do { } while (0)
+#define debug_checkwx() do { } while (0)
+#define debug_checkwx_user() do { } while (0)
#endif
/*
@@ -640,8 +643,31 @@
pmd_t *populate_extra_pmd(unsigned long vaddr);
pte_t *populate_extra_pte(unsigned long vaddr);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return pgd;
+ return __pti_set_user_pgtbl(pgdp, pgd);
+}
+#else /* CONFIG_PAGE_TABLE_ISOLATION */
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
#endif /* __ASSEMBLY__ */
+
#ifdef CONFIG_X86_32
# include <asm/pgtable_32.h>
#else
@@ -1154,6 +1180,70 @@
}
}
#endif
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+ unsigned long ptr = (unsigned long)__ptr;
+
+ return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < PGD_KERNEL_START);
+}
+
+static inline int pgd_large(pgd_t pgd) { return 0; }
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and
+ * the user one is in the last 4k. To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr |= BIT(bit);
+ return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr &= ~BIT(bit);
+ return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+ return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+ return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+ return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+ return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 88a056b..b3ec519 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -34,8 +34,6 @@
void paging_init(void);
void sync_initial_page_table(void);
-static inline int pgd_large(pgd_t pgd) { return 0; }
-
/*
* Define this if things work differently on an i386 and an i486:
* it will (on an i486) warn about kernel memory accesses that are
diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index d9a001a..b0bc0ff 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -50,13 +50,18 @@
((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) \
& PMD_MASK)
-#define PKMAP_BASE \
+#define LDT_BASE_ADDR \
((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
+#define LDT_END_ADDR (LDT_BASE_ADDR + PMD_SIZE)
+
+#define PKMAP_BASE \
+ ((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
+
#ifdef CONFIG_HIGHMEM
# define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE)
#else
-# define VMALLOC_END (CPU_ENTRY_AREA_BASE - 2 * PAGE_SIZE)
+# define VMALLOC_END (LDT_BASE_ADDR - 2 * PAGE_SIZE)
#endif
#define MODULES_VADDR VMALLOC_START
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 3c5385f..acb6970 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -132,90 +132,6 @@
#endif
}
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-/*
- * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
- * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and
- * the user one is in the last 4k. To switch between them, you
- * just need to flip the 12th bit in their addresses.
- */
-#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT
-
-/*
- * This generates better code than the inline assembly in
- * __set_bit().
- */
-static inline void *ptr_set_bit(void *ptr, int bit)
-{
- unsigned long __ptr = (unsigned long)ptr;
-
- __ptr |= BIT(bit);
- return (void *)__ptr;
-}
-static inline void *ptr_clear_bit(void *ptr, int bit)
-{
- unsigned long __ptr = (unsigned long)ptr;
-
- __ptr &= ~BIT(bit);
- return (void *)__ptr;
-}
-
-static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
-{
- return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
-{
- return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
-{
- return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
-{
- return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-#endif /* CONFIG_PAGE_TABLE_ISOLATION */
-
-/*
- * Page table pages are page-aligned. The lower half of the top
- * level is used for userspace and the top half for the kernel.
- *
- * Returns true for parts of the PGD that map userspace and
- * false for the parts that map the kernel.
- */
-static inline bool pgdp_maps_userspace(void *__ptr)
-{
- unsigned long ptr = (unsigned long)__ptr;
-
- return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
-}
-
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
-
-/*
- * Take a PGD location (pgdp) and a pgd value that needs to be set there.
- * Populates the user and returns the resulting PGD that must be set in
- * the kernel copy of the page tables.
- */
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
-{
- if (!static_cpu_has(X86_FEATURE_PTI))
- return pgd;
- return __pti_set_user_pgd(pgdp, pgd);
-}
-#else
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
-{
- return pgd;
-}
-#endif
-
static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
{
pgd_t pgd;
@@ -226,7 +142,7 @@
}
pgd = native_make_pgd(native_p4d_val(p4d));
- pgd = pti_set_user_pgd((pgd_t *)p4dp, pgd);
+ pgd = pti_set_user_pgtbl((pgd_t *)p4dp, pgd);
*p4dp = native_make_p4d(native_pgd_val(pgd));
}
@@ -237,7 +153,7 @@
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
- *pgdp = pti_set_user_pgd(pgdp, pgd);
+ *pgdp = pti_set_user_pgtbl(pgdp, pgd);
}
static inline void native_pgd_clear(pgd_t *pgd)
@@ -255,7 +171,6 @@
/*
* Level 4 access.
*/
-static inline int pgd_large(pgd_t pgd) { return 0; }
#define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE)
/* PUD - Level3 access */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 054765a..04edd2d 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -115,6 +115,7 @@
#define LDT_PGD_ENTRY_L5 -112UL
#define LDT_PGD_ENTRY (pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
#define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
+#define LDT_END_ADDR (LDT_BASE_ADDR + PGDIR_SIZE)
#define __VMALLOC_BASE_L4 0xffffc90000000000UL
#define __VMALLOC_BASE_L5 0xffa0000000000000UL
@@ -153,4 +154,6 @@
#define EARLY_DYNAMIC_PAGE_TABLES 64
+#define PGD_KERNEL_START ((PAGE_SIZE / 2) / sizeof(pgd_t))
+
#endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 99fff85..b64acb0 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -50,6 +50,7 @@
#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
#define _PAGE_SOFTW2 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
+#define _PAGE_SOFTW3 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW3)
#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT)
#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
#define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
@@ -266,14 +267,37 @@
typedef struct { pgdval_t pgd; } pgd_t;
+#ifdef CONFIG_X86_PAE
+
+/*
+ * PHYSICAL_PAGE_MASK might be non-constant when SME is compiled in, so we can't
+ * use it here.
+ */
+
+#define PGD_PAE_PAGE_MASK ((signed long)PAGE_MASK)
+#define PGD_PAE_PHYS_MASK (((1ULL << __PHYSICAL_MASK_SHIFT)-1) & PGD_PAE_PAGE_MASK)
+
+/*
+ * PAE allows Base Address, P, PWT, PCD and AVL bits to be set in PGD entries.
+ * All other bits are Reserved MBZ
+ */
+#define PGD_ALLOWED_BITS (PGD_PAE_PHYS_MASK | _PAGE_PRESENT | \
+ _PAGE_PWT | _PAGE_PCD | \
+ _PAGE_SOFTW1 | _PAGE_SOFTW2 | _PAGE_SOFTW3)
+
+#else
+/* No need to mask any bits for !PAE */
+#define PGD_ALLOWED_BITS (~0ULL)
+#endif
+
static inline pgd_t native_make_pgd(pgdval_t val)
{
- return (pgd_t) { val };
+ return (pgd_t) { val & PGD_ALLOWED_BITS };
}
static inline pgdval_t native_pgd_val(pgd_t pgd)
{
- return pgd.pgd;
+ return pgd.pgd & PGD_ALLOWED_BITS;
}
static inline pgdval_t pgd_flags(pgd_t pgd)
diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 625a52a..02c2cbd 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -39,10 +39,6 @@
#define CR3_PCID_MASK 0xFFFull
#define CR3_NOFLUSH BIT_ULL(63)
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define X86_CR3_PTI_PCID_USER_BIT 11
-#endif
-
#else
/*
* CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -53,4 +49,8 @@
#define CR3_NOFLUSH 0
#endif
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_PCID_USER_BIT 11
+#endif
+
#endif /* _ASM_X86_PROCESSOR_FLAGS_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cfd29ee..59663c0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -966,6 +966,7 @@
extern unsigned long arch_align_stack(unsigned long sp);
extern void free_init_pages(char *what, unsigned long begin, unsigned long end);
+extern void free_kernel_image_pages(void *begin, void *end);
void default_idle(void);
#ifdef CONFIG_XEN
diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h
index 38a17f1..5df09a0 100644
--- a/arch/x86/include/asm/pti.h
+++ b/arch/x86/include/asm/pti.h
@@ -6,10 +6,9 @@
#ifdef CONFIG_PAGE_TABLE_ISOLATION
extern void pti_init(void);
extern void pti_check_boottime_disable(void);
-extern void pti_clone_kernel_text(void);
+extern void pti_finalize(void);
#else
static inline void pti_check_boottime_disable(void) { }
-static inline void pti_clone_kernel_text(void) { }
#endif
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index 4cf11d8..19b9052 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -5,6 +5,7 @@
* PaX/grsecurity.
*/
#include <linux/refcount.h>
+#include <asm/bug.h>
/*
* This is the first portion of the refcount error handling, which lives in
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 5c019d2..4a911a3 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -7,6 +7,7 @@
extern char __brk_base[], __brk_limit[];
extern struct exception_table_entry __stop___ex_table[];
+extern char __end_rodata_aligned[];
#if defined(CONFIG_X86_64)
extern char __end_rodata_hpage_align[];
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index bd09036..34cffce 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -46,6 +46,7 @@
int set_memory_4k(unsigned long addr, int numpages);
int set_memory_encrypted(unsigned long addr, int numpages);
int set_memory_decrypted(unsigned long addr, int numpages);
+int set_memory_np_noalias(unsigned long addr, int numpages);
int set_memory_array_uc(unsigned long *addr, int addrinarray);
int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index eb5f799..36bd243 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -87,15 +87,25 @@
#endif
/* This is used when switching tasks or entering/exiting vm86 mode. */
-static inline void update_sp0(struct task_struct *task)
+static inline void update_task_stack(struct task_struct *task)
{
- /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
+ /* sp0 always points to the entry trampoline stack, which is constant: */
#ifdef CONFIG_X86_32
- load_sp0(task->thread.sp0);
+ if (static_cpu_has(X86_FEATURE_XENPV))
+ load_sp0(task->thread.sp0);
+ else
+ this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
#else
+ /*
+ * x86-64 updates x86_tss.sp1 via cpu_current_top_of_stack. That
+ * doesn't work on x86-32 because sp1 and
+ * cpu_current_top_of_stack have different values (because of
+ * the non-zero stack-padding on 32bit).
+ */
if (static_cpu_has(X86_FEATURE_XENPV))
load_sp0(task_top_of_stack(task));
#endif
+
}
#endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 2ecd34e..e85ff65 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -37,5 +37,6 @@
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
+extern int after_bootmem;
#endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6690cd3..511bf5f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -148,22 +148,6 @@
#define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
#endif
-static inline bool tlb_defer_switch_to_init_mm(void)
-{
- /*
- * If we have PCID, then switching to init_mm is reasonably
- * fast. If we don't have PCID, then switching to init_mm is
- * quite slow, so we try to defer it in the hopes that we can
- * avoid it entirely. The latter approach runs the risk of
- * receiving otherwise unnecessary IPIs.
- *
- * This choice is just a heuristic. The tlb code can handle this
- * function returning true or false regardless of whether we have
- * PCID.
- */
- return !static_cpu_has(X86_FEATURE_PCID);
-}
-
struct tlb_context {
u64 ctx_id;
u64 tlb_gen;
@@ -554,4 +538,9 @@
native_flush_tlb_others(mask, info)
#endif
+extern void tlb_flush_remove_tables(struct mm_struct *mm);
+extern void tlb_flush_remove_tables_local(void *arg);
+
+#define HAVE_TLB_FLUSH_REMOVE_TABLES
+
#endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index 4253bca..9c0d4b5 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -28,6 +28,21 @@
__entry->addr, __entry->end)
);
+TRACE_EVENT(hyperv_send_ipi_mask,
+ TP_PROTO(const struct cpumask *cpus,
+ int vector),
+ TP_ARGS(cpus, vector),
+ TP_STRUCT__entry(
+ __field(unsigned int, ncpus)
+ __field(int, vector)
+ ),
+ TP_fast_assign(__entry->ncpus = cpumask_weight(cpus);
+ __entry->vector = vector;
+ ),
+ TP_printk("ncpus %d vector %x",
+ __entry->ncpus, __entry->vector)
+ );
+
#endif /* CONFIG_HYPERV */
#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 2701d22..eb5bbfe 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -33,13 +33,13 @@
extern struct system_counterval_t convert_art_to_tsc(u64 art);
extern struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns);
-extern void tsc_early_delay_calibrate(void);
+extern void tsc_early_init(void);
extern void tsc_init(void);
extern void mark_tsc_unstable(char *reason);
extern int unsynchronized_tsc(void);
extern int check_tsc_unstable(void);
extern void mark_tsc_async_resets(char *reason);
-extern unsigned long native_calibrate_cpu(void);
+extern unsigned long native_calibrate_cpu_early(void);
extern unsigned long native_calibrate_tsc(void);
extern unsigned long long native_sched_clock_from_tsc(u64 tsc);
diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
index bae46fc..0bcdb12 100644
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -26,7 +26,7 @@
* the debuginfo as necessary. It will also warn if it sees any
* inconsistencies.
*/
-.macro UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=0 type=ORC_TYPE_CALL
+.macro UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=0 type=ORC_TYPE_CALL end=0
#ifdef CONFIG_STACK_VALIDATION
.Lunwind_hint_ip_\@:
.pushsection .discard.unwind_hints
@@ -35,12 +35,14 @@
.short \sp_offset
.byte \sp_reg
.byte \type
+ .byte \end
+ .balign 4
.popsection
#endif
.endm
.macro UNWIND_HINT_EMPTY
- UNWIND_HINT sp_reg=ORC_REG_UNDEFINED
+ UNWIND_HINT sp_reg=ORC_REG_UNDEFINED end=1
.endm
.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 iret=0
@@ -86,19 +88,21 @@
#else /* !__ASSEMBLY__ */
-#define UNWIND_HINT(sp_reg, sp_offset, type) \
+#define UNWIND_HINT(sp_reg, sp_offset, type, end) \
"987: \n\t" \
".pushsection .discard.unwind_hints\n\t" \
/* struct unwind_hint */ \
".long 987b - .\n\t" \
- ".short " __stringify(sp_offset) "\n\t" \
+ ".short " __stringify(sp_offset) "\n\t" \
".byte " __stringify(sp_reg) "\n\t" \
".byte " __stringify(type) "\n\t" \
+ ".byte " __stringify(end) "\n\t" \
+ ".balign 4 \n\t" \
".popsection\n\t"
-#define UNWIND_HINT_SAVE UNWIND_HINT(0, 0, UNWIND_HINT_TYPE_SAVE)
+#define UNWIND_HINT_SAVE UNWIND_HINT(0, 0, UNWIND_HINT_TYPE_SAVE, 0)
-#define UNWIND_HINT_RESTORE UNWIND_HINT(0, 0, UNWIND_HINT_TYPE_RESTORE)
+#define UNWIND_HINT_RESTORE UNWIND_HINT(0, 0, UNWIND_HINT_TYPE_RESTORE, 0)
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index a481763..014f214 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -668,6 +668,7 @@
local_irq_save(flags);
memcpy(addr, opcode, len);
local_irq_restore(flags);
+ sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
return addr;
@@ -693,6 +694,12 @@
struct page *pages[2];
int i;
+ /*
+ * While boot memory allocator is runnig we cannot use struct
+ * pages as they are not yet initialized.
+ */
+ BUG_ON(!after_bootmem);
+
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index adbda58..07fa222 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -940,7 +940,7 @@
if (levt->features & CLOCK_EVT_FEAT_DUMMY) {
pr_warning("APIC timer disabled due to verification failure\n");
- return -1;
+ return -1;
}
return 0;
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 35aaee4f..0954315 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -218,7 +218,8 @@
return 0;
}
-static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest)
+static int
+assign_vector_locked(struct irq_data *irqd, const struct cpumask *dest)
{
struct apic_chip_data *apicd = apic_chip_data(irqd);
bool resvd = apicd->has_reserved;
@@ -245,22 +246,12 @@
return -EBUSY;
vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
- if (vector > 0)
- apic_update_vector(irqd, vector, cpu);
trace_vector_alloc(irqd->irq, vector, resvd, vector);
- return vector;
-}
-
-static int assign_vector_locked(struct irq_data *irqd,
- const struct cpumask *dest)
-{
- struct apic_chip_data *apicd = apic_chip_data(irqd);
- int vector = allocate_vector(irqd, dest);
-
if (vector < 0)
return vector;
+ apic_update_vector(irqd, vector, cpu);
+ apic_update_irq_cfg(irqd, vector, cpu);
- apic_update_irq_cfg(irqd, apicd->vector, apicd->cpu);
return 0;
}
@@ -433,7 +424,7 @@
pr_err("Managed startup irq %u, no vector available\n",
irqd->irq);
}
- return ret;
+ return ret;
}
static int x86_vector_activate(struct irq_domain *dom, struct irq_data *irqd,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index d492752..391f358 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -394,10 +394,10 @@
EXPORT_SYMBOL(uv_hub_info_version);
/* Default UV memory block size is 2GB */
-static unsigned long mem_block_size = (2UL << 30);
+static unsigned long mem_block_size __initdata = (2UL << 30);
/* Kernel parameter to specify UV mem block size */
-static int parse_mem_block_size(char *ptr)
+static int __init parse_mem_block_size(char *ptr)
{
unsigned long size = memparse(ptr, NULL);
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index dcb008c..01de31d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,4 +103,9 @@
OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+ DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
+
+ /* Offset for sp0 and sp1 into the tss_struct */
+ OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
+ OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
}
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index a4a3be3..82826f2 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -46,8 +46,14 @@
OFFSET(saved_context_gdt_desc, saved_context, gdt_desc);
BLANK();
- /* Offset from the sysenter stack to tss.sp0 */
- DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+ /*
+ * Offset from the entry stack to task stack stored in TSS. Kernel entry
+ * happens on the per-cpu entry-stack, and the asm code switches to the
+ * task-stack pointer stored in x86_tss.sp1, which is a copy of
+ * task->thread.sp0 where entry code can find it.
+ */
+ DEFINE(TSS_entry2task_stack,
+ offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
offsetofend(struct cpu_entry_area, entry_stack_page.stack));
#ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index b2dcd16..3b9405e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -65,8 +65,6 @@
#undef ENTRY
OFFSET(TSS_ist, tss_struct, x86_tss.ist);
- OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
- OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
BLANK();
#ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 7a40196..347137e 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -35,7 +35,9 @@
obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o
-obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o intel_rdt_ctrlmondata.o
+obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o
+obj-$(CONFIG_INTEL_RDT) += intel_rdt_ctrlmondata.o intel_rdt_pseudo_lock.o
+CFLAGS_intel_rdt_pseudo_lock.o = -I$(src)
obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 38915fb..b732438 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -232,8 +232,6 @@
}
}
- set_cpu_cap(c, X86_FEATURE_K7);
-
/* calling is from identify_secondary_cpu() ? */
if (!c->cpu_index)
return;
@@ -617,6 +615,14 @@
early_init_amd_mc(c);
+#ifdef CONFIG_X86_32
+ if (c->x86 == 6)
+ set_cpu_cap(c, X86_FEATURE_K7);
+#endif
+
+ if (c->x86 >= 0xf)
+ set_cpu_cap(c, X86_FEATURE_K8);
+
rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
/*
@@ -863,9 +869,6 @@
init_amd_cacheinfo(c);
- if (c->x86 >= 0xf)
- set_cpu_cap(c, X86_FEATURE_K8);
-
if (cpu_has(c, X86_FEATURE_XMM2)) {
unsigned long long val;
int ret;
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5c0ea39..405a9a6 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -130,6 +130,7 @@
[SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline",
[SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline",
[SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline",
+ [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS",
};
#undef pr_fmt
@@ -313,23 +314,6 @@
return cmd;
}
-/* Check for Skylake-like CPUs (for RSB handling) */
-static bool __init is_skylake_era(void)
-{
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
- boot_cpu_data.x86 == 6) {
- switch (boot_cpu_data.x86_model) {
- case INTEL_FAM6_SKYLAKE_MOBILE:
- case INTEL_FAM6_SKYLAKE_DESKTOP:
- case INTEL_FAM6_SKYLAKE_X:
- case INTEL_FAM6_KABYLAKE_MOBILE:
- case INTEL_FAM6_KABYLAKE_DESKTOP:
- return true;
- }
- }
- return false;
-}
-
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -349,6 +333,13 @@
case SPECTRE_V2_CMD_FORCE:
case SPECTRE_V2_CMD_AUTO:
+ if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED)) {
+ mode = SPECTRE_V2_IBRS_ENHANCED;
+ /* Force it so VMEXIT will restore correctly */
+ x86_spec_ctrl_base |= SPEC_CTRL_IBRS;
+ wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+ goto specv2_set_mode;
+ }
if (IS_ENABLED(CONFIG_RETPOLINE))
goto retpoline_auto;
break;
@@ -386,26 +377,20 @@
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
}
+specv2_set_mode:
spectre_v2_enabled = mode;
pr_info("%s\n", spectre_v2_strings[mode]);
/*
- * If neither SMEP nor PTI are available, there is a risk of
- * hitting userspace addresses in the RSB after a context switch
- * from a shallow call stack to a deeper one. To prevent this fill
- * the entire RSB, even when using IBRS.
+ * If spectre v2 protection has been enabled, unconditionally fill
+ * RSB during a context switch; this protects against two independent
+ * issues:
*
- * Skylake era CPUs have a separate issue with *underflow* of the
- * RSB, when they will predict 'ret' targets from the generic BTB.
- * The proper mitigation for this is IBRS. If IBRS is not supported
- * or deactivated in favour of retpolines the RSB fill on context
- * switch is required.
+ * - RSB underflow (and switch to BTB) on Skylake+
+ * - SpectreRSB variant of spectre v2 on X86_BUG_SPECTRE_V2 CPUs
*/
- if ((!boot_cpu_has(X86_FEATURE_PTI) &&
- !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
- setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
- pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
- }
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+ pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
/* Initialize Indirect Branch Prediction Barrier if supported */
if (boot_cpu_has(X86_FEATURE_IBPB)) {
@@ -415,9 +400,16 @@
/*
* Retpoline means the kernel is safe because it has no indirect
- * branches. But firmware isn't, so use IBRS to protect that.
+ * branches. Enhanced IBRS protects firmware too, so, enable restricted
+ * speculation around firmware calls only when Enhanced IBRS isn't
+ * supported.
+ *
+ * Use "mode" to check Enhanced IBRS instead of boot_cpu_has(), because
+ * the user might select retpoline on the kernel command line and if
+ * the CPU supports Enhanced IBRS, kernel might un-intentionally not
+ * enable IBRS around firmware calls.
*/
- if (boot_cpu_has(X86_FEATURE_IBRS)) {
+ if (boot_cpu_has(X86_FEATURE_IBRS) && mode != SPECTRE_V2_IBRS_ENHANCED) {
setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
pr_info("Enabling Restricted Speculation for firmware calls\n");
}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index eb4cb3e..ba6b8bb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1005,6 +1005,9 @@
!cpu_has(c, X86_FEATURE_AMD_SSB_NO))
setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
+ if (ia32_cap & ARCH_CAP_IBRS_ALL)
+ setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
+
if (x86_match_cpu(cpu_no_meltdown))
return;
@@ -1016,6 +1019,24 @@
}
/*
+ * The NOPL instruction is supposed to exist on all CPUs of family >= 6;
+ * unfortunately, that's not true in practice because of early VIA
+ * chips and (more importantly) broken virtualizers that are not easy
+ * to detect. In the latter case it doesn't even *fail* reliably, so
+ * probing for it doesn't even work. Disable it completely on 32-bit
+ * unless we can find a reliable way to detect all the broken cases.
+ * Enable it explicitly on 64-bit for non-constant inputs of cpu_has().
+ */
+static void detect_nopl(void)
+{
+#ifdef CONFIG_X86_32
+ setup_clear_cpu_cap(X86_FEATURE_NOPL);
+#else
+ setup_force_cpu_cap(X86_FEATURE_NOPL);
+#endif
+}
+
+/*
* Do minimum CPU detection early.
* Fields really needed: vendor, cpuid_level, family, model, mask,
* cache alignment.
@@ -1089,6 +1110,8 @@
*/
if (!pgtable_l5_enabled())
setup_clear_cpu_cap(X86_FEATURE_LA57);
+
+ detect_nopl();
}
void __init early_cpu_init(void)
@@ -1124,24 +1147,6 @@
early_identify_cpu(&boot_cpu_data);
}
-/*
- * The NOPL instruction is supposed to exist on all CPUs of family >= 6;
- * unfortunately, that's not true in practice because of early VIA
- * chips and (more importantly) broken virtualizers that are not easy
- * to detect. In the latter case it doesn't even *fail* reliably, so
- * probing for it doesn't even work. Disable it completely on 32-bit
- * unless we can find a reliable way to detect all the broken cases.
- * Enable it explicitly on 64-bit for non-constant inputs of cpu_has().
- */
-static void detect_nopl(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_X86_32
- clear_cpu_cap(c, X86_FEATURE_NOPL);
-#else
- set_cpu_cap(c, X86_FEATURE_NOPL);
-#endif
-}
-
static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_X86_64
@@ -1204,8 +1209,6 @@
get_model_name(c); /* Default name */
- detect_nopl(c);
-
detect_null_seg_behavior(c);
/*
@@ -1804,11 +1807,12 @@
enter_lazy_tlb(&init_mm, curr);
/*
- * Initialize the TSS. Don't bother initializing sp0, as the initial
- * task never enters user mode.
+ * Initialize the TSS. sp0 points to the entry trampoline stack
+ * regardless of what task is running.
*/
set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
load_TR_desc();
+ load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
load_mm_ldt(&init_mm);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index eb75564..c050cd6 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -465,14 +465,17 @@
#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001
#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
+#define x86_VMX_FEATURE_EPT_CAP_AD 0x00200000
u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
+ u32 msr_vpid_cap, msr_ept_cap;
clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
clear_cpu_cap(c, X86_FEATURE_VNMI);
clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
clear_cpu_cap(c, X86_FEATURE_EPT);
clear_cpu_cap(c, X86_FEATURE_VPID);
+ clear_cpu_cap(c, X86_FEATURE_EPT_AD);
rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
msr_ctl = vmx_msr_high | vmx_msr_low;
@@ -487,8 +490,13 @@
if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
(msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
- if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
+ if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) {
set_cpu_cap(c, X86_FEATURE_EPT);
+ rdmsr(MSR_IA32_VMX_EPT_VPID_CAP,
+ msr_ept_cap, msr_vpid_cap);
+ if (msr_ept_cap & x86_VMX_FEATURE_EPT_CAP_AD)
+ set_cpu_cap(c, X86_FEATURE_EPT_AD);
+ }
if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
set_cpu_cap(c, X86_FEATURE_VPID);
}
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index ec4754f..abb71ac 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -859,6 +859,8 @@
return (rdt_mon_capable || rdt_alloc_capable);
}
+static enum cpuhp_state rdt_online;
+
static int __init intel_rdt_late_init(void)
{
struct rdt_resource *r;
@@ -880,6 +882,7 @@
cpuhp_remove_state(state);
return ret;
}
+ rdt_online = state;
for_each_alloc_capable_rdt_resource(r)
pr_info("Intel RDT %s allocation detected\n", r->name);
@@ -891,3 +894,11 @@
}
late_initcall(intel_rdt_late_init);
+
+static void __exit intel_rdt_exit(void)
+{
+ cpuhp_remove_state(rdt_online);
+ rdtgroup_exit();
+}
+
+__exitcall(intel_rdt_exit);
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 3975282..4e588f3 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -81,6 +81,34 @@
};
/**
+ * enum rdtgrp_mode - Mode of a RDT resource group
+ * @RDT_MODE_SHAREABLE: This resource group allows sharing of its allocations
+ * @RDT_MODE_EXCLUSIVE: No sharing of this resource group's allocations allowed
+ * @RDT_MODE_PSEUDO_LOCKSETUP: Resource group will be used for Pseudo-Locking
+ * @RDT_MODE_PSEUDO_LOCKED: No sharing of this resource group's allocations
+ * allowed AND the allocations are Cache Pseudo-Locked
+ *
+ * The mode of a resource group enables control over the allowed overlap
+ * between allocations associated with different resource groups (classes
+ * of service). User is able to modify the mode of a resource group by
+ * writing to the "mode" resctrl file associated with the resource group.
+ *
+ * The "shareable", "exclusive", and "pseudo-locksetup" modes are set by
+ * writing the appropriate text to the "mode" file. A resource group enters
+ * "pseudo-locked" mode after the schemata is written while the resource
+ * group is in "pseudo-locksetup" mode.
+ */
+enum rdtgrp_mode {
+ RDT_MODE_SHAREABLE = 0,
+ RDT_MODE_EXCLUSIVE,
+ RDT_MODE_PSEUDO_LOCKSETUP,
+ RDT_MODE_PSEUDO_LOCKED,
+
+ /* Must be last */
+ RDT_NUM_MODES,
+};
+
+/**
* struct mongroup - store mon group's data in resctrl fs.
* @mon_data_kn kernlfs node for the mon_data directory
* @parent: parent rdtgrp
@@ -95,6 +123,43 @@
};
/**
+ * struct pseudo_lock_region - pseudo-lock region information
+ * @r: RDT resource to which this pseudo-locked region
+ * belongs
+ * @d: RDT domain to which this pseudo-locked region
+ * belongs
+ * @cbm: bitmask of the pseudo-locked region
+ * @lock_thread_wq: waitqueue used to wait on the pseudo-locking thread
+ * completion
+ * @thread_done: variable used by waitqueue to test if pseudo-locking
+ * thread completed
+ * @cpu: core associated with the cache on which the setup code
+ * will be run
+ * @line_size: size of the cache lines
+ * @size: size of pseudo-locked region in bytes
+ * @kmem: the kernel memory associated with pseudo-locked region
+ * @minor: minor number of character device associated with this
+ * region
+ * @debugfs_dir: pointer to this region's directory in the debugfs
+ * filesystem
+ * @pm_reqs: Power management QoS requests related to this region
+ */
+struct pseudo_lock_region {
+ struct rdt_resource *r;
+ struct rdt_domain *d;
+ u32 cbm;
+ wait_queue_head_t lock_thread_wq;
+ int thread_done;
+ int cpu;
+ unsigned int line_size;
+ unsigned int size;
+ void *kmem;
+ unsigned int minor;
+ struct dentry *debugfs_dir;
+ struct list_head pm_reqs;
+};
+
+/**
* struct rdtgroup - store rdtgroup's data in resctrl file system.
* @kn: kernfs node
* @rdtgroup_list: linked list for all rdtgroups
@@ -106,16 +171,20 @@
* @type: indicates type of this rdtgroup - either
* monitor only or ctrl_mon group
* @mon: mongroup related data
+ * @mode: mode of resource group
+ * @plr: pseudo-locked region
*/
struct rdtgroup {
- struct kernfs_node *kn;
- struct list_head rdtgroup_list;
- u32 closid;
- struct cpumask cpu_mask;
- int flags;
- atomic_t waitcount;
- enum rdt_group_type type;
- struct mongroup mon;
+ struct kernfs_node *kn;
+ struct list_head rdtgroup_list;
+ u32 closid;
+ struct cpumask cpu_mask;
+ int flags;
+ atomic_t waitcount;
+ enum rdt_group_type type;
+ struct mongroup mon;
+ enum rdtgrp_mode mode;
+ struct pseudo_lock_region *plr;
};
/* rdtgroup.flags */
@@ -148,6 +217,7 @@
extern int max_name_width, max_data_width;
int __init rdtgroup_init(void);
+void __exit rdtgroup_exit(void);
/**
* struct rftype - describe each file in the resctrl file system
@@ -216,22 +286,24 @@
* @mbps_val: When mba_sc is enabled, this holds the bandwidth in MBps
* @new_ctrl: new ctrl value to be loaded
* @have_new_ctrl: did user provide new_ctrl for this domain
+ * @plr: pseudo-locked region (if any) associated with domain
*/
struct rdt_domain {
- struct list_head list;
- int id;
- struct cpumask cpu_mask;
- unsigned long *rmid_busy_llc;
- struct mbm_state *mbm_total;
- struct mbm_state *mbm_local;
- struct delayed_work mbm_over;
- struct delayed_work cqm_limbo;
- int mbm_work_cpu;
- int cqm_work_cpu;
- u32 *ctrl_val;
- u32 *mbps_val;
- u32 new_ctrl;
- bool have_new_ctrl;
+ struct list_head list;
+ int id;
+ struct cpumask cpu_mask;
+ unsigned long *rmid_busy_llc;
+ struct mbm_state *mbm_total;
+ struct mbm_state *mbm_local;
+ struct delayed_work mbm_over;
+ struct delayed_work cqm_limbo;
+ int mbm_work_cpu;
+ int cqm_work_cpu;
+ u32 *ctrl_val;
+ u32 *mbps_val;
+ u32 new_ctrl;
+ bool have_new_ctrl;
+ struct pseudo_lock_region *plr;
};
/**
@@ -351,7 +423,7 @@
struct rdt_cache cache;
struct rdt_membw membw;
const char *format_str;
- int (*parse_ctrlval) (char *buf, struct rdt_resource *r,
+ int (*parse_ctrlval) (void *data, struct rdt_resource *r,
struct rdt_domain *d);
struct list_head evt_list;
int num_rmid;
@@ -359,8 +431,8 @@
unsigned long fflags;
};
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d);
-int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d);
+int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d);
+int parse_bw(void *_buf, struct rdt_resource *r, struct rdt_domain *d);
extern struct mutex rdtgroup_mutex;
@@ -368,7 +440,7 @@
extern struct rdtgroup rdtgroup_default;
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
-int __init rdtgroup_init(void);
+extern struct dentry *debugfs_resctrl;
enum {
RDT_RESOURCE_L3,
@@ -439,13 +511,32 @@
void rdt_ctrl_update(void *arg);
struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
void rdtgroup_kn_unlock(struct kernfs_node *kn);
+int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
+int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
+ umode_t mask);
struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
struct list_head **pos);
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
+bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+ u32 _cbm, int closid, bool exclusive);
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
+ u32 cbm);
+enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
+int rdtgroup_tasks_assigned(struct rdtgroup *r);
+int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
+int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+int rdt_pseudo_lock_init(void);
+void rdt_pseudo_lock_release(void);
+int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
+void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+int update_domains(struct rdt_resource *r, int closid);
+void closid_free(int closid);
int alloc_rmid(void);
void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 116d57b..af358ca 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -64,9 +64,10 @@
return true;
}
-int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
+int parse_bw(void *_buf, struct rdt_resource *r, struct rdt_domain *d)
{
unsigned long data;
+ char *buf = _buf;
if (d->have_new_ctrl) {
rdt_last_cmd_printf("duplicate domain %d\n", d->id);
@@ -87,7 +88,7 @@
* are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
* Additionally Haswell requires at least two bits set.
*/
-static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
+static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
{
unsigned long first_bit, zero_bit, val;
unsigned int cbm_len = r->cache.cbm_len;
@@ -122,22 +123,64 @@
return true;
}
+struct rdt_cbm_parse_data {
+ struct rdtgroup *rdtgrp;
+ char *buf;
+};
+
/*
* Read one cache bit mask (hex). Check that it is valid for the current
* resource type.
*/
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
+int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
{
- unsigned long data;
+ struct rdt_cbm_parse_data *data = _data;
+ struct rdtgroup *rdtgrp = data->rdtgrp;
+ u32 cbm_val;
if (d->have_new_ctrl) {
rdt_last_cmd_printf("duplicate domain %d\n", d->id);
return -EINVAL;
}
- if(!cbm_validate(buf, &data, r))
+ /*
+ * Cannot set up more than one pseudo-locked region in a cache
+ * hierarchy.
+ */
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
+ rdtgroup_pseudo_locked_in_hierarchy(d)) {
+ rdt_last_cmd_printf("pseudo-locked region in hierarchy\n");
return -EINVAL;
- d->new_ctrl = data;
+ }
+
+ if (!cbm_validate(data->buf, &cbm_val, r))
+ return -EINVAL;
+
+ if ((rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
+ rdtgrp->mode == RDT_MODE_SHAREABLE) &&
+ rdtgroup_cbm_overlaps_pseudo_locked(d, cbm_val)) {
+ rdt_last_cmd_printf("CBM overlaps with pseudo-locked region\n");
+ return -EINVAL;
+ }
+
+ /*
+ * The CBM may not overlap with the CBM of another closid if
+ * either is exclusive.
+ */
+ if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, true)) {
+ rdt_last_cmd_printf("overlaps with exclusive group\n");
+ return -EINVAL;
+ }
+
+ if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
+ if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ rdt_last_cmd_printf("overlaps with other group\n");
+ return -EINVAL;
+ }
+ }
+
+ d->new_ctrl = cbm_val;
d->have_new_ctrl = true;
return 0;
@@ -149,8 +192,10 @@
* separated by ";". The "id" is in decimal, and must match one of
* the "id"s for this resource.
*/
-static int parse_line(char *line, struct rdt_resource *r)
+static int parse_line(char *line, struct rdt_resource *r,
+ struct rdtgroup *rdtgrp)
{
+ struct rdt_cbm_parse_data data;
char *dom = NULL, *id;
struct rdt_domain *d;
unsigned long dom_id;
@@ -167,15 +212,32 @@
dom = strim(dom);
list_for_each_entry(d, &r->domains, list) {
if (d->id == dom_id) {
- if (r->parse_ctrlval(dom, r, d))
+ data.buf = dom;
+ data.rdtgrp = rdtgrp;
+ if (r->parse_ctrlval(&data, r, d))
return -EINVAL;
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ /*
+ * In pseudo-locking setup mode and just
+ * parsed a valid CBM that should be
+ * pseudo-locked. Only one locked region per
+ * resource group and domain so just do
+ * the required initialization for single
+ * region and return.
+ */
+ rdtgrp->plr->r = r;
+ rdtgrp->plr->d = d;
+ rdtgrp->plr->cbm = d->new_ctrl;
+ d->plr = rdtgrp->plr;
+ return 0;
+ }
goto next;
}
}
return -EINVAL;
}
-static int update_domains(struct rdt_resource *r, int closid)
+int update_domains(struct rdt_resource *r, int closid)
{
struct msr_param msr_param;
cpumask_var_t cpu_mask;
@@ -220,13 +282,14 @@
return 0;
}
-static int rdtgroup_parse_resource(char *resname, char *tok, int closid)
+static int rdtgroup_parse_resource(char *resname, char *tok,
+ struct rdtgroup *rdtgrp)
{
struct rdt_resource *r;
for_each_alloc_enabled_rdt_resource(r) {
- if (!strcmp(resname, r->name) && closid < r->num_closid)
- return parse_line(tok, r);
+ if (!strcmp(resname, r->name) && rdtgrp->closid < r->num_closid)
+ return parse_line(tok, r, rdtgrp);
}
rdt_last_cmd_printf("unknown/unsupported resource name '%s'\n", resname);
return -EINVAL;
@@ -239,7 +302,7 @@
struct rdt_domain *dom;
struct rdt_resource *r;
char *tok, *resname;
- int closid, ret = 0;
+ int ret = 0;
/* Valid input requires a trailing newline */
if (nbytes == 0 || buf[nbytes - 1] != '\n')
@@ -253,7 +316,15 @@
}
rdt_last_cmd_clear();
- closid = rdtgrp->closid;
+ /*
+ * No changes to pseudo-locked region allowed. It has to be removed
+ * and re-created instead.
+ */
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("resource group is pseudo-locked\n");
+ goto out;
+ }
for_each_alloc_enabled_rdt_resource(r) {
list_for_each_entry(dom, &r->domains, list)
@@ -272,17 +343,27 @@
ret = -EINVAL;
goto out;
}
- ret = rdtgroup_parse_resource(resname, tok, closid);
+ ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
if (ret)
goto out;
}
for_each_alloc_enabled_rdt_resource(r) {
- ret = update_domains(r, closid);
+ ret = update_domains(r, rdtgrp->closid);
if (ret)
goto out;
}
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ /*
+ * If pseudo-locking fails we keep the resource group in
+ * mode RDT_MODE_PSEUDO_LOCKSETUP with its class of service
+ * active and updated for just the domain the pseudo-locked
+ * region was requested for.
+ */
+ ret = rdtgroup_pseudo_lock_create(rdtgrp);
+ }
+
out:
rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
@@ -318,10 +399,18 @@
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (rdtgrp) {
- closid = rdtgrp->closid;
- for_each_alloc_enabled_rdt_resource(r) {
- if (closid < r->num_closid)
- show_doms(s, r, closid);
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ for_each_alloc_enabled_rdt_resource(r)
+ seq_printf(s, "%s:uninitialized\n", r->name);
+ } else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+ seq_printf(s, "%s:%d=%x\n", rdtgrp->plr->r->name,
+ rdtgrp->plr->d->id, rdtgrp->plr->cbm);
+ } else {
+ closid = rdtgrp->closid;
+ for_each_alloc_enabled_rdt_resource(r) {
+ if (closid < r->num_closid)
+ show_doms(s, r, closid);
+ }
}
} else {
ret = -ENOENT;
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
new file mode 100644
index 0000000..40f3903
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -0,0 +1,1522 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Resource Director Technology (RDT)
+ *
+ * Pseudo-locking support built on top of Cache Allocation Technology (CAT)
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Author: Reinette Chatre <reinette.chatre@intel.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <linux/kthread.h>
+#include <linux/mman.h>
+#include <linux/pm_qos.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <asm/cacheflush.h>
+#include <asm/intel-family.h>
+#include <asm/intel_rdt_sched.h>
+#include <asm/perf_event.h>
+
+#include "intel_rdt.h"
+
+#define CREATE_TRACE_POINTS
+#include "intel_rdt_pseudo_lock_event.h"
+
+/*
+ * MSR_MISC_FEATURE_CONTROL register enables the modification of hardware
+ * prefetcher state. Details about this register can be found in the MSR
+ * tables for specific platforms found in Intel's SDM.
+ */
+#define MSR_MISC_FEATURE_CONTROL 0x000001a4
+
+/*
+ * The bits needed to disable hardware prefetching varies based on the
+ * platform. During initialization we will discover which bits to use.
+ */
+static u64 prefetch_disable_bits;
+
+/*
+ * Major number assigned to and shared by all devices exposing
+ * pseudo-locked regions.
+ */
+static unsigned int pseudo_lock_major;
+static unsigned long pseudo_lock_minor_avail = GENMASK(MINORBITS, 0);
+static struct class *pseudo_lock_class;
+
+/**
+ * get_prefetch_disable_bits - prefetch disable bits of supported platforms
+ *
+ * Capture the list of platforms that have been validated to support
+ * pseudo-locking. This includes testing to ensure pseudo-locked regions
+ * with low cache miss rates can be created under variety of load conditions
+ * as well as that these pseudo-locked regions can maintain their low cache
+ * miss rates under variety of load conditions for significant lengths of time.
+ *
+ * After a platform has been validated to support pseudo-locking its
+ * hardware prefetch disable bits are included here as they are documented
+ * in the SDM.
+ *
+ * When adding a platform here also add support for its cache events to
+ * measure_cycles_perf_fn()
+ *
+ * Return:
+ * If platform is supported, the bits to disable hardware prefetchers, 0
+ * if platform is not supported.
+ */
+static u64 get_prefetch_disable_bits(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+ boot_cpu_data.x86 != 6)
+ return 0;
+
+ switch (boot_cpu_data.x86_model) {
+ case INTEL_FAM6_BROADWELL_X:
+ /*
+ * SDM defines bits of MSR_MISC_FEATURE_CONTROL register
+ * as:
+ * 0 L2 Hardware Prefetcher Disable (R/W)
+ * 1 L2 Adjacent Cache Line Prefetcher Disable (R/W)
+ * 2 DCU Hardware Prefetcher Disable (R/W)
+ * 3 DCU IP Prefetcher Disable (R/W)
+ * 63:4 Reserved
+ */
+ return 0xF;
+ case INTEL_FAM6_ATOM_GOLDMONT:
+ case INTEL_FAM6_ATOM_GEMINI_LAKE:
+ /*
+ * SDM defines bits of MSR_MISC_FEATURE_CONTROL register
+ * as:
+ * 0 L2 Hardware Prefetcher Disable (R/W)
+ * 1 Reserved
+ * 2 DCU Hardware Prefetcher Disable (R/W)
+ * 63:3 Reserved
+ */
+ return 0x5;
+ }
+
+ return 0;
+}
+
+/*
+ * Helper to write 64bit value to MSR without tracing. Used when
+ * use of the cache should be restricted and use of registers used
+ * for local variables avoided.
+ */
+static inline void pseudo_wrmsrl_notrace(unsigned int msr, u64 val)
+{
+ __wrmsr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
+}
+
+/**
+ * pseudo_lock_minor_get - Obtain available minor number
+ * @minor: Pointer to where new minor number will be stored
+ *
+ * A bitmask is used to track available minor numbers. Here the next free
+ * minor number is marked as unavailable and returned.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+static int pseudo_lock_minor_get(unsigned int *minor)
+{
+ unsigned long first_bit;
+
+ first_bit = find_first_bit(&pseudo_lock_minor_avail, MINORBITS);
+
+ if (first_bit == MINORBITS)
+ return -ENOSPC;
+
+ __clear_bit(first_bit, &pseudo_lock_minor_avail);
+ *minor = first_bit;
+
+ return 0;
+}
+
+/**
+ * pseudo_lock_minor_release - Return minor number to available
+ * @minor: The minor number made available
+ */
+static void pseudo_lock_minor_release(unsigned int minor)
+{
+ __set_bit(minor, &pseudo_lock_minor_avail);
+}
+
+/**
+ * region_find_by_minor - Locate a pseudo-lock region by inode minor number
+ * @minor: The minor number of the device representing pseudo-locked region
+ *
+ * When the character device is accessed we need to determine which
+ * pseudo-locked region it belongs to. This is done by matching the minor
+ * number of the device to the pseudo-locked region it belongs.
+ *
+ * Minor numbers are assigned at the time a pseudo-locked region is associated
+ * with a cache instance.
+ *
+ * Return: On success return pointer to resource group owning the pseudo-locked
+ * region, NULL on failure.
+ */
+static struct rdtgroup *region_find_by_minor(unsigned int minor)
+{
+ struct rdtgroup *rdtgrp, *rdtgrp_match = NULL;
+
+ list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+ if (rdtgrp->plr && rdtgrp->plr->minor == minor) {
+ rdtgrp_match = rdtgrp;
+ break;
+ }
+ }
+ return rdtgrp_match;
+}
+
+/**
+ * pseudo_lock_pm_req - A power management QoS request list entry
+ * @list: Entry within the @pm_reqs list for a pseudo-locked region
+ * @req: PM QoS request
+ */
+struct pseudo_lock_pm_req {
+ struct list_head list;
+ struct dev_pm_qos_request req;
+};
+
+static void pseudo_lock_cstates_relax(struct pseudo_lock_region *plr)
+{
+ struct pseudo_lock_pm_req *pm_req, *next;
+
+ list_for_each_entry_safe(pm_req, next, &plr->pm_reqs, list) {
+ dev_pm_qos_remove_request(&pm_req->req);
+ list_del(&pm_req->list);
+ kfree(pm_req);
+ }
+}
+
+/**
+ * pseudo_lock_cstates_constrain - Restrict cores from entering C6
+ *
+ * To prevent the cache from being affected by power management entering
+ * C6 has to be avoided. This is accomplished by requesting a latency
+ * requirement lower than lowest C6 exit latency of all supported
+ * platforms as found in the cpuidle state tables in the intel_idle driver.
+ * At this time it is possible to do so with a single latency requirement
+ * for all supported platforms.
+ *
+ * Since Goldmont is supported, which is affected by X86_BUG_MONITOR,
+ * the ACPI latencies need to be considered while keeping in mind that C2
+ * may be set to map to deeper sleep states. In this case the latency
+ * requirement needs to prevent entering C2 also.
+ */
+static int pseudo_lock_cstates_constrain(struct pseudo_lock_region *plr)
+{
+ struct pseudo_lock_pm_req *pm_req;
+ int cpu;
+ int ret;
+
+ for_each_cpu(cpu, &plr->d->cpu_mask) {
+ pm_req = kzalloc(sizeof(*pm_req), GFP_KERNEL);
+ if (!pm_req) {
+ rdt_last_cmd_puts("fail allocating mem for PM QoS\n");
+ ret = -ENOMEM;
+ goto out_err;
+ }
+ ret = dev_pm_qos_add_request(get_cpu_device(cpu),
+ &pm_req->req,
+ DEV_PM_QOS_RESUME_LATENCY,
+ 30);
+ if (ret < 0) {
+ rdt_last_cmd_printf("fail to add latency req cpu%d\n",
+ cpu);
+ kfree(pm_req);
+ ret = -1;
+ goto out_err;
+ }
+ list_add(&pm_req->list, &plr->pm_reqs);
+ }
+
+ return 0;
+
+out_err:
+ pseudo_lock_cstates_relax(plr);
+ return ret;
+}
+
+/**
+ * pseudo_lock_region_clear - Reset pseudo-lock region data
+ * @plr: pseudo-lock region
+ *
+ * All content of the pseudo-locked region is reset - any memory allocated
+ * freed.
+ *
+ * Return: void
+ */
+static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
+{
+ plr->size = 0;
+ plr->line_size = 0;
+ kfree(plr->kmem);
+ plr->kmem = NULL;
+ plr->r = NULL;
+ if (plr->d)
+ plr->d->plr = NULL;
+ plr->d = NULL;
+ plr->cbm = 0;
+ plr->debugfs_dir = NULL;
+}
+
+/**
+ * pseudo_lock_region_init - Initialize pseudo-lock region information
+ * @plr: pseudo-lock region
+ *
+ * Called after user provided a schemata to be pseudo-locked. From the
+ * schemata the &struct pseudo_lock_region is on entry already initialized
+ * with the resource, domain, and capacity bitmask. Here the information
+ * required for pseudo-locking is deduced from this data and &struct
+ * pseudo_lock_region initialized further. This information includes:
+ * - size in bytes of the region to be pseudo-locked
+ * - cache line size to know the stride with which data needs to be accessed
+ * to be pseudo-locked
+ * - a cpu associated with the cache instance on which the pseudo-locking
+ * flow can be executed
+ *
+ * Return: 0 on success, <0 on failure. Descriptive error will be written
+ * to last_cmd_status buffer.
+ */
+static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
+{
+ struct cpu_cacheinfo *ci;
+ int ret;
+ int i;
+
+ /* Pick the first cpu we find that is associated with the cache. */
+ plr->cpu = cpumask_first(&plr->d->cpu_mask);
+
+ if (!cpu_online(plr->cpu)) {
+ rdt_last_cmd_printf("cpu %u associated with cache not online\n",
+ plr->cpu);
+ ret = -ENODEV;
+ goto out_region;
+ }
+
+ ci = get_cpu_cacheinfo(plr->cpu);
+
+ plr->size = rdtgroup_cbm_to_size(plr->r, plr->d, plr->cbm);
+
+ for (i = 0; i < ci->num_leaves; i++) {
+ if (ci->info_list[i].level == plr->r->cache_level) {
+ plr->line_size = ci->info_list[i].coherency_line_size;
+ return 0;
+ }
+ }
+
+ ret = -1;
+ rdt_last_cmd_puts("unable to determine cache line size\n");
+out_region:
+ pseudo_lock_region_clear(plr);
+ return ret;
+}
+
+/**
+ * pseudo_lock_init - Initialize a pseudo-lock region
+ * @rdtgrp: resource group to which new pseudo-locked region will belong
+ *
+ * A pseudo-locked region is associated with a resource group. When this
+ * association is created the pseudo-locked region is initialized. The
+ * details of the pseudo-locked region are not known at this time so only
+ * allocation is done and association established.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+static int pseudo_lock_init(struct rdtgroup *rdtgrp)
+{
+ struct pseudo_lock_region *plr;
+
+ plr = kzalloc(sizeof(*plr), GFP_KERNEL);
+ if (!plr)
+ return -ENOMEM;
+
+ init_waitqueue_head(&plr->lock_thread_wq);
+ INIT_LIST_HEAD(&plr->pm_reqs);
+ rdtgrp->plr = plr;
+ return 0;
+}
+
+/**
+ * pseudo_lock_region_alloc - Allocate kernel memory that will be pseudo-locked
+ * @plr: pseudo-lock region
+ *
+ * Initialize the details required to set up the pseudo-locked region and
+ * allocate the contiguous memory that will be pseudo-locked to the cache.
+ *
+ * Return: 0 on success, <0 on failure. Descriptive error will be written
+ * to last_cmd_status buffer.
+ */
+static int pseudo_lock_region_alloc(struct pseudo_lock_region *plr)
+{
+ int ret;
+
+ ret = pseudo_lock_region_init(plr);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * We do not yet support contiguous regions larger than
+ * KMALLOC_MAX_SIZE.
+ */
+ if (plr->size > KMALLOC_MAX_SIZE) {
+ rdt_last_cmd_puts("requested region exceeds maximum size\n");
+ ret = -E2BIG;
+ goto out_region;
+ }
+
+ plr->kmem = kzalloc(plr->size, GFP_KERNEL);
+ if (!plr->kmem) {
+ rdt_last_cmd_puts("unable to allocate memory\n");
+ ret = -ENOMEM;
+ goto out_region;
+ }
+
+ ret = 0;
+ goto out;
+out_region:
+ pseudo_lock_region_clear(plr);
+out:
+ return ret;
+}
+
+/**
+ * pseudo_lock_free - Free a pseudo-locked region
+ * @rdtgrp: resource group to which pseudo-locked region belonged
+ *
+ * The pseudo-locked region's resources have already been released, or not
+ * yet created at this point. Now it can be freed and disassociated from the
+ * resource group.
+ *
+ * Return: void
+ */
+static void pseudo_lock_free(struct rdtgroup *rdtgrp)
+{
+ pseudo_lock_region_clear(rdtgrp->plr);
+ kfree(rdtgrp->plr);
+ rdtgrp->plr = NULL;
+}
+
+/**
+ * pseudo_lock_fn - Load kernel memory into cache
+ * @_rdtgrp: resource group to which pseudo-lock region belongs
+ *
+ * This is the core pseudo-locking flow.
+ *
+ * First we ensure that the kernel memory cannot be found in the cache.
+ * Then, while taking care that there will be as little interference as
+ * possible, the memory to be loaded is accessed while core is running
+ * with class of service set to the bitmask of the pseudo-locked region.
+ * After this is complete no future CAT allocations will be allowed to
+ * overlap with this bitmask.
+ *
+ * Local register variables are utilized to ensure that the memory region
+ * to be locked is the only memory access made during the critical locking
+ * loop.
+ *
+ * Return: 0. Waiter on waitqueue will be woken on completion.
+ */
+static int pseudo_lock_fn(void *_rdtgrp)
+{
+ struct rdtgroup *rdtgrp = _rdtgrp;
+ struct pseudo_lock_region *plr = rdtgrp->plr;
+ u32 rmid_p, closid_p;
+ unsigned long i;
+#ifdef CONFIG_KASAN
+ /*
+ * The registers used for local register variables are also used
+ * when KASAN is active. When KASAN is active we use a regular
+ * variable to ensure we always use a valid pointer, but the cost
+ * is that this variable will enter the cache through evicting the
+ * memory we are trying to lock into the cache. Thus expect lower
+ * pseudo-locking success rate when KASAN is active.
+ */
+ unsigned int line_size;
+ unsigned int size;
+ void *mem_r;
+#else
+ register unsigned int line_size asm("esi");
+ register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+ register void *mem_r asm("rbx");
+#else
+ register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+ /*
+ * Make sure none of the allocated memory is cached. If it is we
+ * will get a cache hit in below loop from outside of pseudo-locked
+ * region.
+ * wbinvd (as opposed to clflush/clflushopt) is required to
+ * increase likelihood that allocated cache portion will be filled
+ * with associated memory.
+ */
+ native_wbinvd();
+
+ /*
+ * Always called with interrupts enabled. By disabling interrupts
+ * ensure that we will not be preempted during this critical section.
+ */
+ local_irq_disable();
+
+ /*
+ * Call wrmsr and rdmsr as directly as possible to avoid tracing
+ * clobbering local register variables or affecting cache accesses.
+ *
+ * Disable the hardware prefetcher so that when the end of the memory
+ * being pseudo-locked is reached the hardware will not read beyond
+ * the buffer and evict pseudo-locked memory read earlier from the
+ * cache.
+ */
+ __wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+ closid_p = this_cpu_read(pqr_state.cur_closid);
+ rmid_p = this_cpu_read(pqr_state.cur_rmid);
+ mem_r = plr->kmem;
+ size = plr->size;
+ line_size = plr->line_size;
+ /*
+ * Critical section begin: start by writing the closid associated
+ * with the capacity bitmask of the cache region being
+ * pseudo-locked followed by reading of kernel memory to load it
+ * into the cache.
+ */
+ __wrmsr(IA32_PQR_ASSOC, rmid_p, rdtgrp->closid);
+ /*
+ * Cache was flushed earlier. Now access kernel memory to read it
+ * into cache region associated with just activated plr->closid.
+ * Loop over data twice:
+ * - In first loop the cache region is shared with the page walker
+ * as it populates the paging structure caches (including TLB).
+ * - In the second loop the paging structure caches are used and
+ * cache region is populated with the memory being referenced.
+ */
+ for (i = 0; i < size; i += PAGE_SIZE) {
+ /*
+ * Add a barrier to prevent speculative execution of this
+ * loop reading beyond the end of the buffer.
+ */
+ rmb();
+ asm volatile("mov (%0,%1,1), %%eax\n\t"
+ :
+ : "r" (mem_r), "r" (i)
+ : "%eax", "memory");
+ }
+ for (i = 0; i < size; i += line_size) {
+ /*
+ * Add a barrier to prevent speculative execution of this
+ * loop reading beyond the end of the buffer.
+ */
+ rmb();
+ asm volatile("mov (%0,%1,1), %%eax\n\t"
+ :
+ : "r" (mem_r), "r" (i)
+ : "%eax", "memory");
+ }
+ /*
+ * Critical section end: restore closid with capacity bitmask that
+ * does not overlap with pseudo-locked region.
+ */
+ __wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);
+
+ /* Re-enable the hardware prefetcher(s) */
+ wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+ local_irq_enable();
+
+ plr->thread_done = 1;
+ wake_up_interruptible(&plr->lock_thread_wq);
+ return 0;
+}
+
+/**
+ * rdtgroup_monitor_in_progress - Test if monitoring in progress
+ * @r: resource group being queried
+ *
+ * Return: 1 if monitor groups have been created for this resource
+ * group, 0 otherwise.
+ */
+static int rdtgroup_monitor_in_progress(struct rdtgroup *rdtgrp)
+{
+ return !list_empty(&rdtgrp->mon.crdtgrp_list);
+}
+
+/**
+ * rdtgroup_locksetup_user_restrict - Restrict user access to group
+ * @rdtgrp: resource group needing access restricted
+ *
+ * A resource group used for cache pseudo-locking cannot have cpus or tasks
+ * assigned to it. This is communicated to the user by restricting access
+ * to all the files that can be used to make such changes.
+ *
+ * Permissions restored with rdtgroup_locksetup_user_restore()
+ *
+ * Return: 0 on success, <0 on failure. If a failure occurs during the
+ * restriction of access an attempt will be made to restore permissions but
+ * the state of the mode of these files will be uncertain when a failure
+ * occurs.
+ */
+static int rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
+{
+ int ret;
+
+ ret = rdtgroup_kn_mode_restrict(rdtgrp, "tasks");
+ if (ret)
+ return ret;
+
+ ret = rdtgroup_kn_mode_restrict(rdtgrp, "cpus");
+ if (ret)
+ goto err_tasks;
+
+ ret = rdtgroup_kn_mode_restrict(rdtgrp, "cpus_list");
+ if (ret)
+ goto err_cpus;
+
+ if (rdt_mon_capable) {
+ ret = rdtgroup_kn_mode_restrict(rdtgrp, "mon_groups");
+ if (ret)
+ goto err_cpus_list;
+ }
+
+ ret = 0;
+ goto out;
+
+err_cpus_list:
+ rdtgroup_kn_mode_restore(rdtgrp, "cpus_list", 0777);
+err_cpus:
+ rdtgroup_kn_mode_restore(rdtgrp, "cpus", 0777);
+err_tasks:
+ rdtgroup_kn_mode_restore(rdtgrp, "tasks", 0777);
+out:
+ return ret;
+}
+
+/**
+ * rdtgroup_locksetup_user_restore - Restore user access to group
+ * @rdtgrp: resource group needing access restored
+ *
+ * Restore all file access previously removed using
+ * rdtgroup_locksetup_user_restrict()
+ *
+ * Return: 0 on success, <0 on failure. If a failure occurs during the
+ * restoration of access an attempt will be made to restrict permissions
+ * again but the state of the mode of these files will be uncertain when
+ * a failure occurs.
+ */
+static int rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
+{
+ int ret;
+
+ ret = rdtgroup_kn_mode_restore(rdtgrp, "tasks", 0777);
+ if (ret)
+ return ret;
+
+ ret = rdtgroup_kn_mode_restore(rdtgrp, "cpus", 0777);
+ if (ret)
+ goto err_tasks;
+
+ ret = rdtgroup_kn_mode_restore(rdtgrp, "cpus_list", 0777);
+ if (ret)
+ goto err_cpus;
+
+ if (rdt_mon_capable) {
+ ret = rdtgroup_kn_mode_restore(rdtgrp, "mon_groups", 0777);
+ if (ret)
+ goto err_cpus_list;
+ }
+
+ ret = 0;
+ goto out;
+
+err_cpus_list:
+ rdtgroup_kn_mode_restrict(rdtgrp, "cpus_list");
+err_cpus:
+ rdtgroup_kn_mode_restrict(rdtgrp, "cpus");
+err_tasks:
+ rdtgroup_kn_mode_restrict(rdtgrp, "tasks");
+out:
+ return ret;
+}
+
+/**
+ * rdtgroup_locksetup_enter - Resource group enters locksetup mode
+ * @rdtgrp: resource group requested to enter locksetup mode
+ *
+ * A resource group enters locksetup mode to reflect that it would be used
+ * to represent a pseudo-locked region and is in the process of being set
+ * up to do so. A resource group used for a pseudo-locked region would
+ * lose the closid associated with it so we cannot allow it to have any
+ * tasks or cpus assigned nor permit tasks or cpus to be assigned in the
+ * future. Monitoring of a pseudo-locked region is not allowed either.
+ *
+ * The above and more restrictions on a pseudo-locked region are checked
+ * for and enforced before the resource group enters the locksetup mode.
+ *
+ * Returns: 0 if the resource group successfully entered locksetup mode, <0
+ * on failure. On failure the last_cmd_status buffer is updated with text to
+ * communicate details of failure to the user.
+ */
+int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
+{
+ int ret;
+
+ /*
+ * The default resource group can neither be removed nor lose the
+ * default closid associated with it.
+ */
+ if (rdtgrp == &rdtgroup_default) {
+ rdt_last_cmd_puts("cannot pseudo-lock default group\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Cache Pseudo-locking not supported when CDP is enabled.
+ *
+ * Some things to consider if you would like to enable this
+ * support (using L3 CDP as example):
+ * - When CDP is enabled two separate resources are exposed,
+ * L3DATA and L3CODE, but they are actually on the same cache.
+ * The implication for pseudo-locking is that if a
+ * pseudo-locked region is created on a domain of one
+ * resource (eg. L3CODE), then a pseudo-locked region cannot
+ * be created on that same domain of the other resource
+ * (eg. L3DATA). This is because the creation of a
+ * pseudo-locked region involves a call to wbinvd that will
+ * affect all cache allocations on particular domain.
+ * - Considering the previous, it may be possible to only
+ * expose one of the CDP resources to pseudo-locking and
+ * hide the other. For example, we could consider to only
+ * expose L3DATA and since the L3 cache is unified it is
+ * still possible to place instructions there are execute it.
+ * - If only one region is exposed to pseudo-locking we should
+ * still keep in mind that availability of a portion of cache
+ * for pseudo-locking should take into account both resources.
+ * Similarly, if a pseudo-locked region is created in one
+ * resource, the portion of cache used by it should be made
+ * unavailable to all future allocations from both resources.
+ */
+ if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled ||
+ rdt_resources_all[RDT_RESOURCE_L2DATA].alloc_enabled) {
+ rdt_last_cmd_puts("CDP enabled\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Not knowing the bits to disable prefetching implies that this
+ * platform does not support Cache Pseudo-Locking.
+ */
+ prefetch_disable_bits = get_prefetch_disable_bits();
+ if (prefetch_disable_bits == 0) {
+ rdt_last_cmd_puts("pseudo-locking not supported\n");
+ return -EINVAL;
+ }
+
+ if (rdtgroup_monitor_in_progress(rdtgrp)) {
+ rdt_last_cmd_puts("monitoring in progress\n");
+ return -EINVAL;
+ }
+
+ if (rdtgroup_tasks_assigned(rdtgrp)) {
+ rdt_last_cmd_puts("tasks assigned to resource group\n");
+ return -EINVAL;
+ }
+
+ if (!cpumask_empty(&rdtgrp->cpu_mask)) {
+ rdt_last_cmd_puts("CPUs assigned to resource group\n");
+ return -EINVAL;
+ }
+
+ if (rdtgroup_locksetup_user_restrict(rdtgrp)) {
+ rdt_last_cmd_puts("unable to modify resctrl permissions\n");
+ return -EIO;
+ }
+
+ ret = pseudo_lock_init(rdtgrp);
+ if (ret) {
+ rdt_last_cmd_puts("unable to init pseudo-lock region\n");
+ goto out_release;
+ }
+
+ /*
+ * If this system is capable of monitoring a rmid would have been
+ * allocated when the control group was created. This is not needed
+ * anymore when this group would be used for pseudo-locking. This
+ * is safe to call on platforms not capable of monitoring.
+ */
+ free_rmid(rdtgrp->mon.rmid);
+
+ ret = 0;
+ goto out;
+
+out_release:
+ rdtgroup_locksetup_user_restore(rdtgrp);
+out:
+ return ret;
+}
+
+/**
+ * rdtgroup_locksetup_exit - resource group exist locksetup mode
+ * @rdtgrp: resource group
+ *
+ * When a resource group exits locksetup mode the earlier restrictions are
+ * lifted.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
+{
+ int ret;
+
+ if (rdt_mon_capable) {
+ ret = alloc_rmid();
+ if (ret < 0) {
+ rdt_last_cmd_puts("out of RMIDs\n");
+ return ret;
+ }
+ rdtgrp->mon.rmid = ret;
+ }
+
+ ret = rdtgroup_locksetup_user_restore(rdtgrp);
+ if (ret) {
+ free_rmid(rdtgrp->mon.rmid);
+ return ret;
+ }
+
+ pseudo_lock_free(rdtgrp);
+ return 0;
+}
+
+/**
+ * rdtgroup_cbm_overlaps_pseudo_locked - Test if CBM or portion is pseudo-locked
+ * @d: RDT domain
+ * @_cbm: CBM to test
+ *
+ * @d represents a cache instance and @_cbm a capacity bitmask that is
+ * considered for it. Determine if @_cbm overlaps with any existing
+ * pseudo-locked region on @d.
+ *
+ * Return: true if @_cbm overlaps with pseudo-locked region on @d, false
+ * otherwise.
+ */
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm)
+{
+ unsigned long *cbm = (unsigned long *)&_cbm;
+ unsigned long *cbm_b;
+ unsigned int cbm_len;
+
+ if (d->plr) {
+ cbm_len = d->plr->r->cache.cbm_len;
+ cbm_b = (unsigned long *)&d->plr->cbm;
+ if (bitmap_intersects(cbm, cbm_b, cbm_len))
+ return true;
+ }
+ return false;
+}
+
+/**
+ * rdtgroup_pseudo_locked_in_hierarchy - Pseudo-locked region in cache hierarchy
+ * @d: RDT domain under test
+ *
+ * The setup of a pseudo-locked region affects all cache instances within
+ * the hierarchy of the region. It is thus essential to know if any
+ * pseudo-locked regions exist within a cache hierarchy to prevent any
+ * attempts to create new pseudo-locked regions in the same hierarchy.
+ *
+ * Return: true if a pseudo-locked region exists in the hierarchy of @d or
+ * if it is not possible to test due to memory allocation issue,
+ * false otherwise.
+ */
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
+{
+ cpumask_var_t cpu_with_psl;
+ struct rdt_resource *r;
+ struct rdt_domain *d_i;
+ bool ret = false;
+
+ if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
+ return true;
+
+ /*
+ * First determine which cpus have pseudo-locked regions
+ * associated with them.
+ */
+ for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(d_i, &r->domains, list) {
+ if (d_i->plr)
+ cpumask_or(cpu_with_psl, cpu_with_psl,
+ &d_i->cpu_mask);
+ }
+ }
+
+ /*
+ * Next test if new pseudo-locked region would intersect with
+ * existing region.
+ */
+ if (cpumask_intersects(&d->cpu_mask, cpu_with_psl))
+ ret = true;
+
+ free_cpumask_var(cpu_with_psl);
+ return ret;
+}
+
+/**
+ * measure_cycles_lat_fn - Measure cycle latency to read pseudo-locked memory
+ * @_plr: pseudo-lock region to measure
+ *
+ * There is no deterministic way to test if a memory region is cached. One
+ * way is to measure how long it takes to read the memory, the speed of
+ * access is a good way to learn how close to the cpu the data was. Even
+ * more, if the prefetcher is disabled and the memory is read at a stride
+ * of half the cache line, then a cache miss will be easy to spot since the
+ * read of the first half would be significantly slower than the read of
+ * the second half.
+ *
+ * Return: 0. Waiter on waitqueue will be woken on completion.
+ */
+static int measure_cycles_lat_fn(void *_plr)
+{
+ struct pseudo_lock_region *plr = _plr;
+ unsigned long i;
+ u64 start, end;
+#ifdef CONFIG_KASAN
+ /*
+ * The registers used for local register variables are also used
+ * when KASAN is active. When KASAN is active we use a regular
+ * variable to ensure we always use a valid pointer to access memory.
+ * The cost is that accessing this pointer, which could be in
+ * cache, will be included in the measurement of memory read latency.
+ */
+ void *mem_r;
+#else
+#ifdef CONFIG_X86_64
+ register void *mem_r asm("rbx");
+#else
+ register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+ local_irq_disable();
+ /*
+ * The wrmsr call may be reordered with the assignment below it.
+ * Call wrmsr as directly as possible to avoid tracing clobbering
+ * local register variable used for memory pointer.
+ */
+ __wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+ mem_r = plr->kmem;
+ /*
+ * Dummy execute of the time measurement to load the needed
+ * instructions into the L1 instruction cache.
+ */
+ start = rdtsc_ordered();
+ for (i = 0; i < plr->size; i += 32) {
+ start = rdtsc_ordered();
+ asm volatile("mov (%0,%1,1), %%eax\n\t"
+ :
+ : "r" (mem_r), "r" (i)
+ : "%eax", "memory");
+ end = rdtsc_ordered();
+ trace_pseudo_lock_mem_latency((u32)(end - start));
+ }
+ wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+ local_irq_enable();
+ plr->thread_done = 1;
+ wake_up_interruptible(&plr->lock_thread_wq);
+ return 0;
+}
+
+static int measure_cycles_perf_fn(void *_plr)
+{
+ unsigned long long l3_hits = 0, l3_miss = 0;
+ u64 l3_hit_bits = 0, l3_miss_bits = 0;
+ struct pseudo_lock_region *plr = _plr;
+ unsigned long long l2_hits, l2_miss;
+ u64 l2_hit_bits, l2_miss_bits;
+ unsigned long i;
+#ifdef CONFIG_KASAN
+ /*
+ * The registers used for local register variables are also used
+ * when KASAN is active. When KASAN is active we use regular variables
+ * at the cost of including cache access latency to these variables
+ * in the measurements.
+ */
+ unsigned int line_size;
+ unsigned int size;
+ void *mem_r;
+#else
+ register unsigned int line_size asm("esi");
+ register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+ register void *mem_r asm("rbx");
+#else
+ register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+ /*
+ * Non-architectural event for the Goldmont Microarchitecture
+ * from Intel x86 Architecture Software Developer Manual (SDM):
+ * MEM_LOAD_UOPS_RETIRED D1H (event number)
+ * Umask values:
+ * L1_HIT 01H
+ * L2_HIT 02H
+ * L1_MISS 08H
+ * L2_MISS 10H
+ *
+ * On Broadwell Microarchitecture the MEM_LOAD_UOPS_RETIRED event
+ * has two "no fix" errata associated with it: BDM35 and BDM100. On
+ * this platform we use the following events instead:
+ * L2_RQSTS 24H (Documented in https://download.01.org/perfmon/BDW/)
+ * REFERENCES FFH
+ * MISS 3FH
+ * LONGEST_LAT_CACHE 2EH (Documented in SDM)
+ * REFERENCE 4FH
+ * MISS 41H
+ */
+
+ /*
+ * Start by setting flags for IA32_PERFEVTSELx:
+ * OS (Operating system mode) 0x2
+ * INT (APIC interrupt enable) 0x10
+ * EN (Enable counter) 0x40
+ *
+ * Then add the Umask value and event number to select performance
+ * event.
+ */
+
+ switch (boot_cpu_data.x86_model) {
+ case INTEL_FAM6_ATOM_GOLDMONT:
+ case INTEL_FAM6_ATOM_GEMINI_LAKE:
+ l2_hit_bits = (0x52ULL << 16) | (0x2 << 8) | 0xd1;
+ l2_miss_bits = (0x52ULL << 16) | (0x10 << 8) | 0xd1;
+ break;
+ case INTEL_FAM6_BROADWELL_X:
+ /* On BDW the l2_hit_bits count references, not hits */
+ l2_hit_bits = (0x52ULL << 16) | (0xff << 8) | 0x24;
+ l2_miss_bits = (0x52ULL << 16) | (0x3f << 8) | 0x24;
+ /* On BDW the l3_hit_bits count references, not hits */
+ l3_hit_bits = (0x52ULL << 16) | (0x4f << 8) | 0x2e;
+ l3_miss_bits = (0x52ULL << 16) | (0x41 << 8) | 0x2e;
+ break;
+ default:
+ goto out;
+ }
+
+ local_irq_disable();
+ /*
+ * Call wrmsr direcly to avoid the local register variables from
+ * being overwritten due to reordering of their assignment with
+ * the wrmsr calls.
+ */
+ __wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+ /* Disable events and reset counters */
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, 0x0);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x0);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0, 0x0);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
+ if (l3_hit_bits > 0) {
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x0);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3, 0x0);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 2, 0x0);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 3, 0x0);
+ }
+ /* Set and enable the L2 counters */
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, l2_hit_bits);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, l2_miss_bits);
+ if (l3_hit_bits > 0) {
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2,
+ l3_hit_bits);
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ l3_miss_bits);
+ }
+ mem_r = plr->kmem;
+ size = plr->size;
+ line_size = plr->line_size;
+ for (i = 0; i < size; i += line_size) {
+ asm volatile("mov (%0,%1,1), %%eax\n\t"
+ :
+ : "r" (mem_r), "r" (i)
+ : "%eax", "memory");
+ }
+ /*
+ * Call wrmsr directly (no tracing) to not influence
+ * the cache access counters as they are disabled.
+ */
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0,
+ l2_hit_bits & ~(0x40ULL << 16));
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1,
+ l2_miss_bits & ~(0x40ULL << 16));
+ if (l3_hit_bits > 0) {
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2,
+ l3_hit_bits & ~(0x40ULL << 16));
+ pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ l3_miss_bits & ~(0x40ULL << 16));
+ }
+ l2_hits = native_read_pmc(0);
+ l2_miss = native_read_pmc(1);
+ if (l3_hit_bits > 0) {
+ l3_hits = native_read_pmc(2);
+ l3_miss = native_read_pmc(3);
+ }
+ wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+ local_irq_enable();
+ /*
+ * On BDW we count references and misses, need to adjust. Sometimes
+ * the "hits" counter is a bit more than the references, for
+ * example, x references but x + 1 hits. To not report invalid
+ * hit values in this case we treat that as misses eaqual to
+ * references.
+ */
+ if (boot_cpu_data.x86_model == INTEL_FAM6_BROADWELL_X)
+ l2_hits -= (l2_miss > l2_hits ? l2_hits : l2_miss);
+ trace_pseudo_lock_l2(l2_hits, l2_miss);
+ if (l3_hit_bits > 0) {
+ if (boot_cpu_data.x86_model == INTEL_FAM6_BROADWELL_X)
+ l3_hits -= (l3_miss > l3_hits ? l3_hits : l3_miss);
+ trace_pseudo_lock_l3(l3_hits, l3_miss);
+ }
+
+out:
+ plr->thread_done = 1;
+ wake_up_interruptible(&plr->lock_thread_wq);
+ return 0;
+}
+
+/**
+ * pseudo_lock_measure_cycles - Trigger latency measure to pseudo-locked region
+ *
+ * The measurement of latency to access a pseudo-locked region should be
+ * done from a cpu that is associated with that pseudo-locked region.
+ * Determine which cpu is associated with this region and start a thread on
+ * that cpu to perform the measurement, wait for that thread to complete.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp, int sel)
+{
+ struct pseudo_lock_region *plr = rdtgrp->plr;
+ struct task_struct *thread;
+ unsigned int cpu;
+ int ret = -1;
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ if (rdtgrp->flags & RDT_DELETED) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ plr->thread_done = 0;
+ cpu = cpumask_first(&plr->d->cpu_mask);
+ if (!cpu_online(cpu)) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ if (sel == 1)
+ thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
+ cpu_to_node(cpu),
+ "pseudo_lock_measure/%u",
+ cpu);
+ else if (sel == 2)
+ thread = kthread_create_on_node(measure_cycles_perf_fn, plr,
+ cpu_to_node(cpu),
+ "pseudo_lock_measure/%u",
+ cpu);
+ else
+ goto out;
+
+ if (IS_ERR(thread)) {
+ ret = PTR_ERR(thread);
+ goto out;
+ }
+ kthread_bind(thread, cpu);
+ wake_up_process(thread);
+
+ ret = wait_event_interruptible(plr->lock_thread_wq,
+ plr->thread_done == 1);
+ if (ret < 0)
+ goto out;
+
+ ret = 0;
+
+out:
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return ret;
+}
+
+static ssize_t pseudo_lock_measure_trigger(struct file *file,
+ const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct rdtgroup *rdtgrp = file->private_data;
+ size_t buf_size;
+ char buf[32];
+ int ret;
+ int sel;
+
+ buf_size = min(count, (sizeof(buf) - 1));
+ if (copy_from_user(buf, user_buf, buf_size))
+ return -EFAULT;
+
+ buf[buf_size] = '\0';
+ ret = kstrtoint(buf, 10, &sel);
+ if (ret == 0) {
+ if (sel != 1)
+ return -EINVAL;
+ ret = debugfs_file_get(file->f_path.dentry);
+ if (ret)
+ return ret;
+ ret = pseudo_lock_measure_cycles(rdtgrp, sel);
+ if (ret == 0)
+ ret = count;
+ debugfs_file_put(file->f_path.dentry);
+ }
+
+ return ret;
+}
+
+static const struct file_operations pseudo_measure_fops = {
+ .write = pseudo_lock_measure_trigger,
+ .open = simple_open,
+ .llseek = default_llseek,
+};
+
+/**
+ * rdtgroup_pseudo_lock_create - Create a pseudo-locked region
+ * @rdtgrp: resource group to which pseudo-lock region belongs
+ *
+ * Called when a resource group in the pseudo-locksetup mode receives a
+ * valid schemata that should be pseudo-locked. Since the resource group is
+ * in pseudo-locksetup mode the &struct pseudo_lock_region has already been
+ * allocated and initialized with the essential information. If a failure
+ * occurs the resource group remains in the pseudo-locksetup mode with the
+ * &struct pseudo_lock_region associated with it, but cleared from all
+ * information and ready for the user to re-attempt pseudo-locking by
+ * writing the schemata again.
+ *
+ * Return: 0 if the pseudo-locked region was successfully pseudo-locked, <0
+ * on failure. Descriptive error will be written to last_cmd_status buffer.
+ */
+int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
+{
+ struct pseudo_lock_region *plr = rdtgrp->plr;
+ struct task_struct *thread;
+ unsigned int new_minor;
+ struct device *dev;
+ int ret;
+
+ ret = pseudo_lock_region_alloc(plr);
+ if (ret < 0)
+ return ret;
+
+ ret = pseudo_lock_cstates_constrain(plr);
+ if (ret < 0) {
+ ret = -EINVAL;
+ goto out_region;
+ }
+
+ plr->thread_done = 0;
+
+ thread = kthread_create_on_node(pseudo_lock_fn, rdtgrp,
+ cpu_to_node(plr->cpu),
+ "pseudo_lock/%u", plr->cpu);
+ if (IS_ERR(thread)) {
+ ret = PTR_ERR(thread);
+ rdt_last_cmd_printf("locking thread returned error %d\n", ret);
+ goto out_cstates;
+ }
+
+ kthread_bind(thread, plr->cpu);
+ wake_up_process(thread);
+
+ ret = wait_event_interruptible(plr->lock_thread_wq,
+ plr->thread_done == 1);
+ if (ret < 0) {
+ /*
+ * If the thread does not get on the CPU for whatever
+ * reason and the process which sets up the region is
+ * interrupted then this will leave the thread in runnable
+ * state and once it gets on the CPU it will derefence
+ * the cleared, but not freed, plr struct resulting in an
+ * empty pseudo-locking loop.
+ */
+ rdt_last_cmd_puts("locking thread interrupted\n");
+ goto out_cstates;
+ }
+
+ ret = pseudo_lock_minor_get(&new_minor);
+ if (ret < 0) {
+ rdt_last_cmd_puts("unable to obtain a new minor number\n");
+ goto out_cstates;
+ }
+
+ /*
+ * Unlock access but do not release the reference. The
+ * pseudo-locked region will still be here on return.
+ *
+ * The mutex has to be released temporarily to avoid a potential
+ * deadlock with the mm->mmap_sem semaphore which is obtained in
+ * the device_create() and debugfs_create_dir() callpath below
+ * as well as before the mmap() callback is called.
+ */
+ mutex_unlock(&rdtgroup_mutex);
+
+ if (!IS_ERR_OR_NULL(debugfs_resctrl)) {
+ plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
+ debugfs_resctrl);
+ if (!IS_ERR_OR_NULL(plr->debugfs_dir))
+ debugfs_create_file("pseudo_lock_measure", 0200,
+ plr->debugfs_dir, rdtgrp,
+ &pseudo_measure_fops);
+ }
+
+ dev = device_create(pseudo_lock_class, NULL,
+ MKDEV(pseudo_lock_major, new_minor),
+ rdtgrp, "%s", rdtgrp->kn->name);
+
+ mutex_lock(&rdtgroup_mutex);
+
+ if (IS_ERR(dev)) {
+ ret = PTR_ERR(dev);
+ rdt_last_cmd_printf("failed to create character device: %d\n",
+ ret);
+ goto out_debugfs;
+ }
+
+ /* We released the mutex - check if group was removed while we did so */
+ if (rdtgrp->flags & RDT_DELETED) {
+ ret = -ENODEV;
+ goto out_device;
+ }
+
+ plr->minor = new_minor;
+
+ rdtgrp->mode = RDT_MODE_PSEUDO_LOCKED;
+ closid_free(rdtgrp->closid);
+ rdtgroup_kn_mode_restore(rdtgrp, "cpus", 0444);
+ rdtgroup_kn_mode_restore(rdtgrp, "cpus_list", 0444);
+
+ ret = 0;
+ goto out;
+
+out_device:
+ device_destroy(pseudo_lock_class, MKDEV(pseudo_lock_major, new_minor));
+out_debugfs:
+ debugfs_remove_recursive(plr->debugfs_dir);
+ pseudo_lock_minor_release(new_minor);
+out_cstates:
+ pseudo_lock_cstates_relax(plr);
+out_region:
+ pseudo_lock_region_clear(plr);
+out:
+ return ret;
+}
+
+/**
+ * rdtgroup_pseudo_lock_remove - Remove a pseudo-locked region
+ * @rdtgrp: resource group to which the pseudo-locked region belongs
+ *
+ * The removal of a pseudo-locked region can be initiated when the resource
+ * group is removed from user space via a "rmdir" from userspace or the
+ * unmount of the resctrl filesystem. On removal the resource group does
+ * not go back to pseudo-locksetup mode before it is removed, instead it is
+ * removed directly. There is thus assymmetry with the creation where the
+ * &struct pseudo_lock_region is removed here while it was not created in
+ * rdtgroup_pseudo_lock_create().
+ *
+ * Return: void
+ */
+void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp)
+{
+ struct pseudo_lock_region *plr = rdtgrp->plr;
+
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ /*
+ * Default group cannot be a pseudo-locked region so we can
+ * free closid here.
+ */
+ closid_free(rdtgrp->closid);
+ goto free;
+ }
+
+ pseudo_lock_cstates_relax(plr);
+ debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
+ device_destroy(pseudo_lock_class, MKDEV(pseudo_lock_major, plr->minor));
+ pseudo_lock_minor_release(plr->minor);
+
+free:
+ pseudo_lock_free(rdtgrp);
+}
+
+static int pseudo_lock_dev_open(struct inode *inode, struct file *filp)
+{
+ struct rdtgroup *rdtgrp;
+
+ mutex_lock(&rdtgroup_mutex);
+
+ rdtgrp = region_find_by_minor(iminor(inode));
+ if (!rdtgrp) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -ENODEV;
+ }
+
+ filp->private_data = rdtgrp;
+ atomic_inc(&rdtgrp->waitcount);
+ /* Perform a non-seekable open - llseek is not supported */
+ filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
+static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
+{
+ struct rdtgroup *rdtgrp;
+
+ mutex_lock(&rdtgroup_mutex);
+ rdtgrp = filp->private_data;
+ WARN_ON(!rdtgrp);
+ if (!rdtgrp) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -ENODEV;
+ }
+ filp->private_data = NULL;
+ atomic_dec(&rdtgrp->waitcount);
+ mutex_unlock(&rdtgroup_mutex);
+ return 0;
+}
+
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
+{
+ /* Not supported */
+ return -EINVAL;
+}
+
+static const struct vm_operations_struct pseudo_mmap_ops = {
+ .mremap = pseudo_lock_dev_mremap,
+};
+
+static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ unsigned long vsize = vma->vm_end - vma->vm_start;
+ unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+ struct pseudo_lock_region *plr;
+ struct rdtgroup *rdtgrp;
+ unsigned long physical;
+ unsigned long psize;
+
+ mutex_lock(&rdtgroup_mutex);
+
+ rdtgrp = filp->private_data;
+ WARN_ON(!rdtgrp);
+ if (!rdtgrp) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -ENODEV;
+ }
+
+ plr = rdtgrp->plr;
+
+ /*
+ * Task is required to run with affinity to the cpus associated
+ * with the pseudo-locked region. If this is not the case the task
+ * may be scheduled elsewhere and invalidate entries in the
+ * pseudo-locked region.
+ */
+ if (!cpumask_subset(¤t->cpus_allowed, &plr->d->cpu_mask)) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -EINVAL;
+ }
+
+ physical = __pa(plr->kmem) >> PAGE_SHIFT;
+ psize = plr->size - off;
+
+ if (off > plr->size) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -ENOSPC;
+ }
+
+ /*
+ * Ensure changes are carried directly to the memory being mapped,
+ * do not allow copy-on-write mapping.
+ */
+ if (!(vma->vm_flags & VM_SHARED)) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -EINVAL;
+ }
+
+ if (vsize > psize) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -ENOSPC;
+ }
+
+ memset(plr->kmem + off, 0, vsize);
+
+ if (remap_pfn_range(vma, vma->vm_start, physical + vma->vm_pgoff,
+ vsize, vma->vm_page_prot)) {
+ mutex_unlock(&rdtgroup_mutex);
+ return -EAGAIN;
+ }
+ vma->vm_ops = &pseudo_mmap_ops;
+ mutex_unlock(&rdtgroup_mutex);
+ return 0;
+}
+
+static const struct file_operations pseudo_lock_dev_fops = {
+ .owner = THIS_MODULE,
+ .llseek = no_llseek,
+ .read = NULL,
+ .write = NULL,
+ .open = pseudo_lock_dev_open,
+ .release = pseudo_lock_dev_release,
+ .mmap = pseudo_lock_dev_mmap,
+};
+
+static char *pseudo_lock_devnode(struct device *dev, umode_t *mode)
+{
+ struct rdtgroup *rdtgrp;
+
+ rdtgrp = dev_get_drvdata(dev);
+ if (mode)
+ *mode = 0600;
+ return kasprintf(GFP_KERNEL, "pseudo_lock/%s", rdtgrp->kn->name);
+}
+
+int rdt_pseudo_lock_init(void)
+{
+ int ret;
+
+ ret = register_chrdev(0, "pseudo_lock", &pseudo_lock_dev_fops);
+ if (ret < 0)
+ return ret;
+
+ pseudo_lock_major = ret;
+
+ pseudo_lock_class = class_create(THIS_MODULE, "pseudo_lock");
+ if (IS_ERR(pseudo_lock_class)) {
+ ret = PTR_ERR(pseudo_lock_class);
+ unregister_chrdev(pseudo_lock_major, "pseudo_lock");
+ return ret;
+ }
+
+ pseudo_lock_class->devnode = pseudo_lock_devnode;
+ return 0;
+}
+
+void rdt_pseudo_lock_release(void)
+{
+ class_destroy(pseudo_lock_class);
+ pseudo_lock_class = NULL;
+ unregister_chrdev(pseudo_lock_major, "pseudo_lock");
+ pseudo_lock_major = 0;
+}
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
new file mode 100644
index 0000000..2c041e6
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM resctrl
+
+#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PSEUDO_LOCK_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(pseudo_lock_mem_latency,
+ TP_PROTO(u32 latency),
+ TP_ARGS(latency),
+ TP_STRUCT__entry(__field(u32, latency)),
+ TP_fast_assign(__entry->latency = latency),
+ TP_printk("latency=%u", __entry->latency)
+ );
+
+TRACE_EVENT(pseudo_lock_l2,
+ TP_PROTO(u64 l2_hits, u64 l2_miss),
+ TP_ARGS(l2_hits, l2_miss),
+ TP_STRUCT__entry(__field(u64, l2_hits)
+ __field(u64, l2_miss)),
+ TP_fast_assign(__entry->l2_hits = l2_hits;
+ __entry->l2_miss = l2_miss;),
+ TP_printk("hits=%llu miss=%llu",
+ __entry->l2_hits, __entry->l2_miss));
+
+TRACE_EVENT(pseudo_lock_l3,
+ TP_PROTO(u64 l3_hits, u64 l3_miss),
+ TP_ARGS(l3_hits, l3_miss),
+ TP_STRUCT__entry(__field(u64, l3_hits)
+ __field(u64, l3_miss)),
+ TP_fast_assign(__entry->l3_hits = l3_hits;
+ __entry->l3_miss = l3_miss;),
+ TP_printk("hits=%llu miss=%llu",
+ __entry->l3_hits, __entry->l3_miss));
+
+#endif /* _TRACE_PSEUDO_LOCK_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE intel_rdt_pseudo_lock_event
+#include <trace/define_trace.h>
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 749856a..d6d7ea7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -20,7 +20,9 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/cacheinfo.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/sysfs.h>
#include <linux/kernfs.h>
@@ -55,6 +57,8 @@
static struct seq_buf last_cmd_status;
static char last_cmd_status_buf[512];
+struct dentry *debugfs_resctrl;
+
void rdt_last_cmd_clear(void)
{
lockdep_assert_held(&rdtgroup_mutex);
@@ -121,11 +125,65 @@
return closid;
}
-static void closid_free(int closid)
+void closid_free(int closid)
{
closid_free_map |= 1 << closid;
}
+/**
+ * closid_allocated - test if provided closid is in use
+ * @closid: closid to be tested
+ *
+ * Return: true if @closid is currently associated with a resource group,
+ * false if @closid is free
+ */
+static bool closid_allocated(unsigned int closid)
+{
+ return (closid_free_map & (1 << closid)) == 0;
+}
+
+/**
+ * rdtgroup_mode_by_closid - Return mode of resource group with closid
+ * @closid: closid if the resource group
+ *
+ * Each resource group is associated with a @closid. Here the mode
+ * of a resource group can be queried by searching for it using its closid.
+ *
+ * Return: mode as &enum rdtgrp_mode of resource group with closid @closid
+ */
+enum rdtgrp_mode rdtgroup_mode_by_closid(int closid)
+{
+ struct rdtgroup *rdtgrp;
+
+ list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+ if (rdtgrp->closid == closid)
+ return rdtgrp->mode;
+ }
+
+ return RDT_NUM_MODES;
+}
+
+static const char * const rdt_mode_str[] = {
+ [RDT_MODE_SHAREABLE] = "shareable",
+ [RDT_MODE_EXCLUSIVE] = "exclusive",
+ [RDT_MODE_PSEUDO_LOCKSETUP] = "pseudo-locksetup",
+ [RDT_MODE_PSEUDO_LOCKED] = "pseudo-locked",
+};
+
+/**
+ * rdtgroup_mode_str - Return the string representation of mode
+ * @mode: the resource group mode as &enum rdtgroup_mode
+ *
+ * Return: string representation of valid mode, "unknown" otherwise
+ */
+static const char *rdtgroup_mode_str(enum rdtgrp_mode mode)
+{
+ if (mode < RDT_MODE_SHAREABLE || mode >= RDT_NUM_MODES)
+ return "unknown";
+
+ return rdt_mode_str[mode];
+}
+
/* set uid and gid of rdtgroup dirs and files to that of the creator */
static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
{
@@ -207,8 +265,12 @@
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (rdtgrp) {
- seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
- cpumask_pr_args(&rdtgrp->cpu_mask));
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+ seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
+ cpumask_pr_args(&rdtgrp->plr->d->cpu_mask));
+ else
+ seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
+ cpumask_pr_args(&rdtgrp->cpu_mask));
} else {
ret = -ENOENT;
}
@@ -394,6 +456,13 @@
goto unlock;
}
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("pseudo-locking in progress\n");
+ goto unlock;
+ }
+
if (is_cpu_list(of))
ret = cpulist_parse(buf, newmask);
else
@@ -509,6 +578,32 @@
return ret;
}
+/**
+ * rdtgroup_tasks_assigned - Test if tasks have been assigned to resource group
+ * @r: Resource group
+ *
+ * Return: 1 if tasks have been assigned to @r, 0 otherwise
+ */
+int rdtgroup_tasks_assigned(struct rdtgroup *r)
+{
+ struct task_struct *p, *t;
+ int ret = 0;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ rcu_read_lock();
+ for_each_process_thread(p, t) {
+ if ((r->type == RDTCTRL_GROUP && t->closid == r->closid) ||
+ (r->type == RDTMON_GROUP && t->rmid == r->mon.rmid)) {
+ ret = 1;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
static int rdtgroup_task_write_permission(struct task_struct *task,
struct kernfs_open_file *of)
{
@@ -570,13 +665,22 @@
if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
return -EINVAL;
rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
rdt_last_cmd_clear();
- if (rdtgrp)
- ret = rdtgroup_move_task(pid, rdtgrp, of);
- else
- ret = -ENOENT;
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("pseudo-locking in progress\n");
+ goto unlock;
+ }
+ ret = rdtgroup_move_task(pid, rdtgrp, of);
+
+unlock:
rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
@@ -662,6 +766,94 @@
return 0;
}
+/**
+ * rdt_bit_usage_show - Display current usage of resources
+ *
+ * A domain is a shared resource that can now be allocated differently. Here
+ * we display the current regions of the domain as an annotated bitmask.
+ * For each domain of this resource its allocation bitmask
+ * is annotated as below to indicate the current usage of the corresponding bit:
+ * 0 - currently unused
+ * X - currently available for sharing and used by software and hardware
+ * H - currently used by hardware only but available for software use
+ * S - currently used and shareable by software only
+ * E - currently used exclusively by one resource group
+ * P - currently pseudo-locked by one resource group
+ */
+static int rdt_bit_usage_show(struct kernfs_open_file *of,
+ struct seq_file *seq, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ u32 sw_shareable = 0, hw_shareable = 0;
+ u32 exclusive = 0, pseudo_locked = 0;
+ struct rdt_domain *dom;
+ int i, hwb, swb, excl, psl;
+ enum rdtgrp_mode mode;
+ bool sep = false;
+ u32 *ctrl;
+
+ mutex_lock(&rdtgroup_mutex);
+ hw_shareable = r->cache.shareable_bits;
+ list_for_each_entry(dom, &r->domains, list) {
+ if (sep)
+ seq_putc(seq, ';');
+ ctrl = dom->ctrl_val;
+ sw_shareable = 0;
+ exclusive = 0;
+ seq_printf(seq, "%d=", dom->id);
+ for (i = 0; i < r->num_closid; i++, ctrl++) {
+ if (!closid_allocated(i))
+ continue;
+ mode = rdtgroup_mode_by_closid(i);
+ switch (mode) {
+ case RDT_MODE_SHAREABLE:
+ sw_shareable |= *ctrl;
+ break;
+ case RDT_MODE_EXCLUSIVE:
+ exclusive |= *ctrl;
+ break;
+ case RDT_MODE_PSEUDO_LOCKSETUP:
+ /*
+ * RDT_MODE_PSEUDO_LOCKSETUP is possible
+ * here but not included since the CBM
+ * associated with this CLOSID in this mode
+ * is not initialized and no task or cpu can be
+ * assigned this CLOSID.
+ */
+ break;
+ case RDT_MODE_PSEUDO_LOCKED:
+ case RDT_NUM_MODES:
+ WARN(1,
+ "invalid mode for closid %d\n", i);
+ break;
+ }
+ }
+ for (i = r->cache.cbm_len - 1; i >= 0; i--) {
+ pseudo_locked = dom->plr ? dom->plr->cbm : 0;
+ hwb = test_bit(i, (unsigned long *)&hw_shareable);
+ swb = test_bit(i, (unsigned long *)&sw_shareable);
+ excl = test_bit(i, (unsigned long *)&exclusive);
+ psl = test_bit(i, (unsigned long *)&pseudo_locked);
+ if (hwb && swb)
+ seq_putc(seq, 'X');
+ else if (hwb && !swb)
+ seq_putc(seq, 'H');
+ else if (!hwb && swb)
+ seq_putc(seq, 'S');
+ else if (excl)
+ seq_putc(seq, 'E');
+ else if (psl)
+ seq_putc(seq, 'P');
+ else /* Unused bits remain */
+ seq_putc(seq, '0');
+ }
+ sep = true;
+ }
+ seq_putc(seq, '\n');
+ mutex_unlock(&rdtgroup_mutex);
+ return 0;
+}
+
static int rdt_min_bw_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
{
@@ -740,6 +932,269 @@
return nbytes;
}
+/*
+ * rdtgroup_mode_show - Display mode of this resource group
+ */
+static int rdtgroup_mode_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdtgroup *rdtgrp;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
+
+ seq_printf(s, "%s\n", rdtgroup_mode_str(rdtgrp->mode));
+
+ rdtgroup_kn_unlock(of->kn);
+ return 0;
+}
+
+/**
+ * rdtgroup_cbm_overlaps - Does CBM for intended closid overlap with other
+ * @r: Resource to which domain instance @d belongs.
+ * @d: The domain instance for which @closid is being tested.
+ * @cbm: Capacity bitmask being tested.
+ * @closid: Intended closid for @cbm.
+ * @exclusive: Only check if overlaps with exclusive resource groups
+ *
+ * Checks if provided @cbm intended to be used for @closid on domain
+ * @d overlaps with any other closids or other hardware usage associated
+ * with this domain. If @exclusive is true then only overlaps with
+ * resource groups in exclusive mode will be considered. If @exclusive
+ * is false then overlaps with any resource group or hardware entities
+ * will be considered.
+ *
+ * Return: false if CBM does not overlap, true if it does.
+ */
+bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+ u32 _cbm, int closid, bool exclusive)
+{
+ unsigned long *cbm = (unsigned long *)&_cbm;
+ unsigned long *ctrl_b;
+ enum rdtgrp_mode mode;
+ u32 *ctrl;
+ int i;
+
+ /* Check for any overlap with regions used by hardware directly */
+ if (!exclusive) {
+ if (bitmap_intersects(cbm,
+ (unsigned long *)&r->cache.shareable_bits,
+ r->cache.cbm_len))
+ return true;
+ }
+
+ /* Check for overlap with other resource groups */
+ ctrl = d->ctrl_val;
+ for (i = 0; i < r->num_closid; i++, ctrl++) {
+ ctrl_b = (unsigned long *)ctrl;
+ mode = rdtgroup_mode_by_closid(i);
+ if (closid_allocated(i) && i != closid &&
+ mode != RDT_MODE_PSEUDO_LOCKSETUP) {
+ if (bitmap_intersects(cbm, ctrl_b, r->cache.cbm_len)) {
+ if (exclusive) {
+ if (mode == RDT_MODE_EXCLUSIVE)
+ return true;
+ continue;
+ }
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
+/**
+ * rdtgroup_mode_test_exclusive - Test if this resource group can be exclusive
+ *
+ * An exclusive resource group implies that there should be no sharing of
+ * its allocated resources. At the time this group is considered to be
+ * exclusive this test can determine if its current schemata supports this
+ * setting by testing for overlap with all other resource groups.
+ *
+ * Return: true if resource group can be exclusive, false if there is overlap
+ * with allocations of other resource groups and thus this resource group
+ * cannot be exclusive.
+ */
+static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
+{
+ int closid = rdtgrp->closid;
+ struct rdt_resource *r;
+ struct rdt_domain *d;
+
+ for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(d, &r->domains, list) {
+ if (rdtgroup_cbm_overlaps(r, d, d->ctrl_val[closid],
+ rdtgrp->closid, false))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/**
+ * rdtgroup_mode_write - Modify the resource group's mode
+ *
+ */
+static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdtgroup *rdtgrp;
+ enum rdtgrp_mode mode;
+ int ret = 0;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+ buf[nbytes - 1] = '\0';
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
+
+ rdt_last_cmd_clear();
+
+ mode = rdtgrp->mode;
+
+ if ((!strcmp(buf, "shareable") && mode == RDT_MODE_SHAREABLE) ||
+ (!strcmp(buf, "exclusive") && mode == RDT_MODE_EXCLUSIVE) ||
+ (!strcmp(buf, "pseudo-locksetup") &&
+ mode == RDT_MODE_PSEUDO_LOCKSETUP) ||
+ (!strcmp(buf, "pseudo-locked") && mode == RDT_MODE_PSEUDO_LOCKED))
+ goto out;
+
+ if (mode == RDT_MODE_PSEUDO_LOCKED) {
+ rdt_last_cmd_printf("cannot change pseudo-locked group\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (!strcmp(buf, "shareable")) {
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ ret = rdtgroup_locksetup_exit(rdtgrp);
+ if (ret)
+ goto out;
+ }
+ rdtgrp->mode = RDT_MODE_SHAREABLE;
+ } else if (!strcmp(buf, "exclusive")) {
+ if (!rdtgroup_mode_test_exclusive(rdtgrp)) {
+ rdt_last_cmd_printf("schemata overlaps\n");
+ ret = -EINVAL;
+ goto out;
+ }
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ ret = rdtgroup_locksetup_exit(rdtgrp);
+ if (ret)
+ goto out;
+ }
+ rdtgrp->mode = RDT_MODE_EXCLUSIVE;
+ } else if (!strcmp(buf, "pseudo-locksetup")) {
+ ret = rdtgroup_locksetup_enter(rdtgrp);
+ if (ret)
+ goto out;
+ rdtgrp->mode = RDT_MODE_PSEUDO_LOCKSETUP;
+ } else {
+ rdt_last_cmd_printf("unknown/unsupported mode\n");
+ ret = -EINVAL;
+ }
+
+out:
+ rdtgroup_kn_unlock(of->kn);
+ return ret ?: nbytes;
+}
+
+/**
+ * rdtgroup_cbm_to_size - Translate CBM to size in bytes
+ * @r: RDT resource to which @d belongs.
+ * @d: RDT domain instance.
+ * @cbm: bitmask for which the size should be computed.
+ *
+ * The bitmask provided associated with the RDT domain instance @d will be
+ * translated into how many bytes it represents. The size in bytes is
+ * computed by first dividing the total cache size by the CBM length to
+ * determine how many bytes each bit in the bitmask represents. The result
+ * is multiplied with the number of bits set in the bitmask.
+ */
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
+ struct rdt_domain *d, u32 cbm)
+{
+ struct cpu_cacheinfo *ci;
+ unsigned int size = 0;
+ int num_b, i;
+
+ num_b = bitmap_weight((unsigned long *)&cbm, r->cache.cbm_len);
+ ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
+ for (i = 0; i < ci->num_leaves; i++) {
+ if (ci->info_list[i].level == r->cache_level) {
+ size = ci->info_list[i].size / r->cache.cbm_len * num_b;
+ break;
+ }
+ }
+
+ return size;
+}
+
+/**
+ * rdtgroup_size_show - Display size in bytes of allocated regions
+ *
+ * The "size" file mirrors the layout of the "schemata" file, printing the
+ * size in bytes of each region instead of the capacity bitmask.
+ *
+ */
+static int rdtgroup_size_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdtgroup *rdtgrp;
+ struct rdt_resource *r;
+ struct rdt_domain *d;
+ unsigned int size;
+ bool sep = false;
+ u32 cbm;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
+
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+ seq_printf(s, "%*s:", max_name_width, rdtgrp->plr->r->name);
+ size = rdtgroup_cbm_to_size(rdtgrp->plr->r,
+ rdtgrp->plr->d,
+ rdtgrp->plr->cbm);
+ seq_printf(s, "%d=%u\n", rdtgrp->plr->d->id, size);
+ goto out;
+ }
+
+ for_each_alloc_enabled_rdt_resource(r) {
+ seq_printf(s, "%*s:", max_name_width, r->name);
+ list_for_each_entry(d, &r->domains, list) {
+ if (sep)
+ seq_putc(s, ';');
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+ size = 0;
+ } else {
+ cbm = d->ctrl_val[rdtgrp->closid];
+ size = rdtgroup_cbm_to_size(r, d, cbm);
+ }
+ seq_printf(s, "%d=%u", d->id, size);
+ sep = true;
+ }
+ seq_putc(s, '\n');
+ }
+
+out:
+ rdtgroup_kn_unlock(of->kn);
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
@@ -792,6 +1247,13 @@
.fflags = RF_CTRL_INFO | RFTYPE_RES_CACHE,
},
{
+ .name = "bit_usage",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdt_bit_usage_show,
+ .fflags = RF_CTRL_INFO | RFTYPE_RES_CACHE,
+ },
+ {
.name = "min_bandwidth",
.mode = 0444,
.kf_ops = &rdtgroup_kf_single_ops,
@@ -853,6 +1315,22 @@
.seq_show = rdtgroup_schemata_show,
.fflags = RF_CTRL_BASE,
},
+ {
+ .name = "mode",
+ .mode = 0644,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .write = rdtgroup_mode_write,
+ .seq_show = rdtgroup_mode_show,
+ .fflags = RF_CTRL_BASE,
+ },
+ {
+ .name = "size",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_size_show,
+ .fflags = RF_CTRL_BASE,
+ },
+
};
static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
@@ -883,6 +1361,103 @@
return ret;
}
+/**
+ * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
+ * @r: The resource group with which the file is associated.
+ * @name: Name of the file
+ *
+ * The permissions of named resctrl file, directory, or link are modified
+ * to not allow read, write, or execute by any user.
+ *
+ * WARNING: This function is intended to communicate to the user that the
+ * resctrl file has been locked down - that it is not relevant to the
+ * particular state the system finds itself in. It should not be relied
+ * on to protect from user access because after the file's permissions
+ * are restricted the user can still change the permissions using chmod
+ * from the command line.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name)
+{
+ struct iattr iattr = {.ia_valid = ATTR_MODE,};
+ struct kernfs_node *kn;
+ int ret = 0;
+
+ kn = kernfs_find_and_get_ns(r->kn, name, NULL);
+ if (!kn)
+ return -ENOENT;
+
+ switch (kernfs_type(kn)) {
+ case KERNFS_DIR:
+ iattr.ia_mode = S_IFDIR;
+ break;
+ case KERNFS_FILE:
+ iattr.ia_mode = S_IFREG;
+ break;
+ case KERNFS_LINK:
+ iattr.ia_mode = S_IFLNK;
+ break;
+ }
+
+ ret = kernfs_setattr(kn, &iattr);
+ kernfs_put(kn);
+ return ret;
+}
+
+/**
+ * rdtgroup_kn_mode_restore - Restore user access to named resctrl file
+ * @r: The resource group with which the file is associated.
+ * @name: Name of the file
+ * @mask: Mask of permissions that should be restored
+ *
+ * Restore the permissions of the named file. If @name is a directory the
+ * permissions of its parent will be used.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
+ umode_t mask)
+{
+ struct iattr iattr = {.ia_valid = ATTR_MODE,};
+ struct kernfs_node *kn, *parent;
+ struct rftype *rfts, *rft;
+ int ret, len;
+
+ rfts = res_common_files;
+ len = ARRAY_SIZE(res_common_files);
+
+ for (rft = rfts; rft < rfts + len; rft++) {
+ if (!strcmp(rft->name, name))
+ iattr.ia_mode = rft->mode & mask;
+ }
+
+ kn = kernfs_find_and_get_ns(r->kn, name, NULL);
+ if (!kn)
+ return -ENOENT;
+
+ switch (kernfs_type(kn)) {
+ case KERNFS_DIR:
+ parent = kernfs_get_parent(kn);
+ if (parent) {
+ iattr.ia_mode |= parent->mode;
+ kernfs_put(parent);
+ }
+ iattr.ia_mode |= S_IFDIR;
+ break;
+ case KERNFS_FILE:
+ iattr.ia_mode |= S_IFREG;
+ break;
+ case KERNFS_LINK:
+ iattr.ia_mode |= S_IFLNK;
+ break;
+ }
+
+ ret = kernfs_setattr(kn, &iattr);
+ kernfs_put(kn);
+ return ret;
+}
+
static int rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name,
unsigned long fflags)
{
@@ -1224,6 +1799,9 @@
if (atomic_dec_and_test(&rdtgrp->waitcount) &&
(rdtgrp->flags & RDT_DELETED)) {
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+ rdtgroup_pseudo_lock_remove(rdtgrp);
kernfs_unbreak_active_protection(kn);
kernfs_put(rdtgrp->kn);
kfree(rdtgrp);
@@ -1289,10 +1867,16 @@
rdtgroup_default.mon.mon_data_kn = kn_mondata;
}
+ ret = rdt_pseudo_lock_init();
+ if (ret) {
+ dentry = ERR_PTR(ret);
+ goto out_mondata;
+ }
+
dentry = kernfs_mount(fs_type, flags, rdt_root,
RDTGROUP_SUPER_MAGIC, NULL);
if (IS_ERR(dentry))
- goto out_mondata;
+ goto out_psl;
if (rdt_alloc_capable)
static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
@@ -1310,6 +1894,8 @@
goto out;
+out_psl:
+ rdt_pseudo_lock_release();
out_mondata:
if (rdt_mon_capable)
kernfs_remove(kn_mondata);
@@ -1447,6 +2033,10 @@
if (rdtgrp == &rdtgroup_default)
continue;
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+ rdtgroup_pseudo_lock_remove(rdtgrp);
+
/*
* Give any CPUs back to the default group. We cannot copy
* cpu_online_mask because a CPU might have executed the
@@ -1483,6 +2073,8 @@
reset_all_ctrls(r);
cdp_disable_all();
rmdir_all_sub();
+ rdt_pseudo_lock_release();
+ rdtgroup_default.mode = RDT_MODE_SHAREABLE;
static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
static_branch_disable_cpuslocked(&rdt_mon_enable_key);
static_branch_disable_cpuslocked(&rdt_enable_key);
@@ -1682,6 +2274,114 @@
return ret;
}
+/**
+ * cbm_ensure_valid - Enforce validity on provided CBM
+ * @_val: Candidate CBM
+ * @r: RDT resource to which the CBM belongs
+ *
+ * The provided CBM represents all cache portions available for use. This
+ * may be represented by a bitmap that does not consist of contiguous ones
+ * and thus be an invalid CBM.
+ * Here the provided CBM is forced to be a valid CBM by only considering
+ * the first set of contiguous bits as valid and clearing all bits.
+ * The intention here is to provide a valid default CBM with which a new
+ * resource group is initialized. The user can follow this with a
+ * modification to the CBM if the default does not satisfy the
+ * requirements.
+ */
+static void cbm_ensure_valid(u32 *_val, struct rdt_resource *r)
+{
+ /*
+ * Convert the u32 _val to an unsigned long required by all the bit
+ * operations within this function. No more than 32 bits of this
+ * converted value can be accessed because all bit operations are
+ * additionally provided with cbm_len that is initialized during
+ * hardware enumeration using five bits from the EAX register and
+ * thus never can exceed 32 bits.
+ */
+ unsigned long *val = (unsigned long *)_val;
+ unsigned int cbm_len = r->cache.cbm_len;
+ unsigned long first_bit, zero_bit;
+
+ if (*val == 0)
+ return;
+
+ first_bit = find_first_bit(val, cbm_len);
+ zero_bit = find_next_zero_bit(val, cbm_len, first_bit);
+
+ /* Clear any remaining bits to ensure contiguous region */
+ bitmap_clear(val, zero_bit, cbm_len - zero_bit);
+}
+
+/**
+ * rdtgroup_init_alloc - Initialize the new RDT group's allocations
+ *
+ * A new RDT group is being created on an allocation capable (CAT)
+ * supporting system. Set this group up to start off with all usable
+ * allocations. That is, all shareable and unused bits.
+ *
+ * All-zero CBM is invalid. If there are no more shareable bits available
+ * on any domain then the entire allocation will fail.
+ */
+static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
+{
+ u32 used_b = 0, unused_b = 0;
+ u32 closid = rdtgrp->closid;
+ struct rdt_resource *r;
+ enum rdtgrp_mode mode;
+ struct rdt_domain *d;
+ int i, ret;
+ u32 *ctrl;
+
+ for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(d, &r->domains, list) {
+ d->have_new_ctrl = false;
+ d->new_ctrl = r->cache.shareable_bits;
+ used_b = r->cache.shareable_bits;
+ ctrl = d->ctrl_val;
+ for (i = 0; i < r->num_closid; i++, ctrl++) {
+ if (closid_allocated(i) && i != closid) {
+ mode = rdtgroup_mode_by_closid(i);
+ if (mode == RDT_MODE_PSEUDO_LOCKSETUP)
+ break;
+ used_b |= *ctrl;
+ if (mode == RDT_MODE_SHAREABLE)
+ d->new_ctrl |= *ctrl;
+ }
+ }
+ if (d->plr && d->plr->cbm > 0)
+ used_b |= d->plr->cbm;
+ unused_b = used_b ^ (BIT_MASK(r->cache.cbm_len) - 1);
+ unused_b &= BIT_MASK(r->cache.cbm_len) - 1;
+ d->new_ctrl |= unused_b;
+ /*
+ * Force the initial CBM to be valid, user can
+ * modify the CBM based on system availability.
+ */
+ cbm_ensure_valid(&d->new_ctrl, r);
+ if (bitmap_weight((unsigned long *) &d->new_ctrl,
+ r->cache.cbm_len) <
+ r->cache.min_cbm_bits) {
+ rdt_last_cmd_printf("no space on %s:%d\n",
+ r->name, d->id);
+ return -ENOSPC;
+ }
+ d->have_new_ctrl = true;
+ }
+ }
+
+ for_each_alloc_enabled_rdt_resource(r) {
+ ret = update_domains(r, rdtgrp->closid);
+ if (ret < 0) {
+ rdt_last_cmd_puts("failed to initialize allocations\n");
+ return ret;
+ }
+ rdtgrp->mode = RDT_MODE_SHAREABLE;
+ }
+
+ return 0;
+}
+
static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
struct kernfs_node *prgrp_kn,
const char *name, umode_t mode,
@@ -1700,6 +2400,14 @@
goto out_unlock;
}
+ if (rtype == RDTMON_GROUP &&
+ (prdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+ prdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)) {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("pseudo-locking in progress\n");
+ goto out_unlock;
+ }
+
/* allocate the rdtgroup. */
rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
if (!rdtgrp) {
@@ -1840,6 +2548,10 @@
ret = 0;
rdtgrp->closid = closid;
+ ret = rdtgroup_init_alloc(rdtgrp);
+ if (ret < 0)
+ goto out_id_free;
+
list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
if (rdt_mon_capable) {
@@ -1850,15 +2562,16 @@
ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
- goto out_id_free;
+ goto out_del_list;
}
}
goto out_unlock;
+out_del_list:
+ list_del(&rdtgrp->rdtgroup_list);
out_id_free:
closid_free(closid);
- list_del(&rdtgrp->rdtgroup_list);
out_common_fail:
mkdir_rdt_prepare_clean(rdtgrp);
out_unlock:
@@ -1945,6 +2658,21 @@
return 0;
}
+static int rdtgroup_ctrl_remove(struct kernfs_node *kn,
+ struct rdtgroup *rdtgrp)
+{
+ rdtgrp->flags = RDT_DELETED;
+ list_del(&rdtgrp->rdtgroup_list);
+
+ /*
+ * one extra hold on this, will drop when we kfree(rdtgrp)
+ * in rdtgroup_kn_unlock()
+ */
+ kernfs_get(kn);
+ kernfs_remove(rdtgrp->kn);
+ return 0;
+}
+
static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
cpumask_var_t tmpmask)
{
@@ -1970,7 +2698,6 @@
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);
- rdtgrp->flags = RDT_DELETED;
closid_free(rdtgrp->closid);
free_rmid(rdtgrp->mon.rmid);
@@ -1979,14 +2706,7 @@
*/
free_all_child_rdtgrp(rdtgrp);
- list_del(&rdtgrp->rdtgroup_list);
-
- /*
- * one extra hold on this, will drop when we kfree(rdtgrp)
- * in rdtgroup_kn_unlock()
- */
- kernfs_get(kn);
- kernfs_remove(rdtgrp->kn);
+ rdtgroup_ctrl_remove(kn, rdtgrp);
return 0;
}
@@ -2014,13 +2734,19 @@
* If the rdtgroup is a mon group and parent directory
* is a valid "mon_groups" directory, remove the mon group.
*/
- if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn)
- ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask);
- else if (rdtgrp->type == RDTMON_GROUP &&
- is_mon_groups(parent_kn, kn->name))
+ if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn) {
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+ ret = rdtgroup_ctrl_remove(kn, rdtgrp);
+ } else {
+ ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask);
+ }
+ } else if (rdtgrp->type == RDTMON_GROUP &&
+ is_mon_groups(parent_kn, kn->name)) {
ret = rdtgroup_rmdir_mon(kn, rdtgrp, tmpmask);
- else
+ } else {
ret = -EPERM;
+ }
out:
rdtgroup_kn_unlock(kn);
@@ -2046,7 +2772,8 @@
int ret;
rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
- KERNFS_ROOT_CREATE_DEACTIVATED,
+ KERNFS_ROOT_CREATE_DEACTIVATED |
+ KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK,
&rdtgroup_default);
if (IS_ERR(rdt_root))
return PTR_ERR(rdt_root);
@@ -2102,6 +2829,29 @@
if (ret)
goto cleanup_mountpoint;
+ /*
+ * Adding the resctrl debugfs directory here may not be ideal since
+ * it would let the resctrl debugfs directory appear on the debugfs
+ * filesystem before the resctrl filesystem is mounted.
+ * It may also be ok since that would enable debugging of RDT before
+ * resctrl is mounted.
+ * The reason why the debugfs directory is created here and not in
+ * rdt_mount() is because rdt_mount() takes rdtgroup_mutex and
+ * during the debugfs directory creation also &sb->s_type->i_mutex_key
+ * (the lockdep class of inode->i_rwsem). Other filesystem
+ * interactions (eg. SyS_getdents) have the lock ordering:
+ * &sb->s_type->i_mutex_key --> &mm->mmap_sem
+ * During mmap(), called with &mm->mmap_sem, the rdtgroup_mutex
+ * is taken, thus creating dependency:
+ * &mm->mmap_sem --> rdtgroup_mutex for the latter that can cause
+ * issues considering the other two lock dependencies.
+ * By creating the debugfs directory here we avoid a dependency
+ * that may cause deadlock (even though file operations cannot
+ * occur until the filesystem is mounted, but I do not know how to
+ * tell lockdep that).
+ */
+ debugfs_resctrl = debugfs_create_dir("resctrl", NULL);
+
return 0;
cleanup_mountpoint:
@@ -2111,3 +2861,11 @@
return ret;
}
+
+void __exit rdtgroup_exit(void)
+{
+ debugfs_remove_recursive(debugfs_resctrl);
+ unregister_filesystem(&rdt_fs_type);
+ sysfs_remove_mount_point(fs_kobj, "resctrl");
+ kernfs_destroy_root(rdt_root);
+}
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8c50754..4b76728 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -123,8 +123,8 @@
{
memset(m, 0, sizeof(struct mce));
m->cpu = m->extcpu = smp_processor_id();
- /* We hope get_seconds stays lockless */
- m->time = get_seconds();
+ /* need the internal __ version to avoid deadlocks */
+ m->time = __ktime_get_real_seconds();
m->cpuvendor = boot_cpu_data.x86_vendor;
m->cpuid = cpuid_eax(1);
m->socketid = cpu_data(m->extcpu).phys_proc_id;
@@ -1104,6 +1104,101 @@
}
#endif
+
+/*
+ * Cases where we avoid rendezvous handler timeout:
+ * 1) If this CPU is offline.
+ *
+ * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
+ * skip those CPUs which remain looping in the 1st kernel - see
+ * crash_nmi_callback().
+ *
+ * Note: there still is a small window between kexec-ing and the new,
+ * kdump kernel establishing a new #MC handler where a broadcasted MCE
+ * might not get handled properly.
+ */
+static bool __mc_check_crashing_cpu(int cpu)
+{
+ if (cpu_is_offline(cpu) ||
+ (crashing_cpu != -1 && crashing_cpu != cpu)) {
+ u64 mcgstatus;
+
+ mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
+ if (mcgstatus & MCG_STATUS_RIPV) {
+ mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+ return true;
+ }
+ }
+ return false;
+}
+
+static void __mc_scan_banks(struct mce *m, struct mce *final,
+ unsigned long *toclear, unsigned long *valid_banks,
+ int no_way_out, int *worst)
+{
+ struct mca_config *cfg = &mca_cfg;
+ int severity, i;
+
+ for (i = 0; i < cfg->banks; i++) {
+ __clear_bit(i, toclear);
+ if (!test_bit(i, valid_banks))
+ continue;
+
+ if (!mce_banks[i].ctl)
+ continue;
+
+ m->misc = 0;
+ m->addr = 0;
+ m->bank = i;
+
+ m->status = mce_rdmsrl(msr_ops.status(i));
+ if (!(m->status & MCI_STATUS_VAL))
+ continue;
+
+ /*
+ * Corrected or non-signaled errors are handled by
+ * machine_check_poll(). Leave them alone, unless this panics.
+ */
+ if (!(m->status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
+ !no_way_out)
+ continue;
+
+ /* Set taint even when machine check was not enabled. */
+ add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+
+ severity = mce_severity(m, cfg->tolerant, NULL, true);
+
+ /*
+ * When machine check was for corrected/deferred handler don't
+ * touch, unless we're panicking.
+ */
+ if ((severity == MCE_KEEP_SEVERITY ||
+ severity == MCE_UCNA_SEVERITY) && !no_way_out)
+ continue;
+
+ __set_bit(i, toclear);
+
+ /* Machine check event was not enabled. Clear, but ignore. */
+ if (severity == MCE_NO_SEVERITY)
+ continue;
+
+ mce_read_aux(m, i);
+
+ /* assuming valid severity level != 0 */
+ m->severity = severity;
+
+ mce_log(m);
+
+ if (severity > *worst) {
+ *final = *m;
+ *worst = severity;
+ }
+ }
+
+ /* mce_clear_state will clear *final, save locally for use later */
+ *m = *final;
+}
+
/*
* The actual machine check handler. This only handles real
* exceptions when something got corrupted coming in through int 18.
@@ -1118,68 +1213,45 @@
*/
void do_machine_check(struct pt_regs *regs, long error_code)
{
+ DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
+ DECLARE_BITMAP(toclear, MAX_NR_BANKS);
struct mca_config *cfg = &mca_cfg;
+ int cpu = smp_processor_id();
+ char *msg = "Unknown";
struct mce m, *final;
- int i;
int worst = 0;
- int severity;
/*
* Establish sequential order between the CPUs entering the machine
* check handler.
*/
int order = -1;
+
/*
* If no_way_out gets set, there is no safe way to recover from this
* MCE. If mca_cfg.tolerant is cranked up, we'll try anyway.
*/
int no_way_out = 0;
+
/*
* If kill_it gets set, there might be a way to recover from this
* error.
*/
int kill_it = 0;
- DECLARE_BITMAP(toclear, MAX_NR_BANKS);
- DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
- char *msg = "Unknown";
/*
* MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
* on Intel.
*/
int lmce = 1;
- int cpu = smp_processor_id();
- /*
- * Cases where we avoid rendezvous handler timeout:
- * 1) If this CPU is offline.
- *
- * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
- * skip those CPUs which remain looping in the 1st kernel - see
- * crash_nmi_callback().
- *
- * Note: there still is a small window between kexec-ing and the new,
- * kdump kernel establishing a new #MC handler where a broadcasted MCE
- * might not get handled properly.
- */
- if (cpu_is_offline(cpu) ||
- (crashing_cpu != -1 && crashing_cpu != cpu)) {
- u64 mcgstatus;
-
- mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
- if (mcgstatus & MCG_STATUS_RIPV) {
- mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
- return;
- }
- }
+ if (__mc_check_crashing_cpu(cpu))
+ return;
ist_enter(regs);
this_cpu_inc(mce_exception_count);
- if (!cfg->banks)
- goto out;
-
mce_gather_info(&m, regs);
m.tsc = rdtsc();
@@ -1220,67 +1292,7 @@
order = mce_start(&no_way_out);
}
- for (i = 0; i < cfg->banks; i++) {
- __clear_bit(i, toclear);
- if (!test_bit(i, valid_banks))
- continue;
- if (!mce_banks[i].ctl)
- continue;
-
- m.misc = 0;
- m.addr = 0;
- m.bank = i;
-
- m.status = mce_rdmsrl(msr_ops.status(i));
- if ((m.status & MCI_STATUS_VAL) == 0)
- continue;
-
- /*
- * Non uncorrected or non signaled errors are handled by
- * machine_check_poll. Leave them alone, unless this panics.
- */
- if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
- !no_way_out)
- continue;
-
- /*
- * Set taint even when machine check was not enabled.
- */
- add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
-
- severity = mce_severity(&m, cfg->tolerant, NULL, true);
-
- /*
- * When machine check was for corrected/deferred handler don't
- * touch, unless we're panicing.
- */
- if ((severity == MCE_KEEP_SEVERITY ||
- severity == MCE_UCNA_SEVERITY) && !no_way_out)
- continue;
- __set_bit(i, toclear);
- if (severity == MCE_NO_SEVERITY) {
- /*
- * Machine check event was not enabled. Clear, but
- * ignore.
- */
- continue;
- }
-
- mce_read_aux(&m, i);
-
- /* assuming valid severity level != 0 */
- m.severity = severity;
-
- mce_log(&m);
-
- if (severity > worst) {
- *final = m;
- worst = severity;
- }
- }
-
- /* mce_clear_state will clear *final, save locally for use later */
- m = *final;
+ __mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst);
if (!no_way_out)
mce_clear_state(toclear);
@@ -1319,7 +1331,7 @@
if (worst > 0)
mce_report_event(regs);
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-out:
+
sync_core();
if (worst != MCE_AR_SEVERITY && !kill_it)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 666a284..9c86529 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -22,8 +22,6 @@
#include <asm/stacktrace.h>
#include <asm/unwind.h>
-#define OPCODE_BUFSIZE 64
-
int panic_on_unrecovered_nmi;
int panic_on_io_nmi;
static int die_counter;
@@ -93,26 +91,18 @@
*/
void show_opcodes(u8 *rip, const char *loglvl)
{
- unsigned int code_prologue = OPCODE_BUFSIZE * 2 / 3;
+#define PROLOGUE_SIZE 42
+#define EPILOGUE_SIZE 21
+#define OPCODE_BUFSIZE (PROLOGUE_SIZE + 1 + EPILOGUE_SIZE)
u8 opcodes[OPCODE_BUFSIZE];
- u8 *ip;
- int i;
- printk("%sCode: ", loglvl);
-
- ip = (u8 *)rip - code_prologue;
- if (probe_kernel_read(opcodes, ip, OPCODE_BUFSIZE)) {
- pr_cont("Bad RIP value.\n");
- return;
+ if (probe_kernel_read(opcodes, rip - PROLOGUE_SIZE, OPCODE_BUFSIZE)) {
+ printk("%sCode: Bad RIP value.\n", loglvl);
+ } else {
+ printk("%sCode: %" __stringify(PROLOGUE_SIZE) "ph <%02x> %"
+ __stringify(EPILOGUE_SIZE) "ph\n", loglvl, opcodes,
+ opcodes[PROLOGUE_SIZE], opcodes + PROLOGUE_SIZE + 1);
}
-
- for (i = 0; i < OPCODE_BUFSIZE; i++, ip++) {
- if (ip == rip)
- pr_cont("<%02x> ", opcodes[i]);
- else
- pr_cont("%02x ", opcodes[i]);
- }
- pr_cont("\n");
}
void show_ip(struct pt_regs *regs, const char *loglvl)
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index abe6df1..30f9cb2 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -512,11 +512,18 @@
ENTRY(setup_once_ref)
.long setup_once
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define PGD_ALIGN (2 * PAGE_SIZE)
+#define PTI_USER_PGD_FILL 1024
+#else
+#define PGD_ALIGN (PAGE_SIZE)
+#define PTI_USER_PGD_FILL 0
+#endif
/*
* BSS section
*/
__PAGE_ALIGNED_BSS
- .align PAGE_SIZE
+ .align PGD_ALIGN
#ifdef CONFIG_X86_PAE
.globl initial_pg_pmd
initial_pg_pmd:
@@ -526,14 +533,17 @@
initial_page_table:
.fill 1024,4,0
#endif
+ .align PGD_ALIGN
initial_pg_fixmap:
.fill 1024,4,0
+.globl swapper_pg_dir
+ .align PGD_ALIGN
+swapper_pg_dir:
+ .fill 1024,4,0
+ .fill PTI_USER_PGD_FILL,4,0
.globl empty_zero_page
empty_zero_page:
.fill 4096,1,0
-.globl swapper_pg_dir
-swapper_pg_dir:
- .fill 1024,4,0
EXPORT_SYMBOL(empty_zero_page)
/*
@@ -542,7 +552,7 @@
#ifdef CONFIG_X86_PAE
__PAGE_ALIGNED_DATA
/* Page-aligned for the benefit of paravirt? */
- .align PAGE_SIZE
+ .align PGD_ALIGN
ENTRY(initial_page_table)
.long pa(initial_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */
# if KPMDS == 3
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 8344dd2..15ebc2f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -235,7 +235,7 @@
* address given in m16:64.
*/
pushq $.Lafter_lret # put return address on stack for unwinder
- xorq %rbp, %rbp # clear frame pointer
+ xorl %ebp, %ebp # clear frame pointer
movq initial_code(%rip), %rax
pushq $__KERNEL_CS # set correct cs
pushq %rax # target address in negative space
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 8771766..34a5c17 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -169,28 +169,29 @@
set_dr_addr_mask(0, i);
}
-/*
- * Check for virtual address in kernel space.
- */
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+static int arch_bp_generic_len(int x86_len)
{
- unsigned int len;
- unsigned long va;
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
- va = info->address;
- len = bp->attr.bp_len;
-
- /*
- * We don't need to worry about va + len - 1 overflowing:
- * we already require that va is aligned to a multiple of len.
- */
- return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
+ switch (x86_len) {
+ case X86_BREAKPOINT_LEN_1:
+ return HW_BREAKPOINT_LEN_1;
+ case X86_BREAKPOINT_LEN_2:
+ return HW_BREAKPOINT_LEN_2;
+ case X86_BREAKPOINT_LEN_4:
+ return HW_BREAKPOINT_LEN_4;
+#ifdef CONFIG_X86_64
+ case X86_BREAKPOINT_LEN_8:
+ return HW_BREAKPOINT_LEN_8;
+#endif
+ default:
+ return -EINVAL;
+ }
}
int arch_bp_generic_fields(int x86_len, int x86_type,
int *gen_len, int *gen_type)
{
+ int len;
+
/* Type */
switch (x86_type) {
case X86_BREAKPOINT_EXECUTE:
@@ -211,42 +212,47 @@
}
/* Len */
- switch (x86_len) {
- case X86_BREAKPOINT_LEN_1:
- *gen_len = HW_BREAKPOINT_LEN_1;
- break;
- case X86_BREAKPOINT_LEN_2:
- *gen_len = HW_BREAKPOINT_LEN_2;
- break;
- case X86_BREAKPOINT_LEN_4:
- *gen_len = HW_BREAKPOINT_LEN_4;
- break;
-#ifdef CONFIG_X86_64
- case X86_BREAKPOINT_LEN_8:
- *gen_len = HW_BREAKPOINT_LEN_8;
- break;
-#endif
- default:
+ len = arch_bp_generic_len(x86_len);
+ if (len < 0)
return -EINVAL;
- }
+ *gen_len = len;
return 0;
}
-
-static int arch_build_bp_info(struct perf_event *bp)
+/*
+ * Check for virtual address in kernel space.
+ */
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+ unsigned long va;
+ int len;
- info->address = bp->attr.bp_addr;
+ va = hw->address;
+ len = arch_bp_generic_len(hw->len);
+ WARN_ON_ONCE(len < 0);
+
+ /*
+ * We don't need to worry about va + len - 1 overflowing:
+ * we already require that va is aligned to a multiple of len.
+ */
+ return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
+}
+
+static int arch_build_bp_info(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
+{
+ hw->address = attr->bp_addr;
+ hw->mask = 0;
/* Type */
- switch (bp->attr.bp_type) {
+ switch (attr->bp_type) {
case HW_BREAKPOINT_W:
- info->type = X86_BREAKPOINT_WRITE;
+ hw->type = X86_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
- info->type = X86_BREAKPOINT_RW;
+ hw->type = X86_BREAKPOINT_RW;
break;
case HW_BREAKPOINT_X:
/*
@@ -254,23 +260,23 @@
* acceptable for kprobes. On non-kprobes kernels, we don't
* allow kernel breakpoints at all.
*/
- if (bp->attr.bp_addr >= TASK_SIZE_MAX) {
+ if (attr->bp_addr >= TASK_SIZE_MAX) {
#ifdef CONFIG_KPROBES
- if (within_kprobe_blacklist(bp->attr.bp_addr))
+ if (within_kprobe_blacklist(attr->bp_addr))
return -EINVAL;
#else
return -EINVAL;
#endif
}
- info->type = X86_BREAKPOINT_EXECUTE;
+ hw->type = X86_BREAKPOINT_EXECUTE;
/*
* x86 inst breakpoints need to have a specific undefined len.
* But we still need to check userspace is not trying to setup
* an unsupported length, to get a range breakpoint for example.
*/
- if (bp->attr.bp_len == sizeof(long)) {
- info->len = X86_BREAKPOINT_LEN_X;
+ if (attr->bp_len == sizeof(long)) {
+ hw->len = X86_BREAKPOINT_LEN_X;
return 0;
}
default:
@@ -278,28 +284,26 @@
}
/* Len */
- info->mask = 0;
-
- switch (bp->attr.bp_len) {
+ switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
- info->len = X86_BREAKPOINT_LEN_1;
+ hw->len = X86_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
- info->len = X86_BREAKPOINT_LEN_2;
+ hw->len = X86_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
- info->len = X86_BREAKPOINT_LEN_4;
+ hw->len = X86_BREAKPOINT_LEN_4;
break;
#ifdef CONFIG_X86_64
case HW_BREAKPOINT_LEN_8:
- info->len = X86_BREAKPOINT_LEN_8;
+ hw->len = X86_BREAKPOINT_LEN_8;
break;
#endif
default:
/* AMD range breakpoint */
- if (!is_power_of_2(bp->attr.bp_len))
+ if (!is_power_of_2(attr->bp_len))
return -EINVAL;
- if (bp->attr.bp_addr & (bp->attr.bp_len - 1))
+ if (attr->bp_addr & (attr->bp_len - 1))
return -EINVAL;
if (!boot_cpu_has(X86_FEATURE_BPEXT))
@@ -312,8 +316,8 @@
* breakpoints, then we'll have to check for kprobe-blacklisted
* addresses anywhere in the range.
*/
- info->mask = bp->attr.bp_len - 1;
- info->len = X86_BREAKPOINT_LEN_1;
+ hw->mask = attr->bp_len - 1;
+ hw->len = X86_BREAKPOINT_LEN_1;
}
return 0;
@@ -322,22 +326,23 @@
/*
* Validate the arch-specific HW Breakpoint register settings
*/
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
unsigned int align;
int ret;
- ret = arch_build_bp_info(bp);
+ ret = arch_build_bp_info(bp, attr, hw);
if (ret)
return ret;
- switch (info->len) {
+ switch (hw->len) {
case X86_BREAKPOINT_LEN_1:
align = 0;
- if (info->mask)
- align = info->mask;
+ if (hw->mask)
+ align = hw->mask;
break;
case X86_BREAKPOINT_LEN_2:
align = 1;
@@ -358,7 +363,7 @@
* Check that the low-order bits of the address are appropriate
* for the alignment implied by len.
*/
- if (info->address & align)
+ if (hw->address & align)
return -EINVAL;
return 0;
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index e56c95b..eeea935 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -37,15 +37,18 @@
BUG();
}
-static void __jump_label_transform(struct jump_entry *entry,
- enum jump_label_type type,
- void *(*poker)(void *, const void *, size_t),
- int init)
+static void __ref __jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type,
+ void *(*poker)(void *, const void *, size_t),
+ int init)
{
union jump_code_union code;
const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
+ if (early_boot_irqs_disabled)
+ poker = text_poke_early;
+
if (type == JUMP_LABEL_JMP) {
if (init) {
/*
diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index ae38dcc..2b949f4 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -105,14 +105,4 @@
}
#endif
-#ifdef CONFIG_KPROBES_ON_FTRACE
-extern int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb);
-#else
-static inline int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- return 0;
-}
-#endif
#endif
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 6f4d423..b0d1e81 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -66,8 +66,6 @@
#include "common.h"
-void jprobe_return_end(void);
-
DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -395,8 +393,6 @@
- (u8 *) real;
if ((s64) (s32) newdisp != newdisp) {
pr_err("Kprobes error: new displacement does not fit into s32 (%llx)\n", newdisp);
- pr_err("\tSrc: %p, Dest: %p, old disp: %x\n",
- src, real, insn->displacement.value);
return 0;
}
disp = (u8 *) dest + insn_offset_displacement(insn);
@@ -596,7 +592,6 @@
* stepping.
*/
regs->ip = (unsigned long)p->ainsn.insn;
- preempt_enable_no_resched();
return;
}
#endif
@@ -640,8 +635,7 @@
* Raise a BUG or we'll continue in an endless reentering loop
* and eventually a stack overflow.
*/
- printk(KERN_WARNING "Unrecoverable kprobe detected at %p.\n",
- p->addr);
+ pr_err("Unrecoverable kprobe detected.\n");
dump_kprobe(p);
BUG();
default:
@@ -669,12 +663,10 @@
addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
/*
- * We don't want to be preempted for the entire
- * duration of kprobe processing. We conditionally
- * re-enable preemption at the end of this function,
- * and also in reenter_kprobe() and setup_singlestep().
+ * We don't want to be preempted for the entire duration of kprobe
+ * processing. Since int3 and debug trap disables irqs and we clear
+ * IF while singlestepping, it must be no preemptible.
*/
- preempt_disable();
kcb = get_kprobe_ctlblk();
p = get_kprobe(addr);
@@ -690,13 +682,14 @@
/*
* If we have no pre-handler or it returned 0, we
* continue with normal processing. If we have a
- * pre-handler and it returned non-zero, it prepped
- * for calling the break_handler below on re-entry
- * for jprobe processing, so get out doing nothing
- * more here.
+ * pre-handler and it returned non-zero, that means
+ * user handler setup registers to exit to another
+ * instruction, we must skip the single stepping.
*/
if (!p->pre_handler || !p->pre_handler(p, regs))
setup_singlestep(p, regs, kcb, 0);
+ else
+ reset_current_kprobe();
return 1;
}
} else if (*addr != BREAKPOINT_INSTRUCTION) {
@@ -710,18 +703,9 @@
* the original instruction.
*/
regs->ip = (unsigned long)addr;
- preempt_enable_no_resched();
return 1;
- } else if (kprobe_running()) {
- p = __this_cpu_read(current_kprobe);
- if (p->break_handler && p->break_handler(p, regs)) {
- if (!skip_singlestep(p, regs, kcb))
- setup_singlestep(p, regs, kcb, 0);
- return 1;
- }
} /* else: not a kprobe fault; let the kernel handle it */
- preempt_enable_no_resched();
return 0;
}
NOKPROBE_SYMBOL(kprobe_int3_handler);
@@ -972,8 +956,6 @@
}
reset_current_kprobe();
out:
- preempt_enable_no_resched();
-
/*
* if somebody else is singlestepping across a probe point, flags
* will have TF set, in which case, continue the remaining processing
@@ -1020,7 +1002,6 @@
restore_previous_kprobe(kcb);
else
reset_current_kprobe();
- preempt_enable_no_resched();
} else if (kcb->kprobe_status == KPROBE_HIT_ACTIVE ||
kcb->kprobe_status == KPROBE_HIT_SSDONE) {
/*
@@ -1083,93 +1064,6 @@
}
NOKPROBE_SYMBOL(kprobe_exceptions_notify);
-int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- unsigned long addr;
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- kcb->jprobe_saved_regs = *regs;
- kcb->jprobe_saved_sp = stack_addr(regs);
- addr = (unsigned long)(kcb->jprobe_saved_sp);
-
- /*
- * As Linus pointed out, gcc assumes that the callee
- * owns the argument space and could overwrite it, e.g.
- * tailcall optimization. So, to be absolutely safe
- * we also save and restore enough stack bytes to cover
- * the argument area.
- * Use __memcpy() to avoid KASAN stack out-of-bounds reports as we copy
- * raw stack chunk with redzones:
- */
- __memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr, MIN_STACK_SIZE(addr));
- regs->ip = (unsigned long)(jp->entry);
-
- /*
- * jprobes use jprobe_return() which skips the normal return
- * path of the function, and this messes up the accounting of the
- * function graph tracer to get messed up.
- *
- * Pause function graph tracing while performing the jprobe function.
- */
- pause_graph_tracing();
- return 1;
-}
-NOKPROBE_SYMBOL(setjmp_pre_handler);
-
-void jprobe_return(void)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
- /* Unpoison stack redzones in the frames we are going to jump over. */
- kasan_unpoison_stack_above_sp_to(kcb->jprobe_saved_sp);
-
- asm volatile (
-#ifdef CONFIG_X86_64
- " xchg %%rbx,%%rsp \n"
-#else
- " xchgl %%ebx,%%esp \n"
-#endif
- " int3 \n"
- " .globl jprobe_return_end\n"
- " jprobe_return_end: \n"
- " nop \n"::"b"
- (kcb->jprobe_saved_sp):"memory");
-}
-NOKPROBE_SYMBOL(jprobe_return);
-NOKPROBE_SYMBOL(jprobe_return_end);
-
-int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- u8 *addr = (u8 *) (regs->ip - 1);
- struct jprobe *jp = container_of(p, struct jprobe, kp);
- void *saved_sp = kcb->jprobe_saved_sp;
-
- if ((addr > (u8 *) jprobe_return) &&
- (addr < (u8 *) jprobe_return_end)) {
- if (stack_addr(regs) != saved_sp) {
- struct pt_regs *saved_regs = &kcb->jprobe_saved_regs;
- printk(KERN_ERR
- "current sp %p does not match saved sp %p\n",
- stack_addr(regs), saved_sp);
- printk(KERN_ERR "Saved registers for jprobe %p\n", jp);
- show_regs(saved_regs);
- printk(KERN_ERR "Current registers\n");
- show_regs(regs);
- BUG();
- }
- /* It's OK to start function graph tracing again */
- unpause_graph_tracing();
- *regs = kcb->jprobe_saved_regs;
- __memcpy(saved_sp, kcb->jprobes_stack, MIN_STACK_SIZE(saved_sp));
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-NOKPROBE_SYMBOL(longjmp_break_handler);
-
bool arch_within_kprobe_blacklist(unsigned long addr)
{
bool is_in_entry_trampoline_section = false;
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 8dc0161..ef819e1 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,36 +25,6 @@
#include "common.h"
-static nokprobe_inline
-void __skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb, unsigned long orig_ip)
-{
- /*
- * Emulate singlestep (and also recover regs->ip)
- * as if there is a 5byte nop
- */
- regs->ip = (unsigned long)p->addr + MCOUNT_INSN_SIZE;
- if (unlikely(p->post_handler)) {
- kcb->kprobe_status = KPROBE_HIT_SSDONE;
- p->post_handler(p, regs, 0);
- }
- __this_cpu_write(current_kprobe, NULL);
- if (orig_ip)
- regs->ip = orig_ip;
-}
-
-int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
- struct kprobe_ctlblk *kcb)
-{
- if (kprobe_ftrace(p)) {
- __skip_singlestep(p, regs, kcb, 0);
- preempt_enable_no_resched();
- return 1;
- }
- return 0;
-}
-NOKPROBE_SYMBOL(skip_singlestep);
-
/* Ftrace callback handler for kprobes -- called under preepmt disabed */
void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct pt_regs *regs)
@@ -75,18 +45,25 @@
/* Kprobe handler expects regs->ip = ip + 1 as breakpoint hit */
regs->ip = ip + sizeof(kprobe_opcode_t);
- /* To emulate trap based kprobes, preempt_disable here */
- preempt_disable();
__this_cpu_write(current_kprobe, p);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
if (!p->pre_handler || !p->pre_handler(p, regs)) {
- __skip_singlestep(p, regs, kcb, orig_ip);
- preempt_enable_no_resched();
+ /*
+ * Emulate singlestep (and also recover regs->ip)
+ * as if there is a 5byte nop
+ */
+ regs->ip = (unsigned long)p->addr + MCOUNT_INSN_SIZE;
+ if (unlikely(p->post_handler)) {
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ p->post_handler(p, regs, 0);
+ }
+ regs->ip = orig_ip;
}
/*
- * If pre_handler returns !0, it sets regs->ip and
- * resets current kprobe, and keep preempt count +1.
+ * If pre_handler returns !0, it changes regs->ip. We have to
+ * skip emulating post_handler.
*/
+ __this_cpu_write(current_kprobe, NULL);
}
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 203d3988..eaf02f2 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -491,7 +491,6 @@
regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
if (!reenter)
reset_current_kprobe();
- preempt_enable_no_resched();
return 1;
}
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5b2300b..09aaabb 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -45,7 +45,6 @@
#include <asm/apic.h>
#include <asm/apicdef.h>
#include <asm/hypervisor.h>
-#include <asm/kvm_guest.h>
static int kvmapf = 1;
@@ -66,15 +65,6 @@
early_param("no-steal-acc", parse_no_stealacc);
-static int kvmclock_vsyscall = 1;
-static int __init parse_no_kvmclock_vsyscall(char *arg)
-{
- kvmclock_vsyscall = 0;
- return 0;
-}
-
-early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
-
static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
static DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64);
static int has_steal_clock = 0;
@@ -154,7 +144,7 @@
for (;;) {
if (!n.halted)
- prepare_to_swait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+ prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
if (hlist_unhashed(&n.link))
break;
@@ -188,7 +178,7 @@
if (n->halted)
smp_send_reschedule(n->cpu);
else if (swq_has_sleeper(&n->wq))
- swake_up(&n->wq);
+ swake_up_one(&n->wq);
}
static void apf_task_wake_all(void)
@@ -560,9 +550,6 @@
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
apic_set_eoi_write(kvm_guest_apic_eoi_write);
- if (kvmclock_vsyscall)
- kvm_setup_vsyscall_timeinfo();
-
#ifdef CONFIG_SMP
smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
@@ -628,6 +615,7 @@
.name = "KVM",
.detect = kvm_detect,
.type = X86_HYPER_KVM,
+ .init.init_platform = kvmclock_init,
.init.guest_late_init = kvm_guest_init,
.init.x2apic_available = kvm_para_available,
};
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 3b8e7c1..d2edd7e 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -23,30 +23,56 @@
#include <asm/apic.h>
#include <linux/percpu.h>
#include <linux/hardirq.h>
-#include <linux/memblock.h>
+#include <linux/cpuhotplug.h>
#include <linux/sched.h>
#include <linux/sched/clock.h>
+#include <linux/mm.h>
+#include <asm/hypervisor.h>
#include <asm/mem_encrypt.h>
#include <asm/x86_init.h>
#include <asm/reboot.h>
#include <asm/kvmclock.h>
-static int kvmclock __ro_after_init = 1;
-static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
-static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
-static u64 kvm_sched_clock_offset;
+static int kvmclock __initdata = 1;
+static int kvmclock_vsyscall __initdata = 1;
+static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
+static u64 kvm_sched_clock_offset __ro_after_init;
-static int parse_no_kvmclock(char *arg)
+static int __init parse_no_kvmclock(char *arg)
{
kvmclock = 0;
return 0;
}
early_param("no-kvmclock", parse_no_kvmclock);
-/* The hypervisor will put information about time periodically here */
-static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock *wall_clock;
+static int __init parse_no_kvmclock_vsyscall(char *arg)
+{
+ kvmclock_vsyscall = 0;
+ return 0;
+}
+early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
+
+/* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
+#define HV_CLOCK_SIZE (sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
+#define HVC_BOOT_ARRAY_SIZE \
+ (PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))
+
+static struct pvclock_vsyscall_time_info
+ hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);
+static struct pvclock_wall_clock wall_clock;
+static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
+
+static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
+{
+ return &this_cpu_read(hv_clock_per_cpu)->pvti;
+}
+
+static inline struct pvclock_vsyscall_time_info *this_cpu_hvclock(void)
+{
+ return this_cpu_read(hv_clock_per_cpu);
+}
/*
* The wallclock is the time of day when we booted. Since then, some time may
@@ -55,21 +81,10 @@
*/
static void kvm_get_wallclock(struct timespec64 *now)
{
- struct pvclock_vcpu_time_info *vcpu_time;
- int low, high;
- int cpu;
-
- low = (int)slow_virt_to_phys(wall_clock);
- high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
-
- native_write_msr(msr_kvm_wall_clock, low, high);
-
- cpu = get_cpu();
-
- vcpu_time = &hv_clock[cpu].pvti;
- pvclock_read_wallclock(wall_clock, vcpu_time, now);
-
- put_cpu();
+ wrmsrl(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));
+ preempt_disable();
+ pvclock_read_wallclock(&wall_clock, this_cpu_pvti(), now);
+ preempt_enable();
}
static int kvm_set_wallclock(const struct timespec64 *now)
@@ -79,14 +94,10 @@
static u64 kvm_clock_read(void)
{
- struct pvclock_vcpu_time_info *src;
u64 ret;
- int cpu;
preempt_disable_notrace();
- cpu = smp_processor_id();
- src = &hv_clock[cpu].pvti;
- ret = pvclock_clocksource_read(src);
+ ret = pvclock_clocksource_read(this_cpu_pvti());
preempt_enable_notrace();
return ret;
}
@@ -112,11 +123,11 @@
kvm_sched_clock_offset = kvm_clock_read();
pv_time_ops.sched_clock = kvm_sched_clock_read;
- printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
- kvm_sched_clock_offset);
+ pr_info("kvm-clock: using sched offset of %llu cycles",
+ kvm_sched_clock_offset);
BUILD_BUG_ON(sizeof(kvm_sched_clock_offset) >
- sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
+ sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
}
/*
@@ -130,19 +141,11 @@
*/
static unsigned long kvm_get_tsc_khz(void)
{
- struct pvclock_vcpu_time_info *src;
- int cpu;
- unsigned long tsc_khz;
-
- cpu = get_cpu();
- src = &hv_clock[cpu].pvti;
- tsc_khz = pvclock_tsc_khz(src);
- put_cpu();
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
- return tsc_khz;
+ return pvclock_tsc_khz(this_cpu_pvti());
}
-static void kvm_get_preset_lpj(void)
+static void __init kvm_get_preset_lpj(void)
{
unsigned long khz;
u64 lpj;
@@ -156,49 +159,40 @@
bool kvm_check_and_clear_guest_paused(void)
{
+ struct pvclock_vsyscall_time_info *src = this_cpu_hvclock();
bool ret = false;
- struct pvclock_vcpu_time_info *src;
- int cpu = smp_processor_id();
- if (!hv_clock)
+ if (!src)
return ret;
- src = &hv_clock[cpu].pvti;
- if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
- src->flags &= ~PVCLOCK_GUEST_STOPPED;
+ if ((src->pvti.flags & PVCLOCK_GUEST_STOPPED) != 0) {
+ src->pvti.flags &= ~PVCLOCK_GUEST_STOPPED;
pvclock_touch_watchdogs();
ret = true;
}
-
return ret;
}
struct clocksource kvm_clock = {
- .name = "kvm-clock",
- .read = kvm_clock_get_cycles,
- .rating = 400,
- .mask = CLOCKSOURCE_MASK(64),
- .flags = CLOCK_SOURCE_IS_CONTINUOUS,
+ .name = "kvm-clock",
+ .read = kvm_clock_get_cycles,
+ .rating = 400,
+ .mask = CLOCKSOURCE_MASK(64),
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
EXPORT_SYMBOL_GPL(kvm_clock);
-int kvm_register_clock(char *txt)
+static void kvm_register_clock(char *txt)
{
- int cpu = smp_processor_id();
- int low, high, ret;
- struct pvclock_vcpu_time_info *src;
+ struct pvclock_vsyscall_time_info *src = this_cpu_hvclock();
+ u64 pa;
- if (!hv_clock)
- return 0;
+ if (!src)
+ return;
- src = &hv_clock[cpu].pvti;
- low = (int)slow_virt_to_phys(src) | 1;
- high = ((u64)slow_virt_to_phys(src) >> 32);
- ret = native_write_msr_safe(msr_kvm_system_time, low, high);
- printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
- cpu, high, low, txt);
-
- return ret;
+ pa = slow_virt_to_phys(&src->pvti) | 0x01ULL;
+ wrmsrl(msr_kvm_system_time, pa);
+ pr_info("kvm-clock: cpu %d, msr %llx, %s", smp_processor_id(), pa, txt);
}
static void kvm_save_sched_clock_state(void)
@@ -213,11 +207,7 @@
#ifdef CONFIG_X86_LOCAL_APIC
static void kvm_setup_secondary_clock(void)
{
- /*
- * Now that the first cpu already had this clocksource initialized,
- * we shouldn't fail.
- */
- WARN_ON(kvm_register_clock("secondary cpu clock"));
+ kvm_register_clock("secondary cpu clock");
}
#endif
@@ -245,100 +235,84 @@
native_machine_shutdown();
}
-static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
- phys_addr_t align)
+static int __init kvm_setup_vsyscall_timeinfo(void)
{
- phys_addr_t mem;
+#ifdef CONFIG_X86_64
+ u8 flags;
- mem = memblock_alloc(size, align);
- if (!mem)
+ if (!per_cpu(hv_clock_per_cpu, 0) || !kvmclock_vsyscall)
return 0;
- if (sev_active()) {
- if (early_set_memory_decrypted((unsigned long)__va(mem), size))
- goto e_free;
- }
+ flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
+ if (!(flags & PVCLOCK_TSC_STABLE_BIT))
+ return 0;
- return mem;
-e_free:
- memblock_free(mem, size);
+ kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
+#endif
return 0;
}
+early_initcall(kvm_setup_vsyscall_timeinfo);
-static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
+static int kvmclock_setup_percpu(unsigned int cpu)
{
- if (sev_active())
- early_set_memory_encrypted((unsigned long)__va(addr), size);
+ struct pvclock_vsyscall_time_info *p = per_cpu(hv_clock_per_cpu, cpu);
- memblock_free(addr, size);
+ /*
+ * The per cpu area setup replicates CPU0 data to all cpu
+ * pointers. So carefully check. CPU0 has been set up in init
+ * already.
+ */
+ if (!cpu || (p && p != per_cpu(hv_clock_per_cpu, 0)))
+ return 0;
+
+ /* Use the static page for the first CPUs, allocate otherwise */
+ if (cpu < HVC_BOOT_ARRAY_SIZE)
+ p = &hv_clock_boot[cpu];
+ else
+ p = kzalloc(sizeof(*p), GFP_KERNEL);
+
+ per_cpu(hv_clock_per_cpu, cpu) = p;
+ return p ? 0 : -ENOMEM;
}
void __init kvmclock_init(void)
{
- struct pvclock_vcpu_time_info *vcpu_time;
- unsigned long mem, mem_wall_clock;
- int size, cpu, wall_clock_size;
u8 flags;
- size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
-
- if (!kvm_para_available())
+ if (!kvm_para_available() || !kvmclock)
return;
- if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+ if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
- } else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
- return;
-
- wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
- mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
- if (!mem_wall_clock)
- return;
-
- wall_clock = __va(mem_wall_clock);
- memset(wall_clock, 0, wall_clock_size);
-
- mem = kvm_memblock_alloc(size, PAGE_SIZE);
- if (!mem) {
- kvm_memblock_free(mem_wall_clock, wall_clock_size);
- wall_clock = NULL;
+ } else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
return;
}
- hv_clock = __va(mem);
- memset(hv_clock, 0, size);
-
- if (kvm_register_clock("primary cpu clock")) {
- hv_clock = NULL;
- kvm_memblock_free(mem, size);
- kvm_memblock_free(mem_wall_clock, wall_clock_size);
- wall_clock = NULL;
+ if (cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "kvmclock:setup_percpu",
+ kvmclock_setup_percpu, NULL) < 0) {
return;
}
- printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
+ pr_info("kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);
- pvclock_set_pvti_cpu0_va(hv_clock);
+ this_cpu_write(hv_clock_per_cpu, &hv_clock_boot[0]);
+ kvm_register_clock("primary cpu clock");
+ pvclock_set_pvti_cpu0_va(hv_clock_boot);
if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
- cpu = get_cpu();
- vcpu_time = &hv_clock[cpu].pvti;
- flags = pvclock_read_flags(vcpu_time);
-
+ flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
- put_cpu();
x86_platform.calibrate_tsc = kvm_get_tsc_khz;
x86_platform.calibrate_cpu = kvm_get_tsc_khz;
x86_platform.get_wallclock = kvm_get_wallclock;
x86_platform.set_wallclock = kvm_set_wallclock;
#ifdef CONFIG_X86_LOCAL_APIC
- x86_cpuinit.early_percpu_clock_init =
- kvm_setup_secondary_clock;
+ x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock;
#endif
x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
@@ -350,31 +324,3 @@
clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
pv_info.name = "KVM";
}
-
-int __init kvm_setup_vsyscall_timeinfo(void)
-{
-#ifdef CONFIG_X86_64
- int cpu;
- u8 flags;
- struct pvclock_vcpu_time_info *vcpu_time;
- unsigned int size;
-
- if (!hv_clock)
- return 0;
-
- size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
-
- cpu = get_cpu();
-
- vcpu_time = &hv_clock[cpu].pvti;
- flags = pvclock_read_flags(vcpu_time);
-
- put_cpu();
-
- if (!(flags & PVCLOCK_TSC_STABLE_BIT))
- return 1;
-
- kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
-#endif
- return 0;
-}
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c9b1402..733e6ac 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -100,6 +100,102 @@
return new_ldt;
}
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+static void do_sanity_check(struct mm_struct *mm,
+ bool had_kernel_mapping,
+ bool had_user_mapping)
+{
+ if (mm->context.ldt) {
+ /*
+ * We already had an LDT. The top-level entry should already
+ * have been allocated and synchronized with the usermode
+ * tables.
+ */
+ WARN_ON(!had_kernel_mapping);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ WARN_ON(!had_user_mapping);
+ } else {
+ /*
+ * This is the first time we're mapping an LDT for this process.
+ * Sync the pgd to the usermode tables.
+ */
+ WARN_ON(had_kernel_mapping);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ WARN_ON(had_user_mapping);
+ }
+}
+
+#ifdef CONFIG_X86_PAE
+
+static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
+{
+ p4d_t *p4d;
+ pud_t *pud;
+
+ if (pgd->pgd == 0)
+ return NULL;
+
+ p4d = p4d_offset(pgd, va);
+ if (p4d_none(*p4d))
+ return NULL;
+
+ pud = pud_offset(p4d, va);
+ if (pud_none(*pud))
+ return NULL;
+
+ return pmd_offset(pud, va);
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+ pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+ pmd_t *k_pmd, *u_pmd;
+
+ k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+ u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+
+ if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+ set_pmd(u_pmd, *k_pmd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+ pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+ bool had_kernel, had_user;
+ pmd_t *k_pmd, *u_pmd;
+
+ k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+ u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+ had_kernel = (k_pmd->pmd != 0);
+ had_user = (u_pmd->pmd != 0);
+
+ do_sanity_check(mm, had_kernel, had_user);
+}
+
+#else /* !CONFIG_X86_PAE */
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+ pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+
+ if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+ set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+ pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ bool had_kernel = (pgd->pgd != 0);
+ bool had_user = (kernel_to_user_pgdp(pgd)->pgd != 0);
+
+ do_sanity_check(mm, had_kernel, had_user);
+}
+
+#endif /* CONFIG_X86_PAE */
+
/*
* If PTI is enabled, this maps the LDT into the kernelmode and
* usermode tables for the given mm.
@@ -115,9 +211,8 @@
static int
map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
{
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
- bool is_vmalloc, had_top_level_entry;
unsigned long va;
+ bool is_vmalloc;
spinlock_t *ptl;
pgd_t *pgd;
int i;
@@ -131,13 +226,15 @@
*/
WARN_ON(ldt->slot != -1);
+ /* Check if the current mappings are sane */
+ sanity_check_ldt_mapping(mm);
+
/*
* Did we already have the top level entry allocated? We can't
* use pgd_none() for this because it doens't do anything on
* 4-level page table kernels.
*/
pgd = pgd_offset(mm, LDT_BASE_ADDR);
- had_top_level_entry = (pgd->pgd != 0);
is_vmalloc = is_vmalloc_addr(ldt->entries);
@@ -172,41 +269,31 @@
pte_unmap_unlock(ptep, ptl);
}
- if (mm->context.ldt) {
- /*
- * We already had an LDT. The top-level entry should already
- * have been allocated and synchronized with the usermode
- * tables.
- */
- WARN_ON(!had_top_level_entry);
- if (static_cpu_has(X86_FEATURE_PTI))
- WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
- } else {
- /*
- * This is the first time we're mapping an LDT for this process.
- * Sync the pgd to the usermode tables.
- */
- WARN_ON(had_top_level_entry);
- if (static_cpu_has(X86_FEATURE_PTI)) {
- WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
- set_pgd(kernel_to_user_pgdp(pgd), *pgd);
- }
- }
+ /* Propagate LDT mapping to the user page-table */
+ map_ldt_struct_to_user(mm);
va = (unsigned long)ldt_slot_va(slot);
flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
ldt->slot = slot;
-#endif
return 0;
}
+#else /* !CONFIG_PAGE_TABLE_ISOLATION */
+
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+ return 0;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
static void free_ldt_pgtables(struct mm_struct *mm)
{
#ifdef CONFIG_PAGE_TABLE_ISOLATION
struct mmu_gather tlb;
unsigned long start = LDT_BASE_ADDR;
- unsigned long end = start + (1UL << PGDIR_SHIFT);
+ unsigned long end = LDT_END_ADDR;
if (!static_cpu_has(X86_FEATURE_PTI))
return;
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index d1ab07e..5409c28 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -56,7 +56,7 @@
static void machine_kexec_free_page_tables(struct kimage *image)
{
- free_page((unsigned long)image->arch.pgd);
+ free_pages((unsigned long)image->arch.pgd, PGD_ALLOCATION_ORDER);
image->arch.pgd = NULL;
#ifdef CONFIG_X86_PAE
free_page((unsigned long)image->arch.pmd0);
@@ -72,7 +72,8 @@
static int machine_kexec_alloc_page_tables(struct kimage *image)
{
- image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+ image->arch.pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ PGD_ALLOCATION_ORDER);
#ifdef CONFIG_X86_PAE
image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 99dc79e..930c883 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -88,10 +88,12 @@
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);
- if (tgt_clobbers & ~site_clobbers)
- return len; /* target would clobber too much for this site */
- if (len < 5)
+ if (len < 5) {
+#ifdef CONFIG_RETPOLINE
+ WARN_ONCE("Failing to patch indirect CALL in %ps\n", (void *)addr);
+#endif
return len; /* call too long for patch site */
+ }
b->opcode = 0xe8; /* call */
b->delta = delta;
@@ -106,8 +108,12 @@
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);
- if (len < 5)
+ if (len < 5) {
+#ifdef CONFIG_RETPOLINE
+ WARN_ONCE("Failing to patch indirect JMP in %ps\n", (void *)addr);
+#endif
return len; /* call too long for patch site */
+ }
b->opcode = 0xe9; /* jmp */
b->delta = delta;
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 9edadab..9cb98f7 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -20,7 +20,7 @@
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
-DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "xor %rax, %rax");
+DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "xor %eax, %eax");
#endif
unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
diff --git a/arch/x86/kernel/pci-iommu_table.c b/arch/x86/kernel/pci-iommu_table.c
index 4dfd90a..2e9006c 100644
--- a/arch/x86/kernel/pci-iommu_table.c
+++ b/arch/x86/kernel/pci-iommu_table.c
@@ -60,7 +60,7 @@
printk(KERN_ERR "CYCLIC DEPENDENCY FOUND! %pS depends on %pS and vice-versa. BREAKING IT.\n",
p->detect, q->detect);
/* Heavy handed way..*/
- x->depend = 0;
+ x->depend = NULL;
}
}
diff --git a/arch/x86/kernel/pcspeaker.c b/arch/x86/kernel/pcspeaker.c
index da5190a..4a710ff 100644
--- a/arch/x86/kernel/pcspeaker.c
+++ b/arch/x86/kernel/pcspeaker.c
@@ -9,6 +9,6 @@
pd = platform_device_register_simple("pcspkr", -1, NULL, 0);
- return IS_ERR(pd) ? PTR_ERR(pd) : 0;
+ return PTR_ERR_OR_ZERO(pd);
}
device_initcall(add_pcspkr);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 30ca2d1..c93fcfd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,14 +57,12 @@
*/
.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
-#ifdef CONFIG_X86_64
/*
* .sp1 is cpu_current_top_of_stack. The init task never
* runs user code, but cpu_current_top_of_stack should still
* be well defined before the first context switch.
*/
.sp1 = TOP_OF_INIT_STACK,
-#endif
#ifdef CONFIG_X86_32
.ss0 = __KERNEL_DS,
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0ae659d..2924fd4 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -285,7 +285,7 @@
* current_thread_info(). Refresh the SYSENTER configuration in
* case prev or next is vm86.
*/
- update_sp0(next_p);
+ update_task_stack(next_p);
refresh_sysenter_cs(next);
this_cpu_write(cpu_current_top_of_stack,
(unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445..476e3dd 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -478,7 +478,7 @@
this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
/* Reload sp0. */
- update_sp0(next_p);
+ update_task_stack(next_p);
/*
* Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2f86d88..5d32c55 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -866,6 +866,8 @@
idt_setup_early_traps();
early_cpu_init();
+ arch_init_ideal_nops();
+ jump_label_init();
early_ioremap_init();
setup_olpc_ofw_pgd();
@@ -1012,6 +1014,7 @@
*/
init_hypervisor_platform();
+ tsc_early_init();
x86_init.resources.probe_roms();
/* after parse_early_param, so could debug it */
@@ -1197,11 +1200,6 @@
memblock_find_dma_reserve();
-#ifdef CONFIG_KVM_GUEST
- kvmclock_init();
-#endif
-
- tsc_early_delay_calibrate();
if (!early_xdbc_setup_hardware())
early_xdbc_register_console();
@@ -1272,8 +1270,6 @@
mcheck_init();
- arch_init_ideal_nops();
-
register_refined_jiffies(CLOCK_TICK_RATE);
#ifdef CONFIG_EFI
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 093f2ea..7627455 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -81,16 +81,6 @@
#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
-#define STACKTRACE_DUMP_ONCE(task) ({ \
- static bool __section(.data.unlikely) __dumped; \
- \
- if (!__dumped) { \
- __dumped = true; \
- WARN_ON(1); \
- show_stack(task, NULL); \
- } \
-})
-
static int __always_inline
__save_stack_trace_reliable(struct stack_trace *trace,
struct task_struct *task)
@@ -99,30 +89,25 @@
struct pt_regs *regs;
unsigned long addr;
- for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
+ for (unwind_start(&state, task, NULL, NULL);
+ !unwind_done(&state) && !unwind_error(&state);
unwind_next_frame(&state)) {
regs = unwind_get_entry_regs(&state, NULL);
if (regs) {
+ /* Success path for user tasks */
+ if (user_mode(regs))
+ goto success;
+
/*
* Kernel mode registers on the stack indicate an
* in-kernel interrupt or exception (e.g., preemption
* or a page fault), which can make frame pointers
* unreliable.
*/
- if (!user_mode(regs))
- return -EINVAL;
- /*
- * The last frame contains the user mode syscall
- * pt_regs. Skip it and finish the unwind.
- */
- unwind_next_frame(&state);
- if (!unwind_done(&state)) {
- STACKTRACE_DUMP_ONCE(task);
+ if (IS_ENABLED(CONFIG_FRAME_POINTER))
return -EINVAL;
- }
- break;
}
addr = unwind_get_return_address(&state);
@@ -132,21 +117,22 @@
* generated code which __kernel_text_address() doesn't know
* about.
*/
- if (!addr) {
- STACKTRACE_DUMP_ONCE(task);
+ if (!addr)
return -EINVAL;
- }
if (save_stack_address(trace, addr, false))
return -EINVAL;
}
/* Check for stack corruption */
- if (unwind_error(&state)) {
- STACKTRACE_DUMP_ONCE(task);
+ if (unwind_error(&state))
return -EINVAL;
- }
+ /* Success path for non-user tasks, i.e. kthreads and idle tasks */
+ if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
+ return -EINVAL;
+
+success:
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 74392d9..1463468 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -33,16 +33,13 @@
unsigned int __read_mostly tsc_khz;
EXPORT_SYMBOL(tsc_khz);
+#define KHZ 1000
+
/*
* TSC can be unstable due to cpufreq or due to unsynced TSCs
*/
static int __read_mostly tsc_unstable;
-/* native_sched_clock() is called before tsc_init(), so
- we must start with the TSC soft disabled to prevent
- erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
-static int __read_mostly tsc_disabled = -1;
-
static DEFINE_STATIC_KEY_FALSE(__use_tsc);
int tsc_clocksource_reliable;
@@ -106,23 +103,6 @@
* -johnstul@us.ibm.com "math is hard, lets go shopping!"
*/
-static void cyc2ns_data_init(struct cyc2ns_data *data)
-{
- data->cyc2ns_mul = 0;
- data->cyc2ns_shift = 0;
- data->cyc2ns_offset = 0;
-}
-
-static void __init cyc2ns_init(int cpu)
-{
- struct cyc2ns *c2n = &per_cpu(cyc2ns, cpu);
-
- cyc2ns_data_init(&c2n->data[0]);
- cyc2ns_data_init(&c2n->data[1]);
-
- seqcount_init(&c2n->seq);
-}
-
static inline unsigned long long cycles_2_ns(unsigned long long cyc)
{
struct cyc2ns_data data;
@@ -138,18 +118,11 @@
return ns;
}
-static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
+static void __set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
{
unsigned long long ns_now;
struct cyc2ns_data data;
struct cyc2ns *c2n;
- unsigned long flags;
-
- local_irq_save(flags);
- sched_clock_idle_sleep_event();
-
- if (!khz)
- goto done;
ns_now = cycles_2_ns(tsc_now);
@@ -181,13 +154,56 @@
c2n->data[0] = data;
raw_write_seqcount_latch(&c2n->seq);
c2n->data[1] = data;
+}
-done:
+static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ sched_clock_idle_sleep_event();
+
+ if (khz)
+ __set_cyc2ns_scale(khz, cpu, tsc_now);
+
sched_clock_idle_wakeup_event();
local_irq_restore(flags);
}
/*
+ * Initialize cyc2ns for boot cpu
+ */
+static void __init cyc2ns_init_boot_cpu(void)
+{
+ struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
+
+ seqcount_init(&c2n->seq);
+ __set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
+}
+
+/*
+ * Secondary CPUs do not run through tsc_init(), so set up
+ * all the scale factors for all CPUs, assuming the same
+ * speed as the bootup CPU. (cpufreq notifiers will fix this
+ * up if their speed diverges)
+ */
+static void __init cyc2ns_init_secondary_cpus(void)
+{
+ unsigned int cpu, this_cpu = smp_processor_id();
+ struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
+ struct cyc2ns_data *data = c2n->data;
+
+ for_each_possible_cpu(cpu) {
+ if (cpu != this_cpu) {
+ seqcount_init(&c2n->seq);
+ c2n = per_cpu_ptr(&cyc2ns, cpu);
+ c2n->data[0] = data[0];
+ c2n->data[1] = data[1];
+ }
+ }
+}
+
+/*
* Scheduler clock - returns current time in nanosec units.
*/
u64 native_sched_clock(void)
@@ -248,8 +264,7 @@
#ifdef CONFIG_X86_TSC
int __init notsc_setup(char *str)
{
- pr_warn("Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely\n");
- tsc_disabled = 1;
+ mark_tsc_unstable("boot parameter notsc");
return 1;
}
#else
@@ -665,30 +680,17 @@
return eax_base_mhz * 1000;
}
-/**
- * native_calibrate_cpu - calibrate the cpu on boot
+/*
+ * calibrate cpu using pit, hpet, and ptimer methods. They are available
+ * later in boot after acpi is initialized.
*/
-unsigned long native_calibrate_cpu(void)
+static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
{
u64 tsc1, tsc2, delta, ref1, ref2;
unsigned long tsc_pit_min = ULONG_MAX, tsc_ref_min = ULONG_MAX;
- unsigned long flags, latch, ms, fast_calibrate;
+ unsigned long flags, latch, ms;
int hpet = is_hpet_enabled(), i, loopmin;
- fast_calibrate = cpu_khz_from_cpuid();
- if (fast_calibrate)
- return fast_calibrate;
-
- fast_calibrate = cpu_khz_from_msr();
- if (fast_calibrate)
- return fast_calibrate;
-
- local_irq_save(flags);
- fast_calibrate = quick_pit_calibrate();
- local_irq_restore(flags);
- if (fast_calibrate)
- return fast_calibrate;
-
/*
* Run 5 calibration loops to get the lowest frequency value
* (the best estimate). We use two different calibration modes
@@ -831,6 +833,37 @@
return tsc_pit_min;
}
+/**
+ * native_calibrate_cpu_early - can calibrate the cpu early in boot
+ */
+unsigned long native_calibrate_cpu_early(void)
+{
+ unsigned long flags, fast_calibrate = cpu_khz_from_cpuid();
+
+ if (!fast_calibrate)
+ fast_calibrate = cpu_khz_from_msr();
+ if (!fast_calibrate) {
+ local_irq_save(flags);
+ fast_calibrate = quick_pit_calibrate();
+ local_irq_restore(flags);
+ }
+ return fast_calibrate;
+}
+
+
+/**
+ * native_calibrate_cpu - calibrate the cpu
+ */
+static unsigned long native_calibrate_cpu(void)
+{
+ unsigned long tsc_freq = native_calibrate_cpu_early();
+
+ if (!tsc_freq)
+ tsc_freq = pit_hpet_ptimer_calibrate_cpu();
+
+ return tsc_freq;
+}
+
void recalibrate_cpu_khz(void)
{
#ifndef CONFIG_SMP
@@ -1307,7 +1340,7 @@
static int __init init_tsc_clocksource(void)
{
- if (!boot_cpu_has(X86_FEATURE_TSC) || tsc_disabled > 0 || !tsc_khz)
+ if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
return 0;
if (tsc_unstable)
@@ -1341,40 +1374,22 @@
*/
device_initcall(init_tsc_clocksource);
-void __init tsc_early_delay_calibrate(void)
+static bool __init determine_cpu_tsc_frequencies(bool early)
{
- unsigned long lpj;
+ /* Make sure that cpu and tsc are not already calibrated */
+ WARN_ON(cpu_khz || tsc_khz);
- if (!boot_cpu_has(X86_FEATURE_TSC))
- return;
-
- cpu_khz = x86_platform.calibrate_cpu();
- tsc_khz = x86_platform.calibrate_tsc();
-
- tsc_khz = tsc_khz ? : cpu_khz;
- if (!tsc_khz)
- return;
-
- lpj = tsc_khz * 1000;
- do_div(lpj, HZ);
- loops_per_jiffy = lpj;
-}
-
-void __init tsc_init(void)
-{
- u64 lpj, cyc;
- int cpu;
-
- if (!boot_cpu_has(X86_FEATURE_TSC)) {
- setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
- return;
+ if (early) {
+ cpu_khz = x86_platform.calibrate_cpu();
+ tsc_khz = x86_platform.calibrate_tsc();
+ } else {
+ /* We should not be here with non-native cpu calibration */
+ WARN_ON(x86_platform.calibrate_cpu != native_calibrate_cpu);
+ cpu_khz = pit_hpet_ptimer_calibrate_cpu();
}
- cpu_khz = x86_platform.calibrate_cpu();
- tsc_khz = x86_platform.calibrate_tsc();
-
/*
- * Trust non-zero tsc_khz as authorative,
+ * Trust non-zero tsc_khz as authoritative,
* and use it to sanity check cpu_khz,
* which will be off if system timer is off.
*/
@@ -1383,52 +1398,78 @@
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;
- if (!tsc_khz) {
- mark_tsc_unstable("could not calculate TSC khz");
+ if (tsc_khz == 0)
+ return false;
+
+ pr_info("Detected %lu.%03lu MHz processor\n",
+ (unsigned long)cpu_khz / KHZ,
+ (unsigned long)cpu_khz % KHZ);
+
+ if (cpu_khz != tsc_khz) {
+ pr_info("Detected %lu.%03lu MHz TSC",
+ (unsigned long)tsc_khz / KHZ,
+ (unsigned long)tsc_khz % KHZ);
+ }
+ return true;
+}
+
+static unsigned long __init get_loops_per_jiffy(void)
+{
+ unsigned long lpj = tsc_khz * KHZ;
+
+ do_div(lpj, HZ);
+ return lpj;
+}
+
+static void __init tsc_enable_sched_clock(void)
+{
+ /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+ tsc_store_and_check_tsc_adjust(true);
+ cyc2ns_init_boot_cpu();
+ static_branch_enable(&__use_tsc);
+}
+
+void __init tsc_early_init(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_TSC))
+ return;
+ if (!determine_cpu_tsc_frequencies(true))
+ return;
+ loops_per_jiffy = get_loops_per_jiffy();
+
+ tsc_enable_sched_clock();
+}
+
+void __init tsc_init(void)
+{
+ /*
+ * native_calibrate_cpu_early can only calibrate using methods that are
+ * available early in boot.
+ */
+ if (x86_platform.calibrate_cpu == native_calibrate_cpu_early)
+ x86_platform.calibrate_cpu = native_calibrate_cpu;
+
+ if (!boot_cpu_has(X86_FEATURE_TSC)) {
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
}
- pr_info("Detected %lu.%03lu MHz processor\n",
- (unsigned long)cpu_khz / 1000,
- (unsigned long)cpu_khz % 1000);
-
- if (cpu_khz != tsc_khz) {
- pr_info("Detected %lu.%03lu MHz TSC",
- (unsigned long)tsc_khz / 1000,
- (unsigned long)tsc_khz % 1000);
+ if (!tsc_khz) {
+ /* We failed to determine frequencies earlier, try again */
+ if (!determine_cpu_tsc_frequencies(false)) {
+ mark_tsc_unstable("could not calculate TSC khz");
+ setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
+ return;
+ }
+ tsc_enable_sched_clock();
}
- /* Sanitize TSC ADJUST before cyc2ns gets initialized */
- tsc_store_and_check_tsc_adjust(true);
-
- /*
- * Secondary CPUs do not run through tsc_init(), so set up
- * all the scale factors for all CPUs, assuming the same
- * speed as the bootup CPU. (cpufreq notifiers will fix this
- * up if their speed diverges)
- */
- cyc = rdtsc();
- for_each_possible_cpu(cpu) {
- cyc2ns_init(cpu);
- set_cyc2ns_scale(tsc_khz, cpu, cyc);
- }
-
- if (tsc_disabled > 0)
- return;
-
- /* now allow native_sched_clock() to use rdtsc */
-
- tsc_disabled = 0;
- static_branch_enable(&__use_tsc);
+ cyc2ns_init_secondary_cpus();
if (!no_sched_irq_time)
enable_sched_clock_irqtime();
- lpj = ((u64)tsc_khz * 1000);
- do_div(lpj, HZ);
- lpj_fine = lpj;
-
+ lpj_fine = get_loops_per_jiffy();
use_tsc_delay();
check_system_tsc_reliable();
@@ -1455,7 +1496,7 @@
int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
const struct cpumask *mask = topology_core_cpumask(cpu);
- if (tsc_disabled || !constant_tsc || !mask)
+ if (!constant_tsc || !mask)
return 0;
sibling = cpumask_any_but(mask, cpu);
diff --git a/arch/x86/kernel/tsc_msr.c b/arch/x86/kernel/tsc_msr.c
index 19afdbd..27ef714 100644
--- a/arch/x86/kernel/tsc_msr.c
+++ b/arch/x86/kernel/tsc_msr.c
@@ -1,17 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
/*
- * tsc_msr.c - TSC frequency enumeration via MSR
+ * TSC frequency enumeration via MSR
*
- * Copyright (C) 2013 Intel Corporation
+ * Copyright (C) 2013, 2018 Intel Corporation
* Author: Bin Gao <bin.gao@intel.com>
- *
- * This file is released under the GPLv2.
*/
#include <linux/kernel.h>
-#include <asm/processor.h>
-#include <asm/setup.h>
+
#include <asm/apic.h>
+#include <asm/cpu_device_id.h>
+#include <asm/intel-family.h>
+#include <asm/msr.h>
#include <asm/param.h>
+#include <asm/tsc.h>
#define MAX_NUM_FREQS 9
@@ -23,44 +25,48 @@
* field msr_plat does.
*/
struct freq_desc {
- u8 x86_family; /* CPU family */
- u8 x86_model; /* model */
u8 msr_plat; /* 1: use MSR_PLATFORM_INFO, 0: MSR_IA32_PERF_STATUS */
u32 freqs[MAX_NUM_FREQS];
};
-static struct freq_desc freq_desc_tables[] = {
- /* PNW */
- { 6, 0x27, 0, { 0, 0, 0, 0, 0, 99840, 0, 83200 } },
- /* CLV+ */
- { 6, 0x35, 0, { 0, 133200, 0, 0, 0, 99840, 0, 83200 } },
- /* TNG - Intel Atom processor Z3400 series */
- { 6, 0x4a, 1, { 0, 100000, 133300, 0, 0, 0, 0, 0 } },
- /* VLV2 - Intel Atom processor E3000, Z3600, Z3700 series */
- { 6, 0x37, 1, { 83300, 100000, 133300, 116700, 80000, 0, 0, 0 } },
- /* ANN - Intel Atom processor Z3500 series */
- { 6, 0x5a, 1, { 83300, 100000, 133300, 100000, 0, 0, 0, 0 } },
- /* AMT - Intel Atom processor X7-Z8000 and X5-Z8000 series */
- { 6, 0x4c, 1, { 83300, 100000, 133300, 116700,
- 80000, 93300, 90000, 88900, 87500 } },
+/*
+ * Penwell and Clovertrail use spread spectrum clock,
+ * so the freq number is not exactly the same as reported
+ * by MSR based on SDM.
+ */
+static const struct freq_desc freq_desc_pnw = {
+ 0, { 0, 0, 0, 0, 0, 99840, 0, 83200 }
};
-static int match_cpu(u8 family, u8 model)
-{
- int i;
+static const struct freq_desc freq_desc_clv = {
+ 0, { 0, 133200, 0, 0, 0, 99840, 0, 83200 }
+};
- for (i = 0; i < ARRAY_SIZE(freq_desc_tables); i++) {
- if ((family == freq_desc_tables[i].x86_family) &&
- (model == freq_desc_tables[i].x86_model))
- return i;
- }
+static const struct freq_desc freq_desc_byt = {
+ 1, { 83300, 100000, 133300, 116700, 80000, 0, 0, 0 }
+};
- return -1;
-}
+static const struct freq_desc freq_desc_cht = {
+ 1, { 83300, 100000, 133300, 116700, 80000, 93300, 90000, 88900, 87500 }
+};
-/* Map CPU reference clock freq ID(0-7) to CPU reference clock freq(KHz) */
-#define id_to_freq(cpu_index, freq_id) \
- (freq_desc_tables[cpu_index].freqs[freq_id])
+static const struct freq_desc freq_desc_tng = {
+ 1, { 0, 100000, 133300, 0, 0, 0, 0, 0 }
+};
+
+static const struct freq_desc freq_desc_ann = {
+ 1, { 83300, 100000, 133300, 100000, 0, 0, 0, 0 }
+};
+
+static const struct x86_cpu_id tsc_msr_cpu_ids[] = {
+ INTEL_CPU_FAM6(ATOM_PENWELL, freq_desc_pnw),
+ INTEL_CPU_FAM6(ATOM_CLOVERVIEW, freq_desc_clv),
+ INTEL_CPU_FAM6(ATOM_SILVERMONT1, freq_desc_byt),
+ INTEL_CPU_FAM6(ATOM_AIRMONT, freq_desc_cht),
+ INTEL_CPU_FAM6(ATOM_MERRIFIELD, freq_desc_tng),
+ INTEL_CPU_FAM6(ATOM_MOOREFIELD, freq_desc_ann),
+ {}
+};
/*
* MSR-based CPU/TSC frequency discovery for certain CPUs.
@@ -70,18 +76,17 @@
*/
unsigned long cpu_khz_from_msr(void)
{
- u32 lo, hi, ratio, freq_id, freq;
+ u32 lo, hi, ratio, freq;
+ const struct freq_desc *freq_desc;
+ const struct x86_cpu_id *id;
unsigned long res;
- int cpu_index;
- if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ id = x86_match_cpu(tsc_msr_cpu_ids);
+ if (!id)
return 0;
- cpu_index = match_cpu(boot_cpu_data.x86, boot_cpu_data.x86_model);
- if (cpu_index < 0)
- return 0;
-
- if (freq_desc_tables[cpu_index].msr_plat) {
+ freq_desc = (struct freq_desc *)id->driver_data;
+ if (freq_desc->msr_plat) {
rdmsr(MSR_PLATFORM_INFO, lo, hi);
ratio = (lo >> 8) & 0xff;
} else {
@@ -91,8 +96,9 @@
/* Get FSB FREQ ID */
rdmsr(MSR_FSB_FREQ, lo, hi);
- freq_id = lo & 0x7;
- freq = id_to_freq(cpu_index, freq_id);
+
+ /* Map CPU reference clock freq ID(0-7) to CPU reference clock freq(KHz) */
+ freq = freq_desc->freqs[lo & 0x7];
/* TSC frequency = maximum resolved freq * maximum resolved bus ratio */
res = freq * ratio;
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index feb28fe..26038ea 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -198,7 +198,7 @@
* whitelisted .o files which didn't get objtool generation.
*/
orc_a = cur_orc_table + (a - cur_orc_ip_table);
- return orc_a->sp_reg == ORC_REG_UNDEFINED ? -1 : 1;
+ return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
}
#ifdef CONFIG_MODULES
@@ -352,7 +352,7 @@
bool unwind_next_frame(struct unwind_state *state)
{
- unsigned long ip_p, sp, orig_ip, prev_sp = state->sp;
+ unsigned long ip_p, sp, orig_ip = state->ip, prev_sp = state->sp;
enum stack_type prev_type = state->stack_info.type;
struct orc_entry *orc;
bool indirect = false;
@@ -363,9 +363,9 @@
/* Don't let modules unload while we're reading their ORC data. */
preempt_disable();
- /* Have we reached the end? */
+ /* End-of-stack check for user tasks: */
if (state->regs && user_mode(state->regs))
- goto done;
+ goto the_end;
/*
* Find the orc_entry associated with the text address.
@@ -374,9 +374,16 @@
* calls and calls to noreturn functions.
*/
orc = orc_find(state->signal ? state->ip : state->ip - 1);
- if (!orc || orc->sp_reg == ORC_REG_UNDEFINED)
- goto done;
- orig_ip = state->ip;
+ if (!orc)
+ goto err;
+
+ /* End-of-stack check for kernel threads: */
+ if (orc->sp_reg == ORC_REG_UNDEFINED) {
+ if (!orc->end)
+ goto err;
+
+ goto the_end;
+ }
/* Find the previous frame's stack: */
switch (orc->sp_reg) {
@@ -402,7 +409,7 @@
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg R10 at ip %pB\n",
(void *)state->ip);
- goto done;
+ goto err;
}
sp = state->regs->r10;
break;
@@ -411,7 +418,7 @@
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg R13 at ip %pB\n",
(void *)state->ip);
- goto done;
+ goto err;
}
sp = state->regs->r13;
break;
@@ -420,7 +427,7 @@
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg DI at ip %pB\n",
(void *)state->ip);
- goto done;
+ goto err;
}
sp = state->regs->di;
break;
@@ -429,7 +436,7 @@
if (!state->regs || !state->full_regs) {
orc_warn("missing regs for base reg DX at ip %pB\n",
(void *)state->ip);
- goto done;
+ goto err;
}
sp = state->regs->dx;
break;
@@ -437,12 +444,12 @@
default:
orc_warn("unknown SP base reg %d for ip %pB\n",
orc->sp_reg, (void *)state->ip);
- goto done;
+ goto err;
}
if (indirect) {
if (!deref_stack_reg(state, sp, &sp))
- goto done;
+ goto err;
}
/* Find IP, SP and possibly regs: */
@@ -451,7 +458,7 @@
ip_p = sp - sizeof(long);
if (!deref_stack_reg(state, ip_p, &state->ip))
- goto done;
+ goto err;
state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
state->ip, (void *)ip_p);
@@ -465,7 +472,7 @@
if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) {
orc_warn("can't dereference registers at %p for ip %pB\n",
(void *)sp, (void *)orig_ip);
- goto done;
+ goto err;
}
state->regs = (struct pt_regs *)sp;
@@ -477,7 +484,7 @@
if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
orc_warn("can't dereference iret registers at %p for ip %pB\n",
(void *)sp, (void *)orig_ip);
- goto done;
+ goto err;
}
state->regs = (void *)sp - IRET_FRAME_OFFSET;
@@ -500,18 +507,18 @@
case ORC_REG_PREV_SP:
if (!deref_stack_reg(state, sp + orc->bp_offset, &state->bp))
- goto done;
+ goto err;
break;
case ORC_REG_BP:
if (!deref_stack_reg(state, state->bp + orc->bp_offset, &state->bp))
- goto done;
+ goto err;
break;
default:
orc_warn("unknown BP base reg %d for ip %pB\n",
orc->bp_reg, (void *)orig_ip);
- goto done;
+ goto err;
}
/* Prevent a recursive loop due to bad ORC data: */
@@ -520,13 +527,16 @@
state->sp <= prev_sp) {
orc_warn("stack going in the wrong direction? ip=%pB\n",
(void *)orig_ip);
- goto done;
+ goto err;
}
preempt_enable();
return true;
-done:
+err:
+ state->error = true;
+
+the_end:
preempt_enable();
state->stack_info.type = STACK_TYPE_UNKNOWN;
return false;
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 9d0b5af..1c03e4a 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -149,7 +149,7 @@
preempt_disable();
tsk->thread.sp0 = vm86->saved_sp0;
tsk->thread.sysenter_cs = __KERNEL_CS;
- update_sp0(tsk);
+ update_task_stack(tsk);
refresh_sysenter_cs(&tsk->thread);
vm86->saved_sp0 = 0;
preempt_enable();
@@ -374,7 +374,7 @@
refresh_sysenter_cs(&tsk->thread);
}
- update_sp0(tsk);
+ update_task_stack(tsk);
preempt_enable();
if (vm86->flags & VM86_SCREEN_BITMAP)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 5e1458f..8bde0a4 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -55,19 +55,22 @@
* so we can enable protection checks as well as retain 2MB large page
* mappings for kernel text.
*/
-#define X64_ALIGN_RODATA_BEGIN . = ALIGN(HPAGE_SIZE);
+#define X86_ALIGN_RODATA_BEGIN . = ALIGN(HPAGE_SIZE);
-#define X64_ALIGN_RODATA_END \
+#define X86_ALIGN_RODATA_END \
. = ALIGN(HPAGE_SIZE); \
- __end_rodata_hpage_align = .;
+ __end_rodata_hpage_align = .; \
+ __end_rodata_aligned = .;
#define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
#define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
#else
-#define X64_ALIGN_RODATA_BEGIN
-#define X64_ALIGN_RODATA_END
+#define X86_ALIGN_RODATA_BEGIN
+#define X86_ALIGN_RODATA_END \
+ . = ALIGN(PAGE_SIZE); \
+ __end_rodata_aligned = .;
#define ALIGN_ENTRY_TEXT_BEGIN
#define ALIGN_ENTRY_TEXT_END
@@ -141,9 +144,9 @@
/* .text should occupy whole number of pages */
. = ALIGN(PAGE_SIZE);
- X64_ALIGN_RODATA_BEGIN
+ X86_ALIGN_RODATA_BEGIN
RO_DATA(PAGE_SIZE)
- X64_ALIGN_RODATA_END
+ X86_ALIGN_RODATA_END
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) {
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 3ab8676..2792b55 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -109,7 +109,7 @@
static void default_nmi_init(void) { };
struct x86_platform_ops x86_platform __ro_after_init = {
- .calibrate_cpu = native_calibrate_cpu,
+ .calibrate_cpu = native_calibrate_cpu_early,
.calibrate_tsc = native_calibrate_tsc,
.get_wallclock = mach_get_cmos_time,
.set_wallclock = mach_set_rtc_mmss,
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b5cd846..d536d45 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1379,7 +1379,7 @@
* using swait_active() is safe.
*/
if (swait_active(q))
- swake_up(q);
+ swake_up_one(q);
if (apic_lvtt_tscdeadline(apic))
ktimer->expired_tscdeadline = ktimer->tscdeadline;
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 298ef14..3b24dc0 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -256,7 +256,7 @@
/* Copy successful. Return zero */
.L_done_memcpy_trap:
- xorq %rax, %rax
+ xorl %eax, %eax
ret
ENDPROC(__memcpy_mcsafe)
EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 2f3c919..a12afff 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -111,6 +111,8 @@
[END_OF_SPACE_NR] = { -1, NULL }
};
+#define INIT_PGD ((pgd_t *) &init_top_pgt)
+
#else /* CONFIG_X86_64 */
enum address_markers_idx {
@@ -121,6 +123,9 @@
#ifdef CONFIG_HIGHMEM
PKMAP_BASE_NR,
#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ LDT_NR,
+#endif
CPU_ENTRY_AREA_NR,
FIXADDR_START_NR,
END_OF_SPACE_NR,
@@ -134,11 +139,16 @@
#ifdef CONFIG_HIGHMEM
[PKMAP_BASE_NR] = { 0UL, "Persistent kmap() Area" },
#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ [LDT_NR] = { 0UL, "LDT remap" },
+#endif
[CPU_ENTRY_AREA_NR] = { 0UL, "CPU entry area" },
[FIXADDR_START_NR] = { 0UL, "Fixmap area" },
[END_OF_SPACE_NR] = { -1, NULL }
};
+#define INIT_PGD (swapper_pg_dir)
+
#endif /* !CONFIG_X86_64 */
/* Multipliers for offsets within the PTEs */
@@ -496,11 +506,7 @@
static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
bool checkwx, bool dmesg)
{
-#ifdef CONFIG_X86_64
- pgd_t *start = (pgd_t *) &init_top_pgt;
-#else
- pgd_t *start = swapper_pg_dir;
-#endif
+ pgd_t *start = INIT_PGD;
pgprotval_t prot, eff;
int i;
struct pg_state st = {};
@@ -563,12 +569,13 @@
}
EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
-static void ptdump_walk_user_pgd_level_checkwx(void)
+void ptdump_walk_user_pgd_level_checkwx(void)
{
#ifdef CONFIG_PAGE_TABLE_ISOLATION
- pgd_t *pgd = (pgd_t *) &init_top_pgt;
+ pgd_t *pgd = INIT_PGD;
- if (!static_cpu_has(X86_FEATURE_PTI))
+ if (!(__supported_pte_mask & _PAGE_NX) ||
+ !static_cpu_has(X86_FEATURE_PTI))
return;
pr_info("x86/mm: Checking user space page tables\n");
@@ -580,7 +587,6 @@
void ptdump_walk_pgd_level_checkwx(void)
{
ptdump_walk_pgd_level_core(NULL, NULL, true, false);
- ptdump_walk_user_pgd_level_checkwx();
}
static int __init pt_dump_init(void)
@@ -609,6 +615,9 @@
# endif
address_markers[FIXADDR_START_NR].start_address = FIXADDR_START;
address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE;
+# ifdef CONFIG_MODIFY_LDT_SYSCALL
+ address_markers[LDT_NR].start_address = LDT_BASE_ADDR;
+# endif
#endif
return 0;
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6a..db1c042 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -317,8 +317,6 @@
if (!(address >= VMALLOC_START && address < VMALLOC_END))
return -1;
- WARN_ON_ONCE(in_nmi());
-
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index cee58a9..74b157a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -773,13 +773,44 @@
}
}
+/*
+ * begin/end can be in the direct map or the "high kernel mapping"
+ * used for the kernel image only. free_init_pages() will do the
+ * right thing for either kind of address.
+ */
+void free_kernel_image_pages(void *begin, void *end)
+{
+ unsigned long begin_ul = (unsigned long)begin;
+ unsigned long end_ul = (unsigned long)end;
+ unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
+
+
+ free_init_pages("unused kernel image", begin_ul, end_ul);
+
+ /*
+ * PTI maps some of the kernel into userspace. For performance,
+ * this includes some kernel areas that do not contain secrets.
+ * Those areas might be adjacent to the parts of the kernel image
+ * being freed, which may contain secrets. Remove the "high kernel
+ * image mapping" for these freed areas, ensuring they are not even
+ * potentially vulnerable to Meltdown regardless of the specific
+ * optimizations PTI is currently using.
+ *
+ * The "noalias" prevents unmapping the direct map alias which is
+ * needed to access the freed pages.
+ *
+ * This is only valid for 64bit kernels. 32bit has only one mapping
+ * which can't be treated in this way for obvious reasons.
+ */
+ if (IS_ENABLED(CONFIG_X86_64) && cpu_feature_enabled(X86_FEATURE_PTI))
+ set_memory_np_noalias(begin_ul, len_pages);
+}
+
void __ref free_initmem(void)
{
e820__reallocate_tables();
- free_init_pages("unused kernel",
- (unsigned long)(&__init_begin),
- (unsigned long)(&__init_end));
+ free_kernel_image_pages(&__init_begin, &__init_end);
}
#ifdef CONFIG_BLK_DEV_INITRD
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617..dd519f3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1283,20 +1283,10 @@
set_memory_ro(start, (end-start) >> PAGE_SHIFT);
#endif
- free_init_pages("unused kernel",
- (unsigned long) __va(__pa_symbol(text_end)),
- (unsigned long) __va(__pa_symbol(rodata_start)));
- free_init_pages("unused kernel",
- (unsigned long) __va(__pa_symbol(rodata_end)),
- (unsigned long) __va(__pa_symbol(_sdata)));
+ free_kernel_image_pages((void *)text_end, (void *)rodata_start);
+ free_kernel_image_pages((void *)rodata_end, (void *)_sdata);
debug_checkwx();
-
- /*
- * Do this after all of the manipulation of the
- * kernel text page tables are complete.
- */
- pti_clone_kernel_text();
}
int kern_addr_valid(unsigned long addr)
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index 34a2a3b..b54d52a 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -61,7 +61,7 @@
eb->nid = nid;
if (emu_nid_to_phys[nid] == NUMA_NO_NODE)
- emu_nid_to_phys[nid] = nid;
+ emu_nid_to_phys[nid] = pb->nid;
pb->start += size;
if (pb->start >= pb->end) {
@@ -198,40 +198,73 @@
return end;
}
+static u64 uniform_size(u64 max_addr, u64 base, u64 hole, int nr_nodes)
+{
+ unsigned long max_pfn = PHYS_PFN(max_addr);
+ unsigned long base_pfn = PHYS_PFN(base);
+ unsigned long hole_pfns = PHYS_PFN(hole);
+
+ return PFN_PHYS((max_pfn - base_pfn - hole_pfns) / nr_nodes);
+}
+
/*
* Sets up fake nodes of `size' interleaved over physical nodes ranging from
* `addr' to `max_addr'.
*
* Returns zero on success or negative on error.
*/
-static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
+static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
struct numa_meminfo *pi,
- u64 addr, u64 max_addr, u64 size)
+ u64 addr, u64 max_addr, u64 size,
+ int nr_nodes, struct numa_memblk *pblk,
+ int nid)
{
nodemask_t physnode_mask = numa_nodes_parsed;
+ int i, ret, uniform = 0;
u64 min_size;
- int nid = 0;
- int i, ret;
- if (!size)
+ if ((!size && !nr_nodes) || (nr_nodes && !pblk))
return -1;
+
/*
- * The limit on emulated nodes is MAX_NUMNODES, so the size per node is
- * increased accordingly if the requested size is too small. This
- * creates a uniform distribution of node sizes across the entire
- * machine (but not necessarily over physical nodes).
+ * In the 'uniform' case split the passed in physical node by
+ * nr_nodes, in the non-uniform case, ignore the passed in
+ * physical block and try to create nodes of at least size
+ * @size.
+ *
+ * In the uniform case, split the nodes strictly by physical
+ * capacity, i.e. ignore holes. In the non-uniform case account
+ * for holes and treat @size as a minimum floor.
*/
- min_size = (max_addr - addr - mem_hole_size(addr, max_addr)) / MAX_NUMNODES;
- min_size = max(min_size, FAKE_NODE_MIN_SIZE);
- if ((min_size & FAKE_NODE_MIN_HASH_MASK) < min_size)
- min_size = (min_size + FAKE_NODE_MIN_SIZE) &
- FAKE_NODE_MIN_HASH_MASK;
+ if (!nr_nodes)
+ nr_nodes = MAX_NUMNODES;
+ else {
+ nodes_clear(physnode_mask);
+ node_set(pblk->nid, physnode_mask);
+ uniform = 1;
+ }
+
+ if (uniform) {
+ min_size = uniform_size(max_addr, addr, 0, nr_nodes);
+ size = min_size;
+ } else {
+ /*
+ * The limit on emulated nodes is MAX_NUMNODES, so the
+ * size per node is increased accordingly if the
+ * requested size is too small. This creates a uniform
+ * distribution of node sizes across the entire machine
+ * (but not necessarily over physical nodes).
+ */
+ min_size = uniform_size(max_addr, addr,
+ mem_hole_size(addr, max_addr), nr_nodes);
+ }
+ min_size = ALIGN(max(min_size, FAKE_NODE_MIN_SIZE), FAKE_NODE_MIN_SIZE);
if (size < min_size) {
pr_err("Fake node size %LuMB too small, increasing to %LuMB\n",
size >> 20, min_size >> 20);
size = min_size;
}
- size &= FAKE_NODE_MIN_HASH_MASK;
+ size = ALIGN_DOWN(size, FAKE_NODE_MIN_SIZE);
/*
* Fill physical nodes with fake nodes of size until there is no memory
@@ -248,10 +281,14 @@
node_clear(i, physnode_mask);
continue;
}
+
start = pi->blk[phys_blk].start;
limit = pi->blk[phys_blk].end;
- end = find_end_of_node(start, limit, size);
+ if (uniform)
+ end = start + size;
+ else
+ end = find_end_of_node(start, limit, size);
/*
* If there won't be at least FAKE_NODE_MIN_SIZE of
* non-reserved memory in ZONE_DMA32 for the next node,
@@ -266,7 +303,8 @@
* next node, this one must extend to the end of the
* physical node.
*/
- if (limit - end - mem_hole_size(end, limit) < size)
+ if ((limit - end - mem_hole_size(end, limit) < size)
+ && !uniform)
end = limit;
ret = emu_setup_memblk(ei, pi, nid++ % MAX_NUMNODES,
@@ -276,7 +314,15 @@
return ret;
}
}
- return 0;
+ return nid;
+}
+
+static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
+ struct numa_meminfo *pi,
+ u64 addr, u64 max_addr, u64 size)
+{
+ return split_nodes_size_interleave_uniform(ei, pi, addr, max_addr, size,
+ 0, NULL, NUMA_NO_NODE);
}
int __init setup_emu2phys_nid(int *dfl_phys_nid)
@@ -346,7 +392,28 @@
* the fixed node size. Otherwise, if it is just a single number N,
* split the system RAM into N fake nodes.
*/
- if (strchr(emu_cmdline, 'M') || strchr(emu_cmdline, 'G')) {
+ if (strchr(emu_cmdline, 'U')) {
+ nodemask_t physnode_mask = numa_nodes_parsed;
+ unsigned long n;
+ int nid = 0;
+
+ n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
+ ret = -1;
+ for_each_node_mask(i, physnode_mask) {
+ ret = split_nodes_size_interleave_uniform(&ei, &pi,
+ pi.blk[i].start, pi.blk[i].end, 0,
+ n, &pi.blk[i], nid);
+ if (ret < 0)
+ break;
+ if (ret < n) {
+ pr_info("%s: phys: %d only got %d of %ld nodes, failing\n",
+ __func__, i, ret, n);
+ ret = -1;
+ break;
+ }
+ nid = ret;
+ }
+ } else if (strchr(emu_cmdline, 'M') || strchr(emu_cmdline, 'G')) {
u64 size;
size = memparse(emu_cmdline, &emu_cmdline);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e..0a74996 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -53,6 +53,7 @@
#define CPA_FLUSHTLB 1
#define CPA_ARRAY 2
#define CPA_PAGES_ARRAY 4
+#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
#ifdef CONFIG_PROC_FS
static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -1486,6 +1487,9 @@
/* No alias checking for _NX bit modifications */
checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
+ /* Has caller explicitly disabled alias checking? */
+ if (in_flag & CPA_NO_CHECK_ALIAS)
+ checkalias = 0;
ret = __change_page_attr_set_clr(&cpa, checkalias);
@@ -1772,6 +1776,15 @@
return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
}
+int set_memory_np_noalias(unsigned long addr, int numpages)
+{
+ int cpa_flags = CPA_NO_CHECK_ALIAS;
+
+ return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+ __pgprot(_PAGE_PRESENT), 0,
+ cpa_flags, NULL);
+}
+
int set_memory_4k(unsigned long addr, int numpages)
{
return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
@@ -1784,6 +1797,12 @@
__pgprot(_PAGE_GLOBAL), 0);
}
+int set_memory_global(unsigned long addr, int numpages)
+{
+ return change_page_attr_set(&addr, numpages,
+ __pgprot(_PAGE_GLOBAL), 0);
+}
+
static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
struct cpa_data cpa;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 47b5951..3ef095c 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -182,6 +182,14 @@
*/
#define PREALLOCATED_PMDS UNSHARED_PTRS_PER_PGD
+/*
+ * We allocate separate PMDs for the kernel part of the user page-table
+ * when PTI is enabled. We need them to map the per-process LDT into the
+ * user-space page-table.
+ */
+#define PREALLOCATED_USER_PMDS (static_cpu_has(X86_FEATURE_PTI) ? \
+ KERNEL_PGD_PTRS : 0)
+
void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
{
paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
@@ -202,14 +210,14 @@
/* No need to prepopulate any pagetable entries in non-PAE modes. */
#define PREALLOCATED_PMDS 0
-
+#define PREALLOCATED_USER_PMDS 0
#endif /* CONFIG_X86_PAE */
-static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
{
int i;
- for(i = 0; i < PREALLOCATED_PMDS; i++)
+ for (i = 0; i < count; i++)
if (pmds[i]) {
pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
free_page((unsigned long)pmds[i]);
@@ -217,7 +225,7 @@
}
}
-static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
{
int i;
bool failed = false;
@@ -226,7 +234,7 @@
if (mm == &init_mm)
gfp &= ~__GFP_ACCOUNT;
- for(i = 0; i < PREALLOCATED_PMDS; i++) {
+ for (i = 0; i < count; i++) {
pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
if (!pmd)
failed = true;
@@ -241,7 +249,7 @@
}
if (failed) {
- free_pmds(mm, pmds);
+ free_pmds(mm, pmds, count);
return -ENOMEM;
}
@@ -254,23 +262,38 @@
* preallocate which never got a corresponding vma will need to be
* freed manually.
*/
+static void mop_up_one_pmd(struct mm_struct *mm, pgd_t *pgdp)
+{
+ pgd_t pgd = *pgdp;
+
+ if (pgd_val(pgd) != 0) {
+ pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+
+ *pgdp = native_make_pgd(0);
+
+ paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
+ pmd_free(mm, pmd);
+ mm_dec_nr_pmds(mm);
+ }
+}
+
static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
{
int i;
- for(i = 0; i < PREALLOCATED_PMDS; i++) {
- pgd_t pgd = pgdp[i];
+ for (i = 0; i < PREALLOCATED_PMDS; i++)
+ mop_up_one_pmd(mm, &pgdp[i]);
- if (pgd_val(pgd) != 0) {
- pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
- pgdp[i] = native_make_pgd(0);
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
- paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
- pmd_free(mm, pmd);
- mm_dec_nr_pmds(mm);
- }
- }
+ pgdp = kernel_to_user_pgdp(pgdp);
+
+ for (i = 0; i < PREALLOCATED_USER_PMDS; i++)
+ mop_up_one_pmd(mm, &pgdp[i + KERNEL_PGD_BOUNDARY]);
+#endif
}
static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
@@ -296,6 +319,38 @@
}
}
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+ pgd_t *k_pgd, pmd_t *pmds[])
+{
+ pgd_t *s_pgd = kernel_to_user_pgdp(swapper_pg_dir);
+ pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+ p4d_t *u_p4d;
+ pud_t *u_pud;
+ int i;
+
+ u_p4d = p4d_offset(u_pgd, 0);
+ u_pud = pud_offset(u_p4d, 0);
+
+ s_pgd += KERNEL_PGD_BOUNDARY;
+ u_pud += KERNEL_PGD_BOUNDARY;
+
+ for (i = 0; i < PREALLOCATED_USER_PMDS; i++, u_pud++, s_pgd++) {
+ pmd_t *pmd = pmds[i];
+
+ memcpy(pmd, (pmd_t *)pgd_page_vaddr(*s_pgd),
+ sizeof(pmd_t) * PTRS_PER_PMD);
+
+ pud_populate(mm, u_pud, pmd);
+ }
+
+}
+#else
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+ pgd_t *k_pgd, pmd_t *pmds[])
+{
+}
+#endif
/*
* Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
* assumes that pgd should be in one page.
@@ -329,9 +384,6 @@
*/
pgd_cache = kmem_cache_create("pgd_cache", PGD_SIZE, PGD_ALIGN,
SLAB_PANIC, NULL);
- if (!pgd_cache)
- return -ENOMEM;
-
return 0;
}
core_initcall(pgd_cache_init);
@@ -343,7 +395,8 @@
* We allocate one page for pgd.
*/
if (!SHARED_KERNEL_PMD)
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP,
+ PGD_ALLOCATION_ORDER);
/*
* Now PAE kernel is not running as a Xen domain. We can allocate
@@ -355,7 +408,7 @@
static inline void _pgd_free(pgd_t *pgd)
{
if (!SHARED_KERNEL_PMD)
- free_page((unsigned long)pgd);
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
else
kmem_cache_free(pgd_cache, pgd);
}
@@ -375,6 +428,7 @@
pgd_t *pgd_alloc(struct mm_struct *mm)
{
pgd_t *pgd;
+ pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
pmd_t *pmds[PREALLOCATED_PMDS];
pgd = _pgd_alloc();
@@ -384,12 +438,15 @@
mm->pgd = pgd;
- if (preallocate_pmds(mm, pmds) != 0)
+ if (preallocate_pmds(mm, pmds, PREALLOCATED_PMDS) != 0)
goto out_free_pgd;
- if (paravirt_pgd_alloc(mm) != 0)
+ if (preallocate_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS) != 0)
goto out_free_pmds;
+ if (paravirt_pgd_alloc(mm) != 0)
+ goto out_free_user_pmds;
+
/*
* Make sure that pre-populating the pmds is atomic with
* respect to anything walking the pgd_list, so that they
@@ -399,13 +456,16 @@
pgd_ctor(mm, pgd);
pgd_prepopulate_pmd(mm, pgd, pmds);
+ pgd_prepopulate_user_pmd(mm, pgd, u_pmds);
spin_unlock(&pgd_lock);
return pgd;
+out_free_user_pmds:
+ free_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS);
out_free_pmds:
- free_pmds(mm, pmds);
+ free_pmds(mm, pmds, PREALLOCATED_PMDS);
out_free_pgd:
_pgd_free(pgd);
out:
@@ -719,28 +779,50 @@
return 0;
}
+#ifdef CONFIG_X86_64
/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i]))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -749,11 +831,12 @@
/**
* pmd_free_pte_page - Clear pmd entry and free pte page.
* @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
pte_t *pte;
@@ -762,8 +845,30 @@
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
}
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+ return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+ return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 4d418e7..d58b4ab 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -54,6 +54,16 @@
#define __GFP_NOTRACK 0
#endif
+/*
+ * Define the page-table levels we clone for user-space on 32
+ * and 64 bit.
+ */
+#ifdef CONFIG_X86_64
+#define PTI_LEVEL_KERNEL_IMAGE PTI_CLONE_PMD
+#else
+#define PTI_LEVEL_KERNEL_IMAGE PTI_CLONE_PTE
+#endif
+
static void __init pti_print_if_insecure(const char *reason)
{
if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
@@ -117,7 +127,7 @@
setup_force_cpu_cap(X86_FEATURE_PTI);
}
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
{
/*
* Changes to the high (kernel) portion of the kernelmode page
@@ -176,7 +186,7 @@
if (pgd_none(*pgd)) {
unsigned long new_p4d_page = __get_free_page(gfp);
- if (!new_p4d_page)
+ if (WARN_ON_ONCE(!new_p4d_page))
return NULL;
set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
@@ -195,13 +205,17 @@
static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
{
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
+ p4d_t *p4d;
pud_t *pud;
+ p4d = pti_user_pagetable_walk_p4d(address);
+ if (!p4d)
+ return NULL;
+
BUILD_BUG_ON(p4d_large(*p4d) != 0);
if (p4d_none(*p4d)) {
unsigned long new_pud_page = __get_free_page(gfp);
- if (!new_pud_page)
+ if (WARN_ON_ONCE(!new_pud_page))
return NULL;
set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
@@ -215,7 +229,7 @@
}
if (pud_none(*pud)) {
unsigned long new_pmd_page = __get_free_page(gfp);
- if (!new_pmd_page)
+ if (WARN_ON_ONCE(!new_pmd_page))
return NULL;
set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
@@ -224,7 +238,6 @@
return pmd_offset(pud, address);
}
-#ifdef CONFIG_X86_VSYSCALL_EMULATION
/*
* Walk the shadow copy of the page tables (optionally) trying to allocate
* page table pages on the way down. Does not support large pages.
@@ -237,9 +250,13 @@
static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
{
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- pmd_t *pmd = pti_user_pagetable_walk_pmd(address);
+ pmd_t *pmd;
pte_t *pte;
+ pmd = pti_user_pagetable_walk_pmd(address);
+ if (!pmd)
+ return NULL;
+
/* We can't do anything sensible if we hit a large mapping. */
if (pmd_large(*pmd)) {
WARN_ON(1);
@@ -262,6 +279,7 @@
return pte;
}
+#ifdef CONFIG_X86_VSYSCALL_EMULATION
static void __init pti_setup_vsyscall(void)
{
pte_t *pte, *target_pte;
@@ -282,8 +300,14 @@
static void __init pti_setup_vsyscall(void) { }
#endif
+enum pti_clone_level {
+ PTI_CLONE_PMD,
+ PTI_CLONE_PTE,
+};
+
static void
-pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
+pti_clone_pgtable(unsigned long start, unsigned long end,
+ enum pti_clone_level level)
{
unsigned long addr;
@@ -291,59 +315,105 @@
* Clone the populated PMDs which cover start to end. These PMD areas
* can have holes.
*/
- for (addr = start; addr < end; addr += PMD_SIZE) {
+ for (addr = start; addr < end;) {
+ pte_t *pte, *target_pte;
pmd_t *pmd, *target_pmd;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
+ /* Overflow check */
+ if (addr < start)
+ break;
+
pgd = pgd_offset_k(addr);
if (WARN_ON(pgd_none(*pgd)))
return;
p4d = p4d_offset(pgd, addr);
if (WARN_ON(p4d_none(*p4d)))
return;
+
pud = pud_offset(p4d, addr);
- if (pud_none(*pud))
+ if (pud_none(*pud)) {
+ addr += PUD_SIZE;
continue;
+ }
+
pmd = pmd_offset(pud, addr);
- if (pmd_none(*pmd))
+ if (pmd_none(*pmd)) {
+ addr += PMD_SIZE;
continue;
+ }
- target_pmd = pti_user_pagetable_walk_pmd(addr);
- if (WARN_ON(!target_pmd))
- return;
+ if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
+ target_pmd = pti_user_pagetable_walk_pmd(addr);
+ if (WARN_ON(!target_pmd))
+ return;
- /*
- * Only clone present PMDs. This ensures only setting
- * _PAGE_GLOBAL on present PMDs. This should only be
- * called on well-known addresses anyway, so a non-
- * present PMD would be a surprise.
- */
- if (WARN_ON(!(pmd_flags(*pmd) & _PAGE_PRESENT)))
- return;
+ /*
+ * Only clone present PMDs. This ensures only setting
+ * _PAGE_GLOBAL on present PMDs. This should only be
+ * called on well-known addresses anyway, so a non-
+ * present PMD would be a surprise.
+ */
+ if (WARN_ON(!(pmd_flags(*pmd) & _PAGE_PRESENT)))
+ return;
- /*
- * Setting 'target_pmd' below creates a mapping in both
- * the user and kernel page tables. It is effectively
- * global, so set it as global in both copies. Note:
- * the X86_FEATURE_PGE check is not _required_ because
- * the CPU ignores _PAGE_GLOBAL when PGE is not
- * supported. The check keeps consistentency with
- * code that only set this bit when supported.
- */
- if (boot_cpu_has(X86_FEATURE_PGE))
- *pmd = pmd_set_flags(*pmd, _PAGE_GLOBAL);
+ /*
+ * Setting 'target_pmd' below creates a mapping in both
+ * the user and kernel page tables. It is effectively
+ * global, so set it as global in both copies. Note:
+ * the X86_FEATURE_PGE check is not _required_ because
+ * the CPU ignores _PAGE_GLOBAL when PGE is not
+ * supported. The check keeps consistentency with
+ * code that only set this bit when supported.
+ */
+ if (boot_cpu_has(X86_FEATURE_PGE))
+ *pmd = pmd_set_flags(*pmd, _PAGE_GLOBAL);
- /*
- * Copy the PMD. That is, the kernelmode and usermode
- * tables will share the last-level page tables of this
- * address range
- */
- *target_pmd = pmd_clear_flags(*pmd, clear);
+ /*
+ * Copy the PMD. That is, the kernelmode and usermode
+ * tables will share the last-level page tables of this
+ * address range
+ */
+ *target_pmd = *pmd;
+
+ addr += PMD_SIZE;
+
+ } else if (level == PTI_CLONE_PTE) {
+
+ /* Walk the page-table down to the pte level */
+ pte = pte_offset_kernel(pmd, addr);
+ if (pte_none(*pte)) {
+ addr += PAGE_SIZE;
+ continue;
+ }
+
+ /* Only clone present PTEs */
+ if (WARN_ON(!(pte_flags(*pte) & _PAGE_PRESENT)))
+ return;
+
+ /* Allocate PTE in the user page-table */
+ target_pte = pti_user_pagetable_walk_pte(addr);
+ if (WARN_ON(!target_pte))
+ return;
+
+ /* Set GLOBAL bit in both PTEs */
+ if (boot_cpu_has(X86_FEATURE_PGE))
+ *pte = pte_set_flags(*pte, _PAGE_GLOBAL);
+
+ /* Clone the PTE */
+ *target_pte = *pte;
+
+ addr += PAGE_SIZE;
+
+ } else {
+ BUG();
+ }
}
}
+#ifdef CONFIG_X86_64
/*
* Clone a single p4d (i.e. a top-level entry on 4-level systems and a
* next-level entry on 5-level systems.
@@ -354,6 +424,9 @@
pgd_t *kernel_pgd;
user_p4d = pti_user_pagetable_walk_p4d(addr);
+ if (!user_p4d)
+ return;
+
kernel_pgd = pgd_offset_k(addr);
kernel_p4d = p4d_offset(kernel_pgd, addr);
*user_p4d = *kernel_p4d;
@@ -367,6 +440,25 @@
pti_clone_p4d(CPU_ENTRY_AREA_BASE);
}
+#else /* CONFIG_X86_64 */
+
+/*
+ * On 32 bit PAE systems with 1GB of Kernel address space there is only
+ * one pgd/p4d for the whole kernel. Cloning that would map the whole
+ * address space into the user page-tables, making PTI useless. So clone
+ * the page-table on the PMD level to prevent that.
+ */
+static void __init pti_clone_user_shared(void)
+{
+ unsigned long start, end;
+
+ start = CPU_ENTRY_AREA_BASE;
+ end = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
+
+ pti_clone_pgtable(start, end, PTI_CLONE_PMD);
+}
+#endif /* CONFIG_X86_64 */
+
/*
* Clone the ESPFIX P4D into the user space visible page table
*/
@@ -380,11 +472,11 @@
/*
* Clone the populated PMDs of the entry and irqentry text and force it RO.
*/
-static void __init pti_clone_entry_text(void)
+static void pti_clone_entry_text(void)
{
- pti_clone_pmds((unsigned long) __entry_text_start,
- (unsigned long) __irqentry_text_end,
- _PAGE_RW);
+ pti_clone_pgtable((unsigned long) __entry_text_start,
+ (unsigned long) __irqentry_text_end,
+ PTI_CLONE_PMD);
}
/*
@@ -435,10 +527,17 @@
}
/*
+ * This is the only user for these and it is not arch-generic
+ * like the other set_memory.h functions. Just extern them.
+ */
+extern int set_memory_nonglobal(unsigned long addr, int numpages);
+extern int set_memory_global(unsigned long addr, int numpages);
+
+/*
* For some configurations, map all of kernel text into the user page
* tables. This reduces TLB misses, especially on non-PCID systems.
*/
-void pti_clone_kernel_text(void)
+static void pti_clone_kernel_text(void)
{
/*
* rodata is part of the kernel image and is normally
@@ -446,7 +545,8 @@
* clone the areas past rodata, they might contain secrets.
*/
unsigned long start = PFN_ALIGN(_text);
- unsigned long end = (unsigned long)__end_rodata_hpage_align;
+ unsigned long end_clone = (unsigned long)__end_rodata_aligned;
+ unsigned long end_global = PFN_ALIGN((unsigned long)__stop___ex_table);
if (!pti_kernel_image_global_ok())
return;
@@ -458,14 +558,18 @@
* pti_set_kernel_image_nonglobal() did to clear the
* global bit.
*/
- pti_clone_pmds(start, end, _PAGE_RW);
+ pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE);
+
+ /*
+ * pti_clone_pgtable() will set the global bit in any PMDs
+ * that it clones, but we also need to get any PTEs in
+ * the last level for areas that are not huge-page-aligned.
+ */
+
+ /* Set the global bit for normal non-__init kernel text: */
+ set_memory_global(start, (end_global - start) >> PAGE_SHIFT);
}
-/*
- * This is the only user for it and it is not arch-generic like
- * the other set_memory.h functions. Just extern it.
- */
-extern int set_memory_nonglobal(unsigned long addr, int numpages);
void pti_set_kernel_image_nonglobal(void)
{
/*
@@ -477,9 +581,11 @@
unsigned long start = PFN_ALIGN(_text);
unsigned long end = ALIGN((unsigned long)_end, PMD_PAGE_SIZE);
- if (pti_kernel_image_global_ok())
- return;
-
+ /*
+ * This clears _PAGE_GLOBAL from the entire kernel image.
+ * pti_clone_kernel_text() map put _PAGE_GLOBAL back for
+ * areas that are mapped to userspace.
+ */
set_memory_nonglobal(start, (end - start) >> PAGE_SHIFT);
}
@@ -493,6 +599,28 @@
pr_info("enabled\n");
+#ifdef CONFIG_X86_32
+ /*
+ * We check for X86_FEATURE_PCID here. But the init-code will
+ * clear the feature flag on 32 bit because the feature is not
+ * supported on 32 bit anyway. To print the warning we need to
+ * check with cpuid directly again.
+ */
+ if (cpuid_ecx(0x1) & BIT(17)) {
+ /* Use printk to work around pr_fmt() */
+ printk(KERN_WARNING "\n");
+ printk(KERN_WARNING "************************************************************\n");
+ printk(KERN_WARNING "** WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! **\n");
+ printk(KERN_WARNING "** **\n");
+ printk(KERN_WARNING "** You are using 32-bit PTI on a 64-bit PCID-capable CPU. **\n");
+ printk(KERN_WARNING "** Your performance will increase dramatically if you **\n");
+ printk(KERN_WARNING "** switch to a 64-bit kernel! **\n");
+ printk(KERN_WARNING "** **\n");
+ printk(KERN_WARNING "** WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! **\n");
+ printk(KERN_WARNING "************************************************************\n");
+ }
+#endif
+
pti_clone_user_shared();
/* Undo all global bits from the init pagetables in head_64.S: */
@@ -502,3 +630,22 @@
pti_setup_espfix64();
pti_setup_vsyscall();
}
+
+/*
+ * Finalize the kernel mappings in the userspace page-table. Some of the
+ * mappings for the kernel image might have changed since pti_init()
+ * cloned them. This is because parts of the kernel image have been
+ * mapped RO and/or NX. These changes need to be cloned again to the
+ * userspace page-table.
+ */
+void pti_finalize(void)
+{
+ /*
+ * We need to clone everything (again) that maps parts of the
+ * kernel image.
+ */
+ pti_clone_entry_text();
+ pti_clone_kernel_text();
+
+ debug_checkwx_user();
+}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 6eb1f34..752dbf4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -7,6 +7,7 @@
#include <linux/export.h>
#include <linux/cpu.h>
#include <linux/debugfs.h>
+#include <linux/gfp.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
@@ -35,7 +36,7 @@
* necessary invalidation by clearing out the 'ctx_id' which
* forces a TLB flush when the context is loaded.
*/
-void clear_asid_other(void)
+static void clear_asid_other(void)
{
u16 asid;
@@ -185,8 +186,11 @@
{
struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+ bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
unsigned cpu = smp_processor_id();
u64 next_tlb_gen;
+ bool need_flush;
+ u16 new_asid;
/*
* NB: The scheduler will call us with prev == next when switching
@@ -240,20 +244,41 @@
next->context.ctx_id);
/*
- * We don't currently support having a real mm loaded without
- * our cpu set in mm_cpumask(). We have all the bookkeeping
- * in place to figure out whether we would need to flush
- * if our cpu were cleared in mm_cpumask(), but we don't
- * currently use it.
+ * Even in lazy TLB mode, the CPU should stay set in the
+ * mm_cpumask. The TLB shootdown code can figure out from
+ * from cpu_tlbstate.is_lazy whether or not to send an IPI.
*/
if (WARN_ON_ONCE(real_prev != &init_mm &&
!cpumask_test_cpu(cpu, mm_cpumask(next))))
cpumask_set_cpu(cpu, mm_cpumask(next));
- return;
+ /*
+ * If the CPU is not in lazy TLB mode, we are just switching
+ * from one thread in a process to another thread in the same
+ * process. No TLB flush required.
+ */
+ if (!was_lazy)
+ return;
+
+ /*
+ * Read the tlb_gen to check whether a flush is needed.
+ * If the TLB is up to date, just use it.
+ * The barrier synchronizes with the tlb_gen increment in
+ * the TLB shootdown code.
+ */
+ smp_mb();
+ next_tlb_gen = atomic64_read(&next->context.tlb_gen);
+ if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
+ next_tlb_gen)
+ return;
+
+ /*
+ * TLB contents went out of date while we were in lazy
+ * mode. Fall through to the TLB switching code below.
+ */
+ new_asid = prev_asid;
+ need_flush = true;
} else {
- u16 new_asid;
- bool need_flush;
u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
/*
@@ -285,53 +310,60 @@
sync_current_stack_to_mm(next);
}
- /* Stop remote flushes for the previous mm */
- VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, mm_cpumask(real_prev)) &&
- real_prev != &init_mm);
- cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+ /*
+ * Stop remote flushes for the previous mm.
+ * Skip kernel threads; we never send init_mm TLB flushing IPIs,
+ * but the bitmap manipulation can cause cache line contention.
+ */
+ if (real_prev != &init_mm) {
+ VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
+ mm_cpumask(real_prev)));
+ cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+ }
/*
* Start remote flushes and then read tlb_gen.
*/
- cpumask_set_cpu(cpu, mm_cpumask(next));
+ if (next != &init_mm)
+ cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
+ }
- if (need_flush) {
- this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
- this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- load_new_mm_cr3(next->pgd, new_asid, true);
-
- /*
- * NB: This gets called via leave_mm() in the idle path
- * where RCU functions differently. Tracing normally
- * uses RCU, so we need to use the _rcuidle variant.
- *
- * (There is no good reason for this. The idle code should
- * be rearranged to call this before rcu_idle_enter().)
- */
- trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
- } else {
- /* The new ASID is already up to date. */
- load_new_mm_cr3(next->pgd, new_asid, false);
-
- /* See above wrt _rcuidle. */
- trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
- }
+ if (need_flush) {
+ this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
+ this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
+ load_new_mm_cr3(next->pgd, new_asid, true);
/*
- * Record last user mm's context id, so we can avoid
- * flushing branch buffer with IBPB if we switch back
- * to the same user.
+ * NB: This gets called via leave_mm() in the idle path
+ * where RCU functions differently. Tracing normally
+ * uses RCU, so we need to use the _rcuidle variant.
+ *
+ * (There is no good reason for this. The idle code should
+ * be rearranged to call this before rcu_idle_enter().)
*/
- if (next != &init_mm)
- this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+ trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+ } else {
+ /* The new ASID is already up to date. */
+ load_new_mm_cr3(next->pgd, new_asid, false);
- this_cpu_write(cpu_tlbstate.loaded_mm, next);
- this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
+ /* See above wrt _rcuidle. */
+ trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
}
+ /*
+ * Record last user mm's context id, so we can avoid
+ * flushing branch buffer with IBPB if we switch back
+ * to the same user.
+ */
+ if (next != &init_mm)
+ this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
+ this_cpu_write(cpu_tlbstate.loaded_mm, next);
+ this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
+
load_mm_cr4(next);
switch_ldt(real_prev, next);
}
@@ -354,20 +386,7 @@
if (this_cpu_read(cpu_tlbstate.loaded_mm) == &init_mm)
return;
- if (tlb_defer_switch_to_init_mm()) {
- /*
- * There's a significant optimization that may be possible
- * here. We have accurate enough TLB flush tracking that we
- * don't need to maintain coherence of TLB per se when we're
- * lazy. We do, however, need to maintain coherence of
- * paging-structure caches. We could, in principle, leave our
- * old mm loaded and only switch to init_mm when
- * tlb_remove_page() happens.
- */
- this_cpu_write(cpu_tlbstate.is_lazy, true);
- } else {
- switch_mm(NULL, &init_mm, NULL);
- }
+ this_cpu_write(cpu_tlbstate.is_lazy, true);
}
/*
@@ -454,6 +473,9 @@
* paging-structure cache to avoid speculatively reading
* garbage into our TLB. Since switching to init_mm is barely
* slower than a minimal flush, just switch to init_mm.
+ *
+ * This should be rare, with native_flush_tlb_others skipping
+ * IPIs to lazy TLB mode CPUs.
*/
switch_mm_irqs_off(NULL, &init_mm, NULL);
return;
@@ -560,6 +582,9 @@
void native_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info)
{
+ cpumask_var_t lazymask;
+ unsigned int cpu;
+
count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
if (info->end == TLB_FLUSH_ALL)
trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
@@ -583,8 +608,6 @@
* that UV should be updated so that smp_call_function_many(),
* etc, are optimal on UV.
*/
- unsigned int cpu;
-
cpu = smp_processor_id();
cpumask = uv_flush_tlb_others(cpumask, info);
if (cpumask)
@@ -592,8 +615,29 @@
(void *)info, 1);
return;
}
- smp_call_function_many(cpumask, flush_tlb_func_remote,
+
+ /*
+ * A temporary cpumask is used in order to skip sending IPIs
+ * to CPUs in lazy TLB state, while keeping them in mm_cpumask(mm).
+ * If the allocation fails, simply IPI every CPU in mm_cpumask.
+ */
+ if (!alloc_cpumask_var(&lazymask, GFP_ATOMIC)) {
+ smp_call_function_many(cpumask, flush_tlb_func_remote,
(void *)info, 1);
+ return;
+ }
+
+ cpumask_copy(lazymask, cpumask);
+
+ for_each_cpu(cpu, lazymask) {
+ if (per_cpu(cpu_tlbstate.is_lazy, cpu))
+ cpumask_clear_cpu(cpu, lazymask);
+ }
+
+ smp_call_function_many(lazymask, flush_tlb_func_remote,
+ (void *)info, 1);
+
+ free_cpumask_var(lazymask);
}
/*
@@ -646,6 +690,68 @@
put_cpu();
}
+void tlb_flush_remove_tables_local(void *arg)
+{
+ struct mm_struct *mm = arg;
+
+ if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm &&
+ this_cpu_read(cpu_tlbstate.is_lazy)) {
+ /*
+ * We're in lazy mode. We need to at least flush our
+ * paging-structure cache to avoid speculatively reading
+ * garbage into our TLB. Since switching to init_mm is barely
+ * slower than a minimal flush, just switch to init_mm.
+ */
+ switch_mm_irqs_off(NULL, &init_mm, NULL);
+ }
+}
+
+static void mm_fill_lazy_tlb_cpu_mask(struct mm_struct *mm,
+ struct cpumask *lazy_cpus)
+{
+ int cpu;
+
+ for_each_cpu(cpu, mm_cpumask(mm)) {
+ if (!per_cpu(cpu_tlbstate.is_lazy, cpu))
+ cpumask_set_cpu(cpu, lazy_cpus);
+ }
+}
+
+void tlb_flush_remove_tables(struct mm_struct *mm)
+{
+ int cpu = get_cpu();
+ cpumask_var_t lazy_cpus;
+
+ if (cpumask_any_but(mm_cpumask(mm), cpu) >= nr_cpu_ids) {
+ put_cpu();
+ return;
+ }
+
+ if (!zalloc_cpumask_var(&lazy_cpus, GFP_ATOMIC)) {
+ /*
+ * If the cpumask allocation fails, do a brute force flush
+ * on all the CPUs that have this mm loaded.
+ */
+ smp_call_function_many(mm_cpumask(mm),
+ tlb_flush_remove_tables_local, (void *)mm, 1);
+ put_cpu();
+ return;
+ }
+
+ /*
+ * CPUs with !is_lazy either received a TLB flush IPI while the user
+ * pages in this address range were unmapped, or have context switched
+ * and reloaded %CR3 since then.
+ *
+ * Shootdown IPIs at page table freeing time only need to be sent to
+ * CPUs that may have out of date TLB contents.
+ */
+ mm_fill_lazy_tlb_cpu_mask(mm, lazy_cpus);
+ smp_call_function_many(lazy_cpus,
+ tlb_flush_remove_tables_local, (void *)mm, 1);
+ free_cpumask_var(lazy_cpus);
+ put_cpu();
+}
static void do_flush_tlb_all(void *info)
{
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 5f2eb32..ee5d08f 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -636,6 +636,8 @@
#ifdef CONFIG_EFI_MIXED
extern efi_status_t efi64_thunk(u32, ...);
+static DEFINE_SPINLOCK(efi_runtime_lock);
+
#define runtime_service32(func) \
({ \
u32 table = (u32)(unsigned long)efi.systab; \
@@ -657,17 +659,14 @@
#define efi_thunk(f, ...) \
({ \
efi_status_t __s; \
- unsigned long __flags; \
u32 __func; \
\
- local_irq_save(__flags); \
arch_efi_call_virt_setup(); \
\
__func = runtime_service32(f); \
__s = efi64_thunk(__func, __VA_ARGS__); \
\
arch_efi_call_virt_teardown(); \
- local_irq_restore(__flags); \
\
__s; \
})
@@ -702,14 +701,17 @@
{
efi_status_t status;
u32 phys_tm, phys_tc;
+ unsigned long flags;
spin_lock(&rtc_lock);
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_tm = virt_to_phys_or_null(tm);
phys_tc = virt_to_phys_or_null(tc);
status = efi_thunk(get_time, phys_tm, phys_tc);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
spin_unlock(&rtc_lock);
return status;
@@ -719,13 +721,16 @@
{
efi_status_t status;
u32 phys_tm;
+ unsigned long flags;
spin_lock(&rtc_lock);
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_tm = virt_to_phys_or_null(tm);
status = efi_thunk(set_time, phys_tm);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
spin_unlock(&rtc_lock);
return status;
@@ -737,8 +742,10 @@
{
efi_status_t status;
u32 phys_enabled, phys_pending, phys_tm;
+ unsigned long flags;
spin_lock(&rtc_lock);
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_enabled = virt_to_phys_or_null(enabled);
phys_pending = virt_to_phys_or_null(pending);
@@ -747,6 +754,7 @@
status = efi_thunk(get_wakeup_time, phys_enabled,
phys_pending, phys_tm);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
spin_unlock(&rtc_lock);
return status;
@@ -757,13 +765,16 @@
{
efi_status_t status;
u32 phys_tm;
+ unsigned long flags;
spin_lock(&rtc_lock);
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_tm = virt_to_phys_or_null(tm);
status = efi_thunk(set_wakeup_time, enabled, phys_tm);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
spin_unlock(&rtc_lock);
return status;
@@ -781,6 +792,9 @@
efi_status_t status;
u32 phys_name, phys_vendor, phys_attr;
u32 phys_data_size, phys_data;
+ unsigned long flags;
+
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_data_size = virt_to_phys_or_null(data_size);
phys_vendor = virt_to_phys_or_null(vendor);
@@ -791,6 +805,8 @@
status = efi_thunk(get_variable, phys_name, phys_vendor,
phys_attr, phys_data_size, phys_data);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
return status;
}
@@ -800,6 +816,9 @@
{
u32 phys_name, phys_vendor, phys_data;
efi_status_t status;
+ unsigned long flags;
+
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_name = virt_to_phys_or_null_size(name, efi_name_size(name));
phys_vendor = virt_to_phys_or_null(vendor);
@@ -809,6 +828,33 @@
status = efi_thunk(set_variable, phys_name, phys_vendor,
attr, data_size, phys_data);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
+ return status;
+}
+
+static efi_status_t
+efi_thunk_set_variable_nonblocking(efi_char16_t *name, efi_guid_t *vendor,
+ u32 attr, unsigned long data_size,
+ void *data)
+{
+ u32 phys_name, phys_vendor, phys_data;
+ efi_status_t status;
+ unsigned long flags;
+
+ if (!spin_trylock_irqsave(&efi_runtime_lock, flags))
+ return EFI_NOT_READY;
+
+ phys_name = virt_to_phys_or_null_size(name, efi_name_size(name));
+ phys_vendor = virt_to_phys_or_null(vendor);
+ phys_data = virt_to_phys_or_null_size(data, data_size);
+
+ /* If data_size is > sizeof(u32) we've got problems */
+ status = efi_thunk(set_variable, phys_name, phys_vendor,
+ attr, data_size, phys_data);
+
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
return status;
}
@@ -819,6 +865,9 @@
{
efi_status_t status;
u32 phys_name_size, phys_name, phys_vendor;
+ unsigned long flags;
+
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_name_size = virt_to_phys_or_null(name_size);
phys_vendor = virt_to_phys_or_null(vendor);
@@ -827,6 +876,8 @@
status = efi_thunk(get_next_variable, phys_name_size,
phys_name, phys_vendor);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
return status;
}
@@ -835,10 +886,15 @@
{
efi_status_t status;
u32 phys_count;
+ unsigned long flags;
+
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_count = virt_to_phys_or_null(count);
status = efi_thunk(get_next_high_mono_count, phys_count);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
return status;
}
@@ -847,10 +903,15 @@
unsigned long data_size, efi_char16_t *data)
{
u32 phys_data;
+ unsigned long flags;
+
+ spin_lock_irqsave(&efi_runtime_lock, flags);
phys_data = virt_to_phys_or_null_size(data, data_size);
efi_thunk(reset_system, reset_type, status, data_size, phys_data);
+
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
}
static efi_status_t
@@ -872,10 +933,13 @@
{
efi_status_t status;
u32 phys_storage, phys_remaining, phys_max;
+ unsigned long flags;
if (efi.runtime_version < EFI_2_00_SYSTEM_TABLE_REVISION)
return EFI_UNSUPPORTED;
+ spin_lock_irqsave(&efi_runtime_lock, flags);
+
phys_storage = virt_to_phys_or_null(storage_space);
phys_remaining = virt_to_phys_or_null(remaining_space);
phys_max = virt_to_phys_or_null(max_variable_size);
@@ -883,6 +947,35 @@
status = efi_thunk(query_variable_info, attr, phys_storage,
phys_remaining, phys_max);
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
+ return status;
+}
+
+static efi_status_t
+efi_thunk_query_variable_info_nonblocking(u32 attr, u64 *storage_space,
+ u64 *remaining_space,
+ u64 *max_variable_size)
+{
+ efi_status_t status;
+ u32 phys_storage, phys_remaining, phys_max;
+ unsigned long flags;
+
+ if (efi.runtime_version < EFI_2_00_SYSTEM_TABLE_REVISION)
+ return EFI_UNSUPPORTED;
+
+ if (!spin_trylock_irqsave(&efi_runtime_lock, flags))
+ return EFI_NOT_READY;
+
+ phys_storage = virt_to_phys_or_null(storage_space);
+ phys_remaining = virt_to_phys_or_null(remaining_space);
+ phys_max = virt_to_phys_or_null(max_variable_size);
+
+ status = efi_thunk(query_variable_info, attr, phys_storage,
+ phys_remaining, phys_max);
+
+ spin_unlock_irqrestore(&efi_runtime_lock, flags);
+
return status;
}
@@ -908,9 +1001,11 @@
efi.get_variable = efi_thunk_get_variable;
efi.get_next_variable = efi_thunk_get_next_variable;
efi.set_variable = efi_thunk_set_variable;
+ efi.set_variable_nonblocking = efi_thunk_set_variable_nonblocking;
efi.get_next_high_mono_count = efi_thunk_get_next_high_mono_count;
efi.reset_system = efi_thunk_reset_system;
efi.query_variable_info = efi_thunk_query_variable_info;
+ efi.query_variable_info_nonblocking = efi_thunk_query_variable_info_nonblocking;
efi.update_capsule = efi_thunk_update_capsule;
efi.query_capsule_caps = efi_thunk_query_capsule_caps;
}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b..844d31c 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -105,12 +105,11 @@
*/
void efi_delete_dummy_variable(void)
{
- efi.set_variable((efi_char16_t *)efi_dummy_name,
- &EFI_DUMMY_GUID,
- EFI_VARIABLE_NON_VOLATILE |
- EFI_VARIABLE_BOOTSERVICE_ACCESS |
- EFI_VARIABLE_RUNTIME_ACCESS,
- 0, NULL);
+ efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
+ &EFI_DUMMY_GUID,
+ EFI_VARIABLE_NON_VOLATILE |
+ EFI_VARIABLE_BOOTSERVICE_ACCESS |
+ EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
}
/*
@@ -249,7 +248,8 @@
int num_entries;
void *new;
- if (efi_mem_desc_lookup(addr, &md)) {
+ if (efi_mem_desc_lookup(addr, &md) ||
+ md.type != EFI_BOOT_SERVICES_DATA) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
return;
}
diff --git a/arch/x86/platform/intel-mid/Makefile b/arch/x86/platform/intel-mid/Makefile
index fa021df..5cf886c 100644
--- a/arch/x86/platform/intel-mid/Makefile
+++ b/arch/x86/platform/intel-mid/Makefile
@@ -1,4 +1,4 @@
-obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o intel_mid_vrtc.o mfld.o mrfld.o pwr.o
+obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o intel_mid_vrtc.o pwr.o
# SFI specific code
ifdef CONFIG_X86_INTEL_MID
diff --git a/arch/x86/platform/intel-mid/intel-mid.c b/arch/x86/platform/intel-mid/intel-mid.c
index 2ebdf31..56f66ea 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -36,8 +36,6 @@
#include <asm/apb_timer.h>
#include <asm/reboot.h>
-#include "intel_mid_weak_decls.h"
-
/*
* the clockevent devices on Moorestown/Medfield can be APBT or LAPIC clock,
* cmdline option x86_intel_mid_timer can be used to override the configuration
@@ -61,10 +59,6 @@
enum intel_mid_timer_options intel_mid_timer_options;
-/* intel_mid_ops to store sub arch ops */
-static struct intel_mid_ops *intel_mid_ops;
-/* getter function for sub arch ops*/
-static void *(*get_intel_mid_ops[])(void) = INTEL_MID_OPS_INIT;
enum intel_mid_cpu_type __intel_mid_cpu_chip;
EXPORT_SYMBOL_GPL(__intel_mid_cpu_chip);
@@ -82,11 +76,6 @@
intel_scu_ipc_simple_command(IPCMSG_COLD_RESET, 0);
}
-static unsigned long __init intel_mid_calibrate_tsc(void)
-{
- return 0;
-}
-
static void __init intel_mid_setup_bp_timer(void)
{
apbt_time_init();
@@ -133,6 +122,7 @@
case 0x3C:
case 0x4A:
__intel_mid_cpu_chip = INTEL_MID_CPU_CHIP_TANGIER;
+ x86_platform.legacy.rtc = 1;
break;
case 0x27:
default:
@@ -140,17 +130,7 @@
break;
}
- if (__intel_mid_cpu_chip < MAX_CPU_OPS(get_intel_mid_ops))
- intel_mid_ops = get_intel_mid_ops[__intel_mid_cpu_chip]();
- else {
- intel_mid_ops = get_intel_mid_ops[INTEL_MID_CPU_CHIP_PENWELL]();
- pr_info("ARCH: Unknown SoC, assuming Penwell!\n");
- }
-
out:
- if (intel_mid_ops->arch_setup)
- intel_mid_ops->arch_setup();
-
/*
* Intel MID platforms are using explicitly defined regulators.
*
@@ -191,7 +171,6 @@
x86_cpuinit.setup_percpu_clockev = apbt_setup_secondary_clock;
- x86_platform.calibrate_tsc = intel_mid_calibrate_tsc;
x86_platform.get_nmi_reason = intel_mid_get_nmi_reason;
x86_init.pci.arch_init = intel_mid_pci_init;
diff --git a/arch/x86/platform/intel-mid/intel_mid_weak_decls.h b/arch/x86/platform/intel-mid/intel_mid_weak_decls.h
deleted file mode 100644
index 3c1c386..0000000
--- a/arch/x86/platform/intel-mid/intel_mid_weak_decls.h
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- * intel_mid_weak_decls.h: Weak declarations of intel-mid.c
- *
- * (C) Copyright 2013 Intel Corporation
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; version 2
- * of the License.
- */
-
-
-/* For every CPU addition a new get_<cpuname>_ops interface needs
- * to be added.
- */
-extern void *get_penwell_ops(void);
-extern void *get_cloverview_ops(void);
-extern void *get_tangier_ops(void);
diff --git a/arch/x86/platform/intel-mid/mfld.c b/arch/x86/platform/intel-mid/mfld.c
deleted file mode 100644
index e42978d..0000000
--- a/arch/x86/platform/intel-mid/mfld.c
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * mfld.c: Intel Medfield platform setup code
- *
- * (C) Copyright 2013 Intel Corporation
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; version 2
- * of the License.
- */
-
-#include <linux/init.h>
-
-#include <asm/apic.h>
-#include <asm/intel-mid.h>
-#include <asm/intel_mid_vrtc.h>
-
-#include "intel_mid_weak_decls.h"
-
-static unsigned long __init mfld_calibrate_tsc(void)
-{
- unsigned long fast_calibrate;
- u32 lo, hi, ratio, fsb;
-
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- pr_debug("IA32 perf status is 0x%x, 0x%0x\n", lo, hi);
- ratio = (hi >> 8) & 0x1f;
- pr_debug("ratio is %d\n", ratio);
- if (!ratio) {
- pr_err("read a zero ratio, should be incorrect!\n");
- pr_err("force tsc ratio to 16 ...\n");
- ratio = 16;
- }
- rdmsr(MSR_FSB_FREQ, lo, hi);
- if ((lo & 0x7) == 0x7)
- fsb = FSB_FREQ_83SKU;
- else
- fsb = FSB_FREQ_100SKU;
- fast_calibrate = ratio * fsb;
- pr_debug("read penwell tsc %lu khz\n", fast_calibrate);
- lapic_timer_frequency = fsb * 1000 / HZ;
-
- /*
- * TSC on Intel Atom SoCs is reliable and of known frequency.
- * See tsc_msr.c for details.
- */
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
- setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
-
- return fast_calibrate;
-}
-
-static void __init penwell_arch_setup(void)
-{
- x86_platform.calibrate_tsc = mfld_calibrate_tsc;
-}
-
-static struct intel_mid_ops penwell_ops = {
- .arch_setup = penwell_arch_setup,
-};
-
-void *get_penwell_ops(void)
-{
- return &penwell_ops;
-}
-
-void *get_cloverview_ops(void)
-{
- return &penwell_ops;
-}
diff --git a/arch/x86/platform/intel-mid/mrfld.c b/arch/x86/platform/intel-mid/mrfld.c
deleted file mode 100644
index ae7bdeb..0000000
--- a/arch/x86/platform/intel-mid/mrfld.c
+++ /dev/null
@@ -1,105 +0,0 @@
-/*
- * Intel Merrifield platform specific setup code
- *
- * (C) Copyright 2013 Intel Corporation
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; version 2
- * of the License.
- */
-
-#include <linux/init.h>
-
-#include <asm/apic.h>
-#include <asm/intel-mid.h>
-
-#include "intel_mid_weak_decls.h"
-
-static unsigned long __init tangier_calibrate_tsc(void)
-{
- unsigned long fast_calibrate;
- u32 lo, hi, ratio, fsb, bus_freq;
-
- /* *********************** */
- /* Compute TSC:Ratio * FSB */
- /* *********************** */
-
- /* Compute Ratio */
- rdmsr(MSR_PLATFORM_INFO, lo, hi);
- pr_debug("IA32 PLATFORM_INFO is 0x%x : %x\n", hi, lo);
-
- ratio = (lo >> 8) & 0xFF;
- pr_debug("ratio is %d\n", ratio);
- if (!ratio) {
- pr_err("Read a zero ratio, force tsc ratio to 4 ...\n");
- ratio = 4;
- }
-
- /* Compute FSB */
- rdmsr(MSR_FSB_FREQ, lo, hi);
- pr_debug("Actual FSB frequency detected by SOC 0x%x : %x\n",
- hi, lo);
-
- bus_freq = lo & 0x7;
- pr_debug("bus_freq = 0x%x\n", bus_freq);
-
- if (bus_freq == 0)
- fsb = FSB_FREQ_100SKU;
- else if (bus_freq == 1)
- fsb = FSB_FREQ_100SKU;
- else if (bus_freq == 2)
- fsb = FSB_FREQ_133SKU;
- else if (bus_freq == 3)
- fsb = FSB_FREQ_167SKU;
- else if (bus_freq == 4)
- fsb = FSB_FREQ_83SKU;
- else if (bus_freq == 5)
- fsb = FSB_FREQ_400SKU;
- else if (bus_freq == 6)
- fsb = FSB_FREQ_267SKU;
- else if (bus_freq == 7)
- fsb = FSB_FREQ_333SKU;
- else {
- BUG();
- pr_err("Invalid bus_freq! Setting to minimal value!\n");
- fsb = FSB_FREQ_100SKU;
- }
-
- /* TSC = FSB Freq * Resolved HFM Ratio */
- fast_calibrate = ratio * fsb;
- pr_debug("calculate tangier tsc %lu KHz\n", fast_calibrate);
-
- /* ************************************ */
- /* Calculate Local APIC Timer Frequency */
- /* ************************************ */
- lapic_timer_frequency = (fsb * 1000) / HZ;
-
- pr_debug("Setting lapic_timer_frequency = %d\n",
- lapic_timer_frequency);
-
- /*
- * TSC on Intel Atom SoCs is reliable and of known frequency.
- * See tsc_msr.c for details.
- */
- setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
- setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
-
- return fast_calibrate;
-}
-
-static void __init tangier_arch_setup(void)
-{
- x86_platform.calibrate_tsc = tangier_calibrate_tsc;
- x86_platform.legacy.rtc = 1;
-}
-
-/* tangier arch ops */
-static struct intel_mid_ops tangier_ops = {
- .arch_setup = tangier_arch_setup,
-};
-
-void *get_tangier_ops(void)
-{
- return &tangier_ops;
-}
diff --git a/arch/x86/platform/olpc/olpc.c b/arch/x86/platform/olpc/olpc.c
index 7c3077e..f0e920f 100644
--- a/arch/x86/platform/olpc/olpc.c
+++ b/arch/x86/platform/olpc/olpc.c
@@ -311,10 +311,8 @@
return PTR_ERR(pdev);
pdev = platform_device_register_simple("olpc-xo1", -1, NULL, 0);
- if (IS_ERR(pdev))
- return PTR_ERR(pdev);
- return 0;
+ return PTR_ERR_OR_ZERO(pdev);
}
static int olpc_xo1_ec_probe(struct platform_device *pdev)
diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c
index ca446da..e26dfad 100644
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -1607,8 +1607,6 @@
*tunables[cnt].tunp = val;
continue;
}
- if (q == p)
- break;
}
return 0;
}
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a..fd369a6 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -137,7 +137,7 @@
/* Saved in save_processor_state. */
lgdt saved_context_gdt_desc(%rax)
- xorq %rax, %rax
+ xorl %eax, %eax
/* tell the hibernation core that we've just restored the memory */
movq %rax, in_suspend(%rip)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 220e978..3a6c8eb 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -67,6 +67,7 @@
"__tracedata_(start|end)|"
"__(start|stop)_notes|"
"__end_rodata|"
+ "__end_rodata_aligned|"
"__initramfs_start|"
"(jiffies|jiffies_64)|"
#if ELF_BITS == 64
diff --git a/arch/x86/um/vdso/.gitignore b/arch/x86/um/vdso/.gitignore
index 9cac6d0..f8b69d8 100644
--- a/arch/x86/um/vdso/.gitignore
+++ b/arch/x86/um/vdso/.gitignore
@@ -1,2 +1 @@
-vdso-syms.lds
vdso.lds
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index b2d6967..822ccdb 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -53,22 +53,6 @@
CFLAGS_REMOVE_vdso-note.o = -pg -fprofile-arcs -ftest-coverage
CFLAGS_REMOVE_um_vdso.o = -pg -fprofile-arcs -ftest-coverage
-targets += vdso-syms.lds
-extra-$(VDSO64-y) += vdso-syms.lds
-
-#
-# Match symbols in the DSO that look like VDSO*; produce a file of constants.
-#
-sed-vdsosym := -e 's/^00*/0/' \
- -e 's/^\([0-9a-fA-F]*\) . \(VDSO[a-zA-Z0-9_]*\)$$/\2 = 0x\1;/p'
-quiet_cmd_vdsosym = VDSOSYM $@
-define cmd_vdsosym
- $(NM) $< | LC_ALL=C sed -n $(sed-vdsosym) | LC_ALL=C sort > $@
-endef
-
-$(obj)/%-syms.lds: $(obj)/%.so.dbg FORCE
- $(call if_changed,vdsosym)
-
#
# The DSO images are built using a special linker script.
#
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 439a94b..105a57d 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -119,6 +119,27 @@
version >> 16, version & 0xffff, extra.extraversion,
xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
}
+
+static void __init xen_pv_init_platform(void)
+{
+ set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
+ HYPERVISOR_shared_info = (void *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+
+ /* xen clock uses per-cpu vcpu_info, need to init it for boot cpu */
+ xen_vcpu_info_reset(0);
+
+ /* pvclock is in shared info area */
+ xen_init_time_ops();
+}
+
+static void __init xen_pv_guest_late_init(void)
+{
+#ifndef CONFIG_SMP
+ /* Setup shared vcpu info for non-smp configurations */
+ xen_setup_vcpu_info_placement();
+#endif
+}
+
/* Check if running on Xen version (major, minor) or later */
bool
xen_running_on_version_or_later(unsigned int major, unsigned int minor)
@@ -947,34 +968,8 @@
xen_write_msr_safe(msr, low, high);
}
-void xen_setup_shared_info(void)
-{
- set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
-
- HYPERVISOR_shared_info =
- (struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
-
- xen_setup_mfn_list_list();
-
- if (system_state == SYSTEM_BOOTING) {
-#ifndef CONFIG_SMP
- /*
- * In UP this is as good a place as any to set up shared info.
- * Limit this to boot only, at restore vcpu setup is done via
- * xen_vcpu_restore().
- */
- xen_setup_vcpu_info_placement();
-#endif
- /*
- * Now that shared info is set up we can start using routines
- * that point to pvclock area.
- */
- xen_init_time_ops();
- }
-}
-
/* This is called once we have the cpu_possible_mask */
-void __ref xen_setup_vcpu_info_placement(void)
+void __init xen_setup_vcpu_info_placement(void)
{
int cpu;
@@ -1228,6 +1223,8 @@
x86_init.irqs.intr_mode_init = x86_init_noop;
x86_init.oem.arch_setup = xen_arch_setup;
x86_init.oem.banner = xen_banner;
+ x86_init.hyper.init_platform = xen_pv_init_platform;
+ x86_init.hyper.guest_late_init = xen_pv_guest_late_init;
/*
* Set up some pagetable state before starting to set any ptes.
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 2c30cab..52206ad 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1230,8 +1230,7 @@
* We roundup to the PMD, which means that if anybody at this stage is
* using the __ka address of xen_start_info or
* xen_start_info->shared_info they are in going to crash. Fortunatly
- * we have already revectored in xen_setup_kernel_pagetable and in
- * xen_setup_shared_info.
+ * we have already revectored in xen_setup_kernel_pagetable.
*/
size = roundup(size, PMD_SIZE);
@@ -1292,8 +1291,7 @@
/* Remap memory freed due to conflicts with E820 map */
xen_remap_memory();
-
- xen_setup_shared_info();
+ xen_setup_mfn_list_list();
}
static void xen_write_cr2(unsigned long cr2)
{
diff --git a/arch/x86/xen/suspend_pv.c b/arch/x86/xen/suspend_pv.c
index a2e0f11..8303b58 100644
--- a/arch/x86/xen/suspend_pv.c
+++ b/arch/x86/xen/suspend_pv.c
@@ -27,8 +27,9 @@
void xen_pv_post_suspend(int suspend_cancelled)
{
xen_build_mfn_list_list();
-
- xen_setup_shared_info();
+ set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
+ HYPERVISOR_shared_info = (void *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+ xen_setup_mfn_list_list();
if (suspend_cancelled) {
xen_start_info->store_mfn =
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index e0f1bcf..c84f1e0 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -31,6 +31,8 @@
/* Xen may fire a timer up to this many ns early */
#define TIMER_SLOP 100000
+static u64 xen_sched_clock_offset __read_mostly;
+
/* Get the TSC speed from Xen */
static unsigned long xen_tsc_khz(void)
{
@@ -40,7 +42,7 @@
return pvclock_tsc_khz(info);
}
-u64 xen_clocksource_read(void)
+static u64 xen_clocksource_read(void)
{
struct pvclock_vcpu_time_info *src;
u64 ret;
@@ -57,6 +59,11 @@
return xen_clocksource_read();
}
+static u64 xen_sched_clock(void)
+{
+ return xen_clocksource_read() - xen_sched_clock_offset;
+}
+
static void xen_read_wallclock(struct timespec64 *ts)
{
struct shared_info *s = HYPERVISOR_shared_info;
@@ -367,7 +374,7 @@
}
static const struct pv_time_ops xen_time_ops __initconst = {
- .sched_clock = xen_clocksource_read,
+ .sched_clock = xen_sched_clock,
.steal_clock = xen_steal_clock,
};
@@ -503,8 +510,9 @@
pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier);
}
-void __ref xen_init_time_ops(void)
+void __init xen_init_time_ops(void)
{
+ xen_sched_clock_offset = xen_clocksource_read();
pv_time_ops = xen_time_ops;
x86_init.timers.timer_init = xen_time_init;
@@ -542,11 +550,11 @@
return;
if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
- printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
- "disable pv timer\n");
+ pr_info("Xen doesn't support pvclock on HVM, disable pv timer");
return;
}
+ xen_sched_clock_offset = xen_clocksource_read();
pv_time_ops = xen_time_ops;
x86_init.timers.setup_percpu_clockev = xen_time_init;
x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 3b34745..e786845 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -31,7 +31,6 @@
extern struct shared_info *HYPERVISOR_shared_info;
void xen_setup_mfn_list_list(void);
-void xen_setup_shared_info(void);
void xen_build_mfn_list_list(void);
void xen_setup_machphys_mapping(void);
void xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
@@ -68,12 +67,11 @@
void xen_setup_timer(int cpu);
void xen_setup_runstate_info(int cpu);
void xen_teardown_timer(int cpu);
-u64 xen_clocksource_read(void);
void xen_setup_cpu_clockevents(void);
void xen_save_time_memory_area(void);
void xen_restore_time_memory_area(void);
-void __ref xen_init_time_ops(void);
-void __init xen_hvm_init_time_ops(void);
+void xen_init_time_ops(void);
+void xen_hvm_init_time_ops(void);
irqreturn_t xen_debug_interrupt(int irq, void *dev_id);
diff --git a/arch/xtensa/include/asm/atomic.h b/arch/xtensa/include/asm/atomic.h
index e7a23f2..7de0149 100644
--- a/arch/xtensa/include/asm/atomic.h
+++ b/arch/xtensa/include/asm/atomic.h
@@ -197,107 +197,9 @@
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-/**
- * atomic_sub_and_test - subtract value from variable and test result
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_sub_and_test(i,v) (atomic_sub_return((i),(v)) == 0)
-
-/**
- * atomic_inc - increment atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1.
- */
-#define atomic_inc(v) atomic_add(1,(v))
-
-/**
- * atomic_inc - increment atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1.
- */
-#define atomic_inc_return(v) atomic_add_return(1,(v))
-
-/**
- * atomic_dec - decrement atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1.
- */
-#define atomic_dec(v) atomic_sub(1,(v))
-
-/**
- * atomic_dec_return - decrement atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1.
- */
-#define atomic_dec_return(v) atomic_sub_return(1,(v))
-
-/**
- * atomic_dec_and_test - decrement and test
- * @v: pointer of type atomic_t
- *
- * Atomically decrements @v by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-#define atomic_dec_and_test(v) (atomic_sub_return(1,(v)) == 0)
-
-/**
- * atomic_inc_and_test - increment and test
- * @v: pointer of type atomic_t
- *
- * Atomically increments @v by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-#define atomic_inc_and_test(v) (atomic_add_return(1,(v)) == 0)
-
-/**
- * atomic_add_negative - add and test if negative
- * @v: pointer of type atomic_t
- * @i: integer value to add
- *
- * Atomically adds @i to @v and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
-#define atomic_add_negative(i,v) (atomic_add_return((i),(v)) < 0)
-
#define atomic_cmpxchg(v, o, n) ((int)cmpxchg(&((v)->counter), (o), (n)))
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
-/**
- * __atomic_add_unless - add unless the number is a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as it was not @u.
- * Returns the old value of @v.
- */
-static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
-
#endif /* __KERNEL__ */
#endif /* _XTENSA_ATOMIC_H */
diff --git a/arch/xtensa/include/asm/hw_breakpoint.h b/arch/xtensa/include/asm/hw_breakpoint.h
index dbe3053b..9f119c1 100644
--- a/arch/xtensa/include/asm/hw_breakpoint.h
+++ b/arch/xtensa/include/asm/hw_breakpoint.h
@@ -30,13 +30,16 @@
u16 type;
};
+struct perf_event_attr;
struct perf_event;
struct pt_regs;
struct task_struct;
int hw_breakpoint_slots(int type);
-int arch_check_bp_in_kernelspace(struct perf_event *bp);
-int arch_validate_hwbkpt_settings(struct perf_event *bp);
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw);
int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
diff --git a/arch/xtensa/kernel/hw_breakpoint.c b/arch/xtensa/kernel/hw_breakpoint.c
index b35656a..c2e387c 100644
--- a/arch/xtensa/kernel/hw_breakpoint.c
+++ b/arch/xtensa/kernel/hw_breakpoint.c
@@ -33,14 +33,13 @@
}
}
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
{
unsigned int len;
unsigned long va;
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
- va = info->address;
- len = bp->attr.bp_len;
+ va = hw->address;
+ len = hw->len;
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
}
@@ -48,50 +47,41 @@
/*
* Construct an arch_hw_breakpoint from a perf_event.
*/
-static int arch_build_bp_info(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
- switch (bp->attr.bp_type) {
+ switch (attr->bp_type) {
case HW_BREAKPOINT_X:
- info->type = XTENSA_BREAKPOINT_EXECUTE;
+ hw->type = XTENSA_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
- info->type = XTENSA_BREAKPOINT_LOAD;
+ hw->type = XTENSA_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
- info->type = XTENSA_BREAKPOINT_STORE;
+ hw->type = XTENSA_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
- info->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE;
+ hw->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
/* Len */
- info->len = bp->attr.bp_len;
- if (info->len < 1 || info->len > 64 || !is_power_of_2(info->len))
+ hw->len = attr->bp_len;
+ if (hw->len < 1 || hw->len > 64 || !is_power_of_2(hw->len))
return -EINVAL;
/* Address */
- info->address = bp->attr.bp_addr;
- if (info->address & (info->len - 1))
+ hw->address = attr->bp_addr;
+ if (hw->address & (hw->len - 1))
return -EINVAL;
return 0;
}
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
-{
- int ret;
-
- /* Build the arch_hw_breakpoint. */
- ret = arch_build_bp_info(bp);
- return ret;
-}
-
int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data)
{
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fa0729c..d81c653 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -61,7 +61,7 @@
{
unsigned int counter;
- counter = (unsigned int)__atomic_add_unless(v, 1, 0);
+ counter = (unsigned int)atomic_fetch_add_unless(v, 1, 0);
if (counter <= (unsigned int)INT_MAX)
return (int)counter;
diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 7513411..9ab3db8 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -35,9 +35,6 @@
struct clk *clk;
int rc;
- if (index < 0)
- return ERR_PTR(-EINVAL);
-
rc = of_parse_phandle_with_args(np, "clocks", "#clock-cells", index,
&clkspec);
if (rc)
@@ -199,7 +196,7 @@
const char *dev_id = dev ? dev_name(dev) : NULL;
struct clk *clk;
- if (dev) {
+ if (dev && dev->of_node) {
clk = __of_clk_get_by_name(dev->of_node, dev_id, con_id);
if (!IS_ERR(clk) || PTR_ERR(clk) == -EPROBE_DEFER)
return clk;
diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
index 00caf37..c070cc7 100644
--- a/drivers/clocksource/Makefile
+++ b/drivers/clocksource/Makefile
@@ -49,7 +49,7 @@
obj-$(CONFIG_FSL_FTM_TIMER) += fsl_ftm_timer.o
obj-$(CONFIG_VF_PIT_TIMER) += vf_pit_timer.o
obj-$(CONFIG_CLKSRC_QCOM) += qcom-timer.o
-obj-$(CONFIG_MTK_TIMER) += mtk_timer.o
+obj-$(CONFIG_MTK_TIMER) += timer-mediatek.o
obj-$(CONFIG_CLKSRC_PISTACHIO) += time-pistachio.o
obj-$(CONFIG_CLKSRC_TI_32K) += timer-ti-32k.o
obj-$(CONFIG_CLKSRC_NPS) += timer-nps.o
diff --git a/drivers/clocksource/mtk_timer.c b/drivers/clocksource/mtk_timer.c
deleted file mode 100644
index f9b724f..0000000
--- a/drivers/clocksource/mtk_timer.c
+++ /dev/null
@@ -1,268 +0,0 @@
-/*
- * Mediatek SoCs General-Purpose Timer handling.
- *
- * Copyright (C) 2014 Matthias Brugger
- *
- * Matthias Brugger <matthias.bgg@gmail.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/clk.h>
-#include <linux/clockchips.h>
-#include <linux/interrupt.h>
-#include <linux/irq.h>
-#include <linux/irqreturn.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-#include <linux/of_irq.h>
-#include <linux/sched_clock.h>
-#include <linux/slab.h>
-
-#define GPT_IRQ_EN_REG 0x00
-#define GPT_IRQ_ENABLE(val) BIT((val) - 1)
-#define GPT_IRQ_ACK_REG 0x08
-#define GPT_IRQ_ACK(val) BIT((val) - 1)
-
-#define TIMER_CTRL_REG(val) (0x10 * (val))
-#define TIMER_CTRL_OP(val) (((val) & 0x3) << 4)
-#define TIMER_CTRL_OP_ONESHOT (0)
-#define TIMER_CTRL_OP_REPEAT (1)
-#define TIMER_CTRL_OP_FREERUN (3)
-#define TIMER_CTRL_CLEAR (2)
-#define TIMER_CTRL_ENABLE (1)
-#define TIMER_CTRL_DISABLE (0)
-
-#define TIMER_CLK_REG(val) (0x04 + (0x10 * (val)))
-#define TIMER_CLK_SRC(val) (((val) & 0x1) << 4)
-#define TIMER_CLK_SRC_SYS13M (0)
-#define TIMER_CLK_SRC_RTC32K (1)
-#define TIMER_CLK_DIV1 (0x0)
-#define TIMER_CLK_DIV2 (0x1)
-
-#define TIMER_CNT_REG(val) (0x08 + (0x10 * (val)))
-#define TIMER_CMP_REG(val) (0x0C + (0x10 * (val)))
-
-#define GPT_CLK_EVT 1
-#define GPT_CLK_SRC 2
-
-struct mtk_clock_event_device {
- void __iomem *gpt_base;
- u32 ticks_per_jiffy;
- struct clock_event_device dev;
-};
-
-static void __iomem *gpt_sched_reg __read_mostly;
-
-static u64 notrace mtk_read_sched_clock(void)
-{
- return readl_relaxed(gpt_sched_reg);
-}
-
-static inline struct mtk_clock_event_device *to_mtk_clk(
- struct clock_event_device *c)
-{
- return container_of(c, struct mtk_clock_event_device, dev);
-}
-
-static void mtk_clkevt_time_stop(struct mtk_clock_event_device *evt, u8 timer)
-{
- u32 val;
-
- val = readl(evt->gpt_base + TIMER_CTRL_REG(timer));
- writel(val & ~TIMER_CTRL_ENABLE, evt->gpt_base +
- TIMER_CTRL_REG(timer));
-}
-
-static void mtk_clkevt_time_setup(struct mtk_clock_event_device *evt,
- unsigned long delay, u8 timer)
-{
- writel(delay, evt->gpt_base + TIMER_CMP_REG(timer));
-}
-
-static void mtk_clkevt_time_start(struct mtk_clock_event_device *evt,
- bool periodic, u8 timer)
-{
- u32 val;
-
- /* Acknowledge interrupt */
- writel(GPT_IRQ_ACK(timer), evt->gpt_base + GPT_IRQ_ACK_REG);
-
- val = readl(evt->gpt_base + TIMER_CTRL_REG(timer));
-
- /* Clear 2 bit timer operation mode field */
- val &= ~TIMER_CTRL_OP(0x3);
-
- if (periodic)
- val |= TIMER_CTRL_OP(TIMER_CTRL_OP_REPEAT);
- else
- val |= TIMER_CTRL_OP(TIMER_CTRL_OP_ONESHOT);
-
- writel(val | TIMER_CTRL_ENABLE | TIMER_CTRL_CLEAR,
- evt->gpt_base + TIMER_CTRL_REG(timer));
-}
-
-static int mtk_clkevt_shutdown(struct clock_event_device *clk)
-{
- mtk_clkevt_time_stop(to_mtk_clk(clk), GPT_CLK_EVT);
- return 0;
-}
-
-static int mtk_clkevt_set_periodic(struct clock_event_device *clk)
-{
- struct mtk_clock_event_device *evt = to_mtk_clk(clk);
-
- mtk_clkevt_time_stop(evt, GPT_CLK_EVT);
- mtk_clkevt_time_setup(evt, evt->ticks_per_jiffy, GPT_CLK_EVT);
- mtk_clkevt_time_start(evt, true, GPT_CLK_EVT);
- return 0;
-}
-
-static int mtk_clkevt_next_event(unsigned long event,
- struct clock_event_device *clk)
-{
- struct mtk_clock_event_device *evt = to_mtk_clk(clk);
-
- mtk_clkevt_time_stop(evt, GPT_CLK_EVT);
- mtk_clkevt_time_setup(evt, event, GPT_CLK_EVT);
- mtk_clkevt_time_start(evt, false, GPT_CLK_EVT);
-
- return 0;
-}
-
-static irqreturn_t mtk_timer_interrupt(int irq, void *dev_id)
-{
- struct mtk_clock_event_device *evt = dev_id;
-
- /* Acknowledge timer0 irq */
- writel(GPT_IRQ_ACK(GPT_CLK_EVT), evt->gpt_base + GPT_IRQ_ACK_REG);
- evt->dev.event_handler(&evt->dev);
-
- return IRQ_HANDLED;
-}
-
-static void
-__init mtk_timer_setup(struct mtk_clock_event_device *evt, u8 timer, u8 option)
-{
- writel(TIMER_CTRL_CLEAR | TIMER_CTRL_DISABLE,
- evt->gpt_base + TIMER_CTRL_REG(timer));
-
- writel(TIMER_CLK_SRC(TIMER_CLK_SRC_SYS13M) | TIMER_CLK_DIV1,
- evt->gpt_base + TIMER_CLK_REG(timer));
-
- writel(0x0, evt->gpt_base + TIMER_CMP_REG(timer));
-
- writel(TIMER_CTRL_OP(option) | TIMER_CTRL_ENABLE,
- evt->gpt_base + TIMER_CTRL_REG(timer));
-}
-
-static void mtk_timer_enable_irq(struct mtk_clock_event_device *evt, u8 timer)
-{
- u32 val;
-
- /* Disable all interrupts */
- writel(0x0, evt->gpt_base + GPT_IRQ_EN_REG);
-
- /* Acknowledge all spurious pending interrupts */
- writel(0x3f, evt->gpt_base + GPT_IRQ_ACK_REG);
-
- val = readl(evt->gpt_base + GPT_IRQ_EN_REG);
- writel(val | GPT_IRQ_ENABLE(timer),
- evt->gpt_base + GPT_IRQ_EN_REG);
-}
-
-static int __init mtk_timer_init(struct device_node *node)
-{
- struct mtk_clock_event_device *evt;
- struct resource res;
- unsigned long rate = 0;
- struct clk *clk;
-
- evt = kzalloc(sizeof(*evt), GFP_KERNEL);
- if (!evt)
- return -ENOMEM;
-
- evt->dev.name = "mtk_tick";
- evt->dev.rating = 300;
- evt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT;
- evt->dev.set_state_shutdown = mtk_clkevt_shutdown;
- evt->dev.set_state_periodic = mtk_clkevt_set_periodic;
- evt->dev.set_state_oneshot = mtk_clkevt_shutdown;
- evt->dev.tick_resume = mtk_clkevt_shutdown;
- evt->dev.set_next_event = mtk_clkevt_next_event;
- evt->dev.cpumask = cpu_possible_mask;
-
- evt->gpt_base = of_io_request_and_map(node, 0, "mtk-timer");
- if (IS_ERR(evt->gpt_base)) {
- pr_err("Can't get resource\n");
- goto err_kzalloc;
- }
-
- evt->dev.irq = irq_of_parse_and_map(node, 0);
- if (evt->dev.irq <= 0) {
- pr_err("Can't parse IRQ\n");
- goto err_mem;
- }
-
- clk = of_clk_get(node, 0);
- if (IS_ERR(clk)) {
- pr_err("Can't get timer clock\n");
- goto err_irq;
- }
-
- if (clk_prepare_enable(clk)) {
- pr_err("Can't prepare clock\n");
- goto err_clk_put;
- }
- rate = clk_get_rate(clk);
-
- if (request_irq(evt->dev.irq, mtk_timer_interrupt,
- IRQF_TIMER | IRQF_IRQPOLL, "mtk_timer", evt)) {
- pr_err("failed to setup irq %d\n", evt->dev.irq);
- goto err_clk_disable;
- }
-
- evt->ticks_per_jiffy = DIV_ROUND_UP(rate, HZ);
-
- /* Configure clock source */
- mtk_timer_setup(evt, GPT_CLK_SRC, TIMER_CTRL_OP_FREERUN);
- clocksource_mmio_init(evt->gpt_base + TIMER_CNT_REG(GPT_CLK_SRC),
- node->name, rate, 300, 32, clocksource_mmio_readl_up);
- gpt_sched_reg = evt->gpt_base + TIMER_CNT_REG(GPT_CLK_SRC);
- sched_clock_register(mtk_read_sched_clock, 32, rate);
-
- /* Configure clock event */
- mtk_timer_setup(evt, GPT_CLK_EVT, TIMER_CTRL_OP_REPEAT);
- clockevents_config_and_register(&evt->dev, rate, 0x3,
- 0xffffffff);
-
- mtk_timer_enable_irq(evt, GPT_CLK_EVT);
-
- return 0;
-
-err_clk_disable:
- clk_disable_unprepare(clk);
-err_clk_put:
- clk_put(clk);
-err_irq:
- irq_dispose_mapping(evt->dev.irq);
-err_mem:
- iounmap(evt->gpt_base);
- of_address_to_resource(node, 0, &res);
- release_mem_region(res.start, resource_size(&res));
-err_kzalloc:
- kfree(evt);
-
- return -EINVAL;
-}
-TIMER_OF_DECLARE(mtk_mt6577, "mediatek,mt6577-timer", mtk_timer_init);
diff --git a/drivers/clocksource/tegra20_timer.c b/drivers/clocksource/tegra20_timer.c
index c337a81..aa624885 100644
--- a/drivers/clocksource/tegra20_timer.c
+++ b/drivers/clocksource/tegra20_timer.c
@@ -230,7 +230,7 @@
return ret;
}
- tegra_clockevent.cpumask = cpu_all_mask;
+ tegra_clockevent.cpumask = cpu_possible_mask;
tegra_clockevent.irq = tegra_timer_irq.irq;
clockevents_config_and_register(&tegra_clockevent, 1000000,
0x1, 0x1fffffff);
@@ -259,6 +259,6 @@
else
clk_prepare_enable(clk);
- return register_persistent_clock(NULL, tegra_read_persistent_clock64);
+ return register_persistent_clock(tegra_read_persistent_clock64);
}
TIMER_OF_DECLARE(tegra20_rtc, "nvidia,tegra20-rtc", tegra20_init_rtc);
diff --git a/drivers/clocksource/timer-atcpit100.c b/drivers/clocksource/timer-atcpit100.c
index 5e23d7b..b4bd2f5 100644
--- a/drivers/clocksource/timer-atcpit100.c
+++ b/drivers/clocksource/timer-atcpit100.c
@@ -185,7 +185,7 @@
.set_state_oneshot = atcpit100_clkevt_set_oneshot,
.tick_resume = atcpit100_clkevt_shutdown,
.set_next_event = atcpit100_clkevt_next_event,
- .cpumask = cpu_all_mask,
+ .cpumask = cpu_possible_mask,
},
.of_irq = {
diff --git a/drivers/clocksource/timer-keystone.c b/drivers/clocksource/timer-keystone.c
index 0eee032..f5b2eda 100644
--- a/drivers/clocksource/timer-keystone.c
+++ b/drivers/clocksource/timer-keystone.c
@@ -211,7 +211,7 @@
event_dev->set_state_shutdown = keystone_shutdown;
event_dev->set_state_periodic = keystone_set_periodic;
event_dev->set_state_oneshot = keystone_shutdown;
- event_dev->cpumask = cpu_all_mask;
+ event_dev->cpumask = cpu_possible_mask;
event_dev->owner = THIS_MODULE;
event_dev->name = TIMER_NAME;
event_dev->irq = irq;
diff --git a/drivers/clocksource/timer-mediatek.c b/drivers/clocksource/timer-mediatek.c
new file mode 100644
index 0000000..eb10321
--- /dev/null
+++ b/drivers/clocksource/timer-mediatek.c
@@ -0,0 +1,328 @@
+/*
+ * Mediatek SoCs General-Purpose Timer handling.
+ *
+ * Copyright (C) 2014 Matthias Brugger
+ *
+ * Matthias Brugger <matthias.bgg@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/clockchips.h>
+#include <linux/clocksource.h>
+#include <linux/interrupt.h>
+#include <linux/irqreturn.h>
+#include <linux/sched_clock.h>
+#include <linux/slab.h>
+#include "timer-of.h"
+
+#define TIMER_CLK_EVT (1)
+#define TIMER_CLK_SRC (2)
+
+#define TIMER_SYNC_TICKS (3)
+
+/* gpt */
+#define GPT_IRQ_EN_REG 0x00
+#define GPT_IRQ_ENABLE(val) BIT((val) - 1)
+#define GPT_IRQ_ACK_REG 0x08
+#define GPT_IRQ_ACK(val) BIT((val) - 1)
+
+#define GPT_CTRL_REG(val) (0x10 * (val))
+#define GPT_CTRL_OP(val) (((val) & 0x3) << 4)
+#define GPT_CTRL_OP_ONESHOT (0)
+#define GPT_CTRL_OP_REPEAT (1)
+#define GPT_CTRL_OP_FREERUN (3)
+#define GPT_CTRL_CLEAR (2)
+#define GPT_CTRL_ENABLE (1)
+#define GPT_CTRL_DISABLE (0)
+
+#define GPT_CLK_REG(val) (0x04 + (0x10 * (val)))
+#define GPT_CLK_SRC(val) (((val) & 0x1) << 4)
+#define GPT_CLK_SRC_SYS13M (0)
+#define GPT_CLK_SRC_RTC32K (1)
+#define GPT_CLK_DIV1 (0x0)
+#define GPT_CLK_DIV2 (0x1)
+
+#define GPT_CNT_REG(val) (0x08 + (0x10 * (val)))
+#define GPT_CMP_REG(val) (0x0C + (0x10 * (val)))
+
+/* system timer */
+#define SYST_BASE (0x40)
+
+#define SYST_CON (SYST_BASE + 0x0)
+#define SYST_VAL (SYST_BASE + 0x4)
+
+#define SYST_CON_REG(to) (timer_of_base(to) + SYST_CON)
+#define SYST_VAL_REG(to) (timer_of_base(to) + SYST_VAL)
+
+/*
+ * SYST_CON_EN: Clock enable. Shall be set to
+ * - Start timer countdown.
+ * - Allow timeout ticks being updated.
+ * - Allow changing interrupt functions.
+ *
+ * SYST_CON_IRQ_EN: Set to allow interrupt.
+ *
+ * SYST_CON_IRQ_CLR: Set to clear interrupt.
+ */
+#define SYST_CON_EN BIT(0)
+#define SYST_CON_IRQ_EN BIT(1)
+#define SYST_CON_IRQ_CLR BIT(4)
+
+static void __iomem *gpt_sched_reg __read_mostly;
+
+static void mtk_syst_ack_irq(struct timer_of *to)
+{
+ /* Clear and disable interrupt */
+ writel(SYST_CON_IRQ_CLR | SYST_CON_EN, SYST_CON_REG(to));
+}
+
+static irqreturn_t mtk_syst_handler(int irq, void *dev_id)
+{
+ struct clock_event_device *clkevt = dev_id;
+ struct timer_of *to = to_timer_of(clkevt);
+
+ mtk_syst_ack_irq(to);
+ clkevt->event_handler(clkevt);
+
+ return IRQ_HANDLED;
+}
+
+static int mtk_syst_clkevt_next_event(unsigned long ticks,
+ struct clock_event_device *clkevt)
+{
+ struct timer_of *to = to_timer_of(clkevt);
+
+ /* Enable clock to allow timeout tick update later */
+ writel(SYST_CON_EN, SYST_CON_REG(to));
+
+ /*
+ * Write new timeout ticks. Timer shall start countdown
+ * after timeout ticks are updated.
+ */
+ writel(ticks, SYST_VAL_REG(to));
+
+ /* Enable interrupt */
+ writel(SYST_CON_EN | SYST_CON_IRQ_EN, SYST_CON_REG(to));
+
+ return 0;
+}
+
+static int mtk_syst_clkevt_shutdown(struct clock_event_device *clkevt)
+{
+ /* Disable timer */
+ writel(0, SYST_CON_REG(to_timer_of(clkevt)));
+
+ return 0;
+}
+
+static int mtk_syst_clkevt_resume(struct clock_event_device *clkevt)
+{
+ return mtk_syst_clkevt_shutdown(clkevt);
+}
+
+static int mtk_syst_clkevt_oneshot(struct clock_event_device *clkevt)
+{
+ return 0;
+}
+
+static u64 notrace mtk_gpt_read_sched_clock(void)
+{
+ return readl_relaxed(gpt_sched_reg);
+}
+
+static void mtk_gpt_clkevt_time_stop(struct timer_of *to, u8 timer)
+{
+ u32 val;
+
+ val = readl(timer_of_base(to) + GPT_CTRL_REG(timer));
+ writel(val & ~GPT_CTRL_ENABLE, timer_of_base(to) +
+ GPT_CTRL_REG(timer));
+}
+
+static void mtk_gpt_clkevt_time_setup(struct timer_of *to,
+ unsigned long delay, u8 timer)
+{
+ writel(delay, timer_of_base(to) + GPT_CMP_REG(timer));
+}
+
+static void mtk_gpt_clkevt_time_start(struct timer_of *to,
+ bool periodic, u8 timer)
+{
+ u32 val;
+
+ /* Acknowledge interrupt */
+ writel(GPT_IRQ_ACK(timer), timer_of_base(to) + GPT_IRQ_ACK_REG);
+
+ val = readl(timer_of_base(to) + GPT_CTRL_REG(timer));
+
+ /* Clear 2 bit timer operation mode field */
+ val &= ~GPT_CTRL_OP(0x3);
+
+ if (periodic)
+ val |= GPT_CTRL_OP(GPT_CTRL_OP_REPEAT);
+ else
+ val |= GPT_CTRL_OP(GPT_CTRL_OP_ONESHOT);
+
+ writel(val | GPT_CTRL_ENABLE | GPT_CTRL_CLEAR,
+ timer_of_base(to) + GPT_CTRL_REG(timer));
+}
+
+static int mtk_gpt_clkevt_shutdown(struct clock_event_device *clk)
+{
+ mtk_gpt_clkevt_time_stop(to_timer_of(clk), TIMER_CLK_EVT);
+
+ return 0;
+}
+
+static int mtk_gpt_clkevt_set_periodic(struct clock_event_device *clk)
+{
+ struct timer_of *to = to_timer_of(clk);
+
+ mtk_gpt_clkevt_time_stop(to, TIMER_CLK_EVT);
+ mtk_gpt_clkevt_time_setup(to, to->of_clk.period, TIMER_CLK_EVT);
+ mtk_gpt_clkevt_time_start(to, true, TIMER_CLK_EVT);
+
+ return 0;
+}
+
+static int mtk_gpt_clkevt_next_event(unsigned long event,
+ struct clock_event_device *clk)
+{
+ struct timer_of *to = to_timer_of(clk);
+
+ mtk_gpt_clkevt_time_stop(to, TIMER_CLK_EVT);
+ mtk_gpt_clkevt_time_setup(to, event, TIMER_CLK_EVT);
+ mtk_gpt_clkevt_time_start(to, false, TIMER_CLK_EVT);
+
+ return 0;
+}
+
+static irqreturn_t mtk_gpt_interrupt(int irq, void *dev_id)
+{
+ struct clock_event_device *clkevt = (struct clock_event_device *)dev_id;
+ struct timer_of *to = to_timer_of(clkevt);
+
+ /* Acknowledge timer0 irq */
+ writel(GPT_IRQ_ACK(TIMER_CLK_EVT), timer_of_base(to) + GPT_IRQ_ACK_REG);
+ clkevt->event_handler(clkevt);
+
+ return IRQ_HANDLED;
+}
+
+static void
+__init mtk_gpt_setup(struct timer_of *to, u8 timer, u8 option)
+{
+ writel(GPT_CTRL_CLEAR | GPT_CTRL_DISABLE,
+ timer_of_base(to) + GPT_CTRL_REG(timer));
+
+ writel(GPT_CLK_SRC(GPT_CLK_SRC_SYS13M) | GPT_CLK_DIV1,
+ timer_of_base(to) + GPT_CLK_REG(timer));
+
+ writel(0x0, timer_of_base(to) + GPT_CMP_REG(timer));
+
+ writel(GPT_CTRL_OP(option) | GPT_CTRL_ENABLE,
+ timer_of_base(to) + GPT_CTRL_REG(timer));
+}
+
+static void mtk_gpt_enable_irq(struct timer_of *to, u8 timer)
+{
+ u32 val;
+
+ /* Disable all interrupts */
+ writel(0x0, timer_of_base(to) + GPT_IRQ_EN_REG);
+
+ /* Acknowledge all spurious pending interrupts */
+ writel(0x3f, timer_of_base(to) + GPT_IRQ_ACK_REG);
+
+ val = readl(timer_of_base(to) + GPT_IRQ_EN_REG);
+ writel(val | GPT_IRQ_ENABLE(timer),
+ timer_of_base(to) + GPT_IRQ_EN_REG);
+}
+
+static struct timer_of to = {
+ .flags = TIMER_OF_IRQ | TIMER_OF_BASE | TIMER_OF_CLOCK,
+
+ .clkevt = {
+ .name = "mtk-clkevt",
+ .rating = 300,
+ .cpumask = cpu_possible_mask,
+ },
+
+ .of_irq = {
+ .flags = IRQF_TIMER | IRQF_IRQPOLL,
+ },
+};
+
+static int __init mtk_syst_init(struct device_node *node)
+{
+ int ret;
+
+ to.clkevt.features = CLOCK_EVT_FEAT_DYNIRQ | CLOCK_EVT_FEAT_ONESHOT;
+ to.clkevt.set_state_shutdown = mtk_syst_clkevt_shutdown;
+ to.clkevt.set_state_oneshot = mtk_syst_clkevt_oneshot;
+ to.clkevt.tick_resume = mtk_syst_clkevt_resume;
+ to.clkevt.set_next_event = mtk_syst_clkevt_next_event;
+ to.of_irq.handler = mtk_syst_handler;
+
+ ret = timer_of_init(node, &to);
+ if (ret)
+ goto err;
+
+ clockevents_config_and_register(&to.clkevt, timer_of_rate(&to),
+ TIMER_SYNC_TICKS, 0xffffffff);
+
+ return 0;
+err:
+ timer_of_cleanup(&to);
+ return ret;
+}
+
+static int __init mtk_gpt_init(struct device_node *node)
+{
+ int ret;
+
+ to.clkevt.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT;
+ to.clkevt.set_state_shutdown = mtk_gpt_clkevt_shutdown;
+ to.clkevt.set_state_periodic = mtk_gpt_clkevt_set_periodic;
+ to.clkevt.set_state_oneshot = mtk_gpt_clkevt_shutdown;
+ to.clkevt.tick_resume = mtk_gpt_clkevt_shutdown;
+ to.clkevt.set_next_event = mtk_gpt_clkevt_next_event;
+ to.of_irq.handler = mtk_gpt_interrupt;
+
+ ret = timer_of_init(node, &to);
+ if (ret)
+ goto err;
+
+ /* Configure clock source */
+ mtk_gpt_setup(&to, TIMER_CLK_SRC, GPT_CTRL_OP_FREERUN);
+ clocksource_mmio_init(timer_of_base(&to) + GPT_CNT_REG(TIMER_CLK_SRC),
+ node->name, timer_of_rate(&to), 300, 32,
+ clocksource_mmio_readl_up);
+ gpt_sched_reg = timer_of_base(&to) + GPT_CNT_REG(TIMER_CLK_SRC);
+ sched_clock_register(mtk_gpt_read_sched_clock, 32, timer_of_rate(&to));
+
+ /* Configure clock event */
+ mtk_gpt_setup(&to, TIMER_CLK_EVT, GPT_CTRL_OP_REPEAT);
+ clockevents_config_and_register(&to.clkevt, timer_of_rate(&to),
+ TIMER_SYNC_TICKS, 0xffffffff);
+
+ mtk_gpt_enable_irq(&to, TIMER_CLK_EVT);
+
+ return 0;
+err:
+ timer_of_cleanup(&to);
+ return ret;
+}
+TIMER_OF_DECLARE(mtk_mt6577, "mediatek,mt6577-timer", mtk_gpt_init);
+TIMER_OF_DECLARE(mtk_mt6765, "mediatek,mt6765-timer", mtk_syst_init);
diff --git a/drivers/clocksource/timer-sprd.c b/drivers/clocksource/timer-sprd.c
index ef9ebea..430cb99 100644
--- a/drivers/clocksource/timer-sprd.c
+++ b/drivers/clocksource/timer-sprd.c
@@ -156,4 +156,54 @@
return 0;
}
+static struct timer_of suspend_to = {
+ .flags = TIMER_OF_BASE | TIMER_OF_CLOCK,
+};
+
+static u64 sprd_suspend_timer_read(struct clocksource *cs)
+{
+ return ~(u64)readl_relaxed(timer_of_base(&suspend_to) +
+ TIMER_VALUE_SHDW_LO) & cs->mask;
+}
+
+static int sprd_suspend_timer_enable(struct clocksource *cs)
+{
+ sprd_timer_update_counter(timer_of_base(&suspend_to),
+ TIMER_VALUE_LO_MASK);
+ sprd_timer_enable(timer_of_base(&suspend_to), TIMER_CTL_PERIOD_MODE);
+
+ return 0;
+}
+
+static void sprd_suspend_timer_disable(struct clocksource *cs)
+{
+ sprd_timer_disable(timer_of_base(&suspend_to));
+}
+
+static struct clocksource suspend_clocksource = {
+ .name = "sprd_suspend_timer",
+ .rating = 200,
+ .read = sprd_suspend_timer_read,
+ .enable = sprd_suspend_timer_enable,
+ .disable = sprd_suspend_timer_disable,
+ .mask = CLOCKSOURCE_MASK(32),
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP,
+};
+
+static int __init sprd_suspend_timer_init(struct device_node *np)
+{
+ int ret;
+
+ ret = timer_of_init(np, &suspend_to);
+ if (ret)
+ return ret;
+
+ clocksource_register_hz(&suspend_clocksource,
+ timer_of_rate(&suspend_to));
+
+ return 0;
+}
+
TIMER_OF_DECLARE(sc9860_timer, "sprd,sc9860-timer", sprd_timer_init);
+TIMER_OF_DECLARE(sc9860_persistent_timer, "sprd,sc9860-suspend-timer",
+ sprd_suspend_timer_init);
diff --git a/drivers/clocksource/timer-ti-32k.c b/drivers/clocksource/timer-ti-32k.c
index 880a861..29e2e1a 100644
--- a/drivers/clocksource/timer-ti-32k.c
+++ b/drivers/clocksource/timer-ti-32k.c
@@ -78,8 +78,7 @@
.rating = 250,
.read = ti_32k_read_cycles,
.mask = CLOCKSOURCE_MASK(32),
- .flags = CLOCK_SOURCE_IS_CONTINUOUS |
- CLOCK_SOURCE_SUSPEND_NONSTOP,
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS,
},
};
diff --git a/drivers/clocksource/zevio-timer.c b/drivers/clocksource/zevio-timer.c
index a6a0338..f746893 100644
--- a/drivers/clocksource/zevio-timer.c
+++ b/drivers/clocksource/zevio-timer.c
@@ -162,7 +162,7 @@
timer->clkevt.set_state_oneshot = zevio_timer_set_oneshot;
timer->clkevt.tick_resume = zevio_timer_set_oneshot;
timer->clkevt.rating = 200;
- timer->clkevt.cpumask = cpu_all_mask;
+ timer->clkevt.cpumask = cpu_possible_mask;
timer->clkevt.features = CLOCK_EVT_FEAT_ONESHOT;
timer->clkevt.irq = irqnr;
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 781a4a3..d8e159f 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -87,6 +87,18 @@
config EFI_ARMSTUB
bool
+config EFI_ARMSTUB_DTB_LOADER
+ bool "Enable the DTB loader"
+ depends on EFI_ARMSTUB
+ help
+ Select this config option to add support for the dtb= command
+ line parameter, allowing a device tree blob to be loaded into
+ memory from the EFI System Partition by the stub.
+
+ The device tree is typically provided by the platform or by
+ the bootloader, so this option is mostly for development
+ purposes only.
+
config EFI_BOOTLOADER_CONTROL
tristate "EFI Bootloader Control"
depends on EFI_VARS
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 3bf0dca..a7902fc 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -48,8 +48,21 @@
{
static atomic64_t seq;
- if (!atomic64_read(&seq))
- atomic64_set(&seq, ((u64)get_seconds()) << 32);
+ if (!atomic64_read(&seq)) {
+ time64_t time = ktime_get_real_seconds();
+
+ /*
+ * This code is unlikely to still be needed in year 2106,
+ * but just in case, let's use a few more bits for timestamps
+ * after y2038 to be sure they keep increasing monotonically
+ * for the next few hundred years...
+ */
+ if (time < 0x80000000)
+ atomic64_set(&seq, (ktime_get_real_seconds()) << 32);
+ else
+ atomic64_set(&seq, 0x8000000000000000ull |
+ ktime_get_real_seconds() << 24);
+ }
return atomic64_inc_return(&seq);
}
@@ -459,7 +472,7 @@
else
goto err_section_too_small;
#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
- } else if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_ARM)) {
+ } else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
struct cper_sec_proc_arm *arm_err = acpi_hest_get_payload(gdata);
printk("%ssection_type: ARM processor error\n", newpfx);
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f491..2a29dd9 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -82,8 +82,11 @@
.mmap_sem = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
.page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+ .cpu_bitmap = { [BITS_TO_LONGS(NR_CPUS)] = 0},
};
+struct workqueue_struct *efi_rts_wq;
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -337,6 +340,18 @@
if (!efi_enabled(EFI_BOOT))
return 0;
+ /*
+ * Since we process only one efi_runtime_service() at a time, an
+ * ordered workqueue (which creates only one execution context)
+ * should suffice all our needs.
+ */
+ efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+ if (!efi_rts_wq) {
+ pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ return 0;
+ }
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
@@ -388,7 +403,7 @@
* and if so, populate the supplied memory descriptor with the appropriate
* data.
*/
-int __init efi_mem_desc_lookup(u64 phys_addr, efi_memory_desc_t *out_md)
+int efi_mem_desc_lookup(u64 phys_addr, efi_memory_desc_t *out_md)
{
efi_memory_desc_t *md;
@@ -406,12 +421,6 @@
u64 size;
u64 end;
- if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
- md->type != EFI_BOOT_SERVICES_DATA &&
- md->type != EFI_RUNTIME_SERVICES_DATA) {
- continue;
- }
-
size = md->num_pages << EFI_PAGE_SHIFT;
end = md->phys_addr + size;
if (phys_addr >= md->phys_addr && phys_addr < end) {
diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
index 1ab80e0..5d06bd2 100644
--- a/drivers/firmware/efi/esrt.c
+++ b/drivers/firmware/efi/esrt.c
@@ -250,7 +250,10 @@
return;
rc = efi_mem_desc_lookup(efi.esrt, &md);
- if (rc < 0) {
+ if (rc < 0 ||
+ (!(md.attribute & EFI_MEMORY_RUNTIME) &&
+ md.type != EFI_BOOT_SERVICES_DATA &&
+ md.type != EFI_RUNTIME_SERVICES_DATA)) {
pr_warn("ESRT header is not in the memory map.\n");
return;
}
@@ -326,7 +329,8 @@
end = esrt_data + size;
pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end);
- efi_mem_reserve(esrt_data, esrt_data_size);
+ if (md.type == EFI_BOOT_SERVICES_DATA)
+ efi_mem_reserve(esrt_data, esrt_data_size);
pr_debug("esrt-init: loaded.\n");
}
diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
index 01a9d78..6920033 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -40,31 +40,6 @@
static u64 virtmap_base = EFI_RT_VIRTUAL_BASE;
-efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg,
- void *__image, void **__fh)
-{
- efi_file_io_interface_t *io;
- efi_loaded_image_t *image = __image;
- efi_file_handle_t *fh;
- efi_guid_t fs_proto = EFI_FILE_SYSTEM_GUID;
- efi_status_t status;
- void *handle = (void *)(unsigned long)image->device_handle;
-
- status = sys_table_arg->boottime->handle_protocol(handle,
- &fs_proto, (void **)&io);
- if (status != EFI_SUCCESS) {
- efi_printk(sys_table_arg, "Failed to handle fs_proto\n");
- return status;
- }
-
- status = io->open_volume(io, &fh);
- if (status != EFI_SUCCESS)
- efi_printk(sys_table_arg, "Failed to open volume\n");
-
- *__fh = fh;
- return status;
-}
-
void efi_char16_printk(efi_system_table_t *sys_table_arg,
efi_char16_t *str)
{
@@ -202,9 +177,10 @@
* 'dtb=' unless UEFI Secure Boot is disabled. We assume that secure
* boot is enabled if we can't determine its state.
*/
- if (secure_boot != efi_secureboot_mode_disabled &&
- strstr(cmdline_ptr, "dtb=")) {
- pr_efi(sys_table, "Ignoring DTB from command line.\n");
+ if (!IS_ENABLED(CONFIG_EFI_ARMSTUB_DTB_LOADER) ||
+ secure_boot != efi_secureboot_mode_disabled) {
+ if (strstr(cmdline_ptr, "dtb="))
+ pr_efi(sys_table, "Ignoring DTB from command line.\n");
} else {
status = handle_cmdline_files(sys_table, image, cmdline_ptr,
"dtb=",
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 50a9cab..e94975f 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -413,6 +413,34 @@
return efi_call_proto(efi_file_handle, close, handle);
}
+static efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg,
+ efi_loaded_image_t *image,
+ efi_file_handle_t **__fh)
+{
+ efi_file_io_interface_t *io;
+ efi_file_handle_t *fh;
+ efi_guid_t fs_proto = EFI_FILE_SYSTEM_GUID;
+ efi_status_t status;
+ void *handle = (void *)(unsigned long)efi_table_attr(efi_loaded_image,
+ device_handle,
+ image);
+
+ status = efi_call_early(handle_protocol, handle,
+ &fs_proto, (void **)&io);
+ if (status != EFI_SUCCESS) {
+ efi_printk(sys_table_arg, "Failed to handle fs_proto\n");
+ return status;
+ }
+
+ status = efi_call_proto(efi_file_io_interface, open_volume, io, &fh);
+ if (status != EFI_SUCCESS)
+ efi_printk(sys_table_arg, "Failed to open volume\n");
+ else
+ *__fh = fh;
+
+ return status;
+}
+
/*
* Parse the ASCII string 'cmdline' for EFI options, denoted by the efi=
* option, e.g. efi=nochunk.
@@ -563,8 +591,7 @@
/* Only open the volume once. */
if (!i) {
- status = efi_open_volume(sys_table_arg, image,
- (void **)&fh);
+ status = efi_open_volume(sys_table_arg, image, &fh);
if (status != EFI_SUCCESS)
goto free_files;
}
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index f59564b..32799cf 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -36,9 +36,6 @@
void efi_char16_printk(efi_system_table_t *, efi_char16_t *);
-efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image,
- void **__fh);
-
unsigned long get_dram_base(efi_system_table_t *sys_table_arg);
efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b..aa66cbf 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
/*
* runtime-wrappers.c - Runtime Services function call wrappers
*
+ * Implementation summary:
+ * -----------------------
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ * because it's dependent on the return status and execution of
+ * efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
* Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
#include <linux/mutex.h>
#include <linux/semaphore.h>
#include <linux/stringify.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+
#include <asm/efi.h>
/*
@@ -33,6 +45,76 @@
#define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+ GET_TIME,
+ SET_TIME,
+ GET_WAKEUP_TIME,
+ SET_WAKEUP_TIME,
+ GET_VARIABLE,
+ GET_NEXT_VARIABLE,
+ SET_VARIABLE,
+ QUERY_VARIABLE_INFO,
+ GET_NEXT_HIGH_MONO_COUNT,
+ UPDATE_CAPSULE,
+ QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work: Details of EFI Runtime Service work
+ * @arg<1-5>: EFI Runtime Service function arguments
+ * @status: Status of executing EFI Runtime Service
+ * @efi_rts_id: EFI Runtime Service function identifier
+ * @efi_rts_comp: Struct used for handling completions
+ */
+struct efi_runtime_work {
+ void *arg1;
+ void *arg2;
+ void *arg3;
+ void *arg4;
+ void *arg5;
+ efi_status_t status;
+ struct work_struct work;
+ enum efi_rts_ids efi_rts_id;
+ struct completion efi_rts_comp;
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts: efi_runtime_service() function identifier
+ * @rts_arg<1-5>: efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the caller
+ * thread waits for completion.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
+({ \
+ struct efi_runtime_work efi_rts_work; \
+ efi_rts_work.status = EFI_ABORTED; \
+ \
+ init_completion(&efi_rts_work.efi_rts_comp); \
+ INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts); \
+ efi_rts_work.arg1 = _arg1; \
+ efi_rts_work.arg2 = _arg2; \
+ efi_rts_work.arg3 = _arg3; \
+ efi_rts_work.arg4 = _arg4; \
+ efi_rts_work.arg5 = _arg5; \
+ efi_rts_work.efi_rts_id = _rts; \
+ \
+ /* \
+ * queue_work() returns 0 if work was already on queue, \
+ * _ideally_ this should never happen. \
+ */ \
+ if (queue_work(efi_rts_wq, &efi_rts_work.work)) \
+ wait_for_completion(&efi_rts_work.efi_rts_comp); \
+ else \
+ pr_err("Failed to queue work to efi_rts_wq.\n"); \
+ \
+ efi_rts_work.status; \
+})
+
void efi_call_virt_check_flags(unsigned long flags, const char *call)
{
unsigned long cur_flags, mismatch;
@@ -90,13 +172,98 @@
*/
static DEFINE_SEMAPHORE(efi_runtime_lock);
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+ struct efi_runtime_work *efi_rts_work;
+ void *arg1, *arg2, *arg3, *arg4, *arg5;
+ efi_status_t status = EFI_NOT_FOUND;
+
+ efi_rts_work = container_of(work, struct efi_runtime_work, work);
+ arg1 = efi_rts_work->arg1;
+ arg2 = efi_rts_work->arg2;
+ arg3 = efi_rts_work->arg3;
+ arg4 = efi_rts_work->arg4;
+ arg5 = efi_rts_work->arg5;
+
+ switch (efi_rts_work->efi_rts_id) {
+ case GET_TIME:
+ status = efi_call_virt(get_time, (efi_time_t *)arg1,
+ (efi_time_cap_t *)arg2);
+ break;
+ case SET_TIME:
+ status = efi_call_virt(set_time, (efi_time_t *)arg1);
+ break;
+ case GET_WAKEUP_TIME:
+ status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+ (efi_bool_t *)arg2, (efi_time_t *)arg3);
+ break;
+ case SET_WAKEUP_TIME:
+ status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+ (efi_time_t *)arg2);
+ break;
+ case GET_VARIABLE:
+ status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, (u32 *)arg3,
+ (unsigned long *)arg4, (void *)arg5);
+ break;
+ case GET_NEXT_VARIABLE:
+ status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+ (efi_char16_t *)arg2,
+ (efi_guid_t *)arg3);
+ break;
+ case SET_VARIABLE:
+ status = efi_call_virt(set_variable, (efi_char16_t *)arg1,
+ (efi_guid_t *)arg2, *(u32 *)arg3,
+ *(unsigned long *)arg4, (void *)arg5);
+ break;
+ case QUERY_VARIABLE_INFO:
+ status = efi_call_virt(query_variable_info, *(u32 *)arg1,
+ (u64 *)arg2, (u64 *)arg3, (u64 *)arg4);
+ break;
+ case GET_NEXT_HIGH_MONO_COUNT:
+ status = efi_call_virt(get_next_high_mono_count, (u32 *)arg1);
+ break;
+ case UPDATE_CAPSULE:
+ status = efi_call_virt(update_capsule,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2,
+ *(unsigned long *)arg3);
+ break;
+ case QUERY_CAPSULE_CAPS:
+ status = efi_call_virt(query_capsule_caps,
+ (efi_capsule_header_t **)arg1,
+ *(unsigned long *)arg2, (u64 *)arg3,
+ (int *)arg4);
+ break;
+ default:
+ /*
+ * Ideally, we should never reach here because a caller of this
+ * function should have put the right efi_runtime_service()
+ * function identifier into efi_rts_work->efi_rts_id
+ */
+ pr_err("Requested executing invalid EFI Runtime Service.\n");
+ }
+ efi_rts_work->status = status;
+ complete(&efi_rts_work->efi_rts_comp);
+}
+
static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
efi_status_t status;
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_time, tm, tc);
+ status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -107,7 +274,7 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_time, tm);
+ status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -120,7 +287,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
+ status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -131,7 +299,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_wakeup_time, enabled, tm);
+ status = efi_queue_work(SET_WAKEUP_TIME, &enabled, tm, NULL, NULL,
+ NULL);
up(&efi_runtime_lock);
return status;
}
@@ -146,8 +315,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_variable, name, vendor, attr, data_size,
- data);
+ status = efi_queue_work(GET_VARIABLE, name, vendor, attr, data_size,
+ data);
up(&efi_runtime_lock);
return status;
}
@@ -160,7 +329,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_variable, name_size, name, vendor);
+ status = efi_queue_work(GET_NEXT_VARIABLE, name_size, name, vendor,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -175,8 +345,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(set_variable, name, vendor, attr, data_size,
- data);
+ status = efi_queue_work(SET_VARIABLE, name, vendor, &attr, &data_size,
+ data);
up(&efi_runtime_lock);
return status;
}
@@ -210,8 +380,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_variable_info, attr, storage_space,
- remaining_space, max_variable_size);
+ status = efi_queue_work(QUERY_VARIABLE_INFO, &attr, storage_space,
+ remaining_space, max_variable_size, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -242,7 +412,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(get_next_high_mono_count, count);
+ status = efi_queue_work(GET_NEXT_HIGH_MONO_COUNT, count, NULL, NULL,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -272,7 +443,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(update_capsule, capsules, count, sg_list);
+ status = efi_queue_work(UPDATE_CAPSULE, capsules, &count, &sg_list,
+ NULL, NULL);
up(&efi_runtime_lock);
return status;
}
@@ -289,8 +461,8 @@
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
- status = efi_call_virt(query_capsule_caps, capsules, count, max_size,
- reset_type);
+ status = efi_queue_work(QUERY_CAPSULE_CAPS, capsules, &count,
+ max_size, reset_type, NULL);
up(&efi_runtime_lock);
return status;
}
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index a6e9049..475910f 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -121,7 +121,7 @@
* this lock.
*/
if (!exclusive)
- return __atomic_add_unless(&uobj->usecnt, 1, -1) == -1 ?
+ return atomic_fetch_add_unless(&uobj->usecnt, 1, -1) == -1 ?
-EBUSY : 0;
/* lock is either WRITE or DESTROY - should be exclusive */
diff --git a/drivers/input/keyboard/hilkbd.c b/drivers/input/keyboard/hilkbd.c
index a4e404a..5c7afde 100644
--- a/drivers/input/keyboard/hilkbd.c
+++ b/drivers/input/keyboard/hilkbd.c
@@ -57,8 +57,8 @@
#define HIL_DATA 0x1
#define HIL_CMD 0x3
#define HIL_IRQ 2
- #define hil_readb(p) readb(p)
- #define hil_writeb(v,p) writeb((v),(p))
+ #define hil_readb(p) readb((const volatile void __iomem *)(p))
+ #define hil_writeb(v, p) writeb((v), (volatile void __iomem *)(p))
#else
#error "HIL is not supported on this platform"
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index e9233db..d564d21 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -8,7 +8,7 @@
bool
select IRQ_DOMAIN
select IRQ_DOMAIN_HIERARCHY
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
select GENERIC_IRQ_EFFECTIVE_AFF_MASK
config ARM_GIC_PM
@@ -34,7 +34,7 @@
config ARM_GIC_V3
bool
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
select IRQ_DOMAIN_HIERARCHY
select PARTITION_PERCPU
select GENERIC_IRQ_EFFECTIVE_AFF_MASK
@@ -66,7 +66,7 @@
config ARM_VIC
bool
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
config ARM_VIC_NR
int
@@ -93,14 +93,14 @@
bool
select GENERIC_IRQ_CHIP
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
select SPARSE_IRQ
config ATMEL_AIC5_IRQ
bool
select GENERIC_IRQ_CHIP
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
select SPARSE_IRQ
config I8259
@@ -137,7 +137,7 @@
config FARADAY_FTINTC010
bool
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
select SPARSE_IRQ
config HISILICON_IRQ_MBIGEN
@@ -162,7 +162,7 @@
bool
depends on ARCH_CLPS711X
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
select SPARSE_IRQ
default y
@@ -181,7 +181,7 @@
config ORION_IRQCHIP
bool
select IRQ_DOMAIN
- select MULTI_IRQ_HANDLER
+ select GENERIC_IRQ_MULTI_HANDLER
config PIC32_EVIC
bool
diff --git a/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c b/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c
index 4eca5c7..606efa6 100644
--- a/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c
@@ -45,6 +45,9 @@
*/
info->scratchpad[0].ul = mc_bus_dev->icid;
msi_info = msi_get_domain_info(msi_domain->parent);
+
+ /* Allocate at least 32 MSIs, and always as a power of 2 */
+ nvec = max_t(int, 32, roundup_pow_of_two(nvec));
return msi_info->ops->msi_prepare(msi_domain->parent, dev, nvec, info);
}
diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
index 25a98de..8d6d009 100644
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -66,7 +66,7 @@
{
struct pci_dev *pdev, *alias_dev;
struct msi_domain_info *msi_info;
- int alias_count = 0;
+ int alias_count = 0, minnvec = 1;
if (!dev_is_pci(dev))
return -EINVAL;
@@ -86,8 +86,18 @@
/* ITS specific DeviceID, as the core ITS ignores dev. */
info->scratchpad[0].ul = pci_msi_domain_get_msi_rid(domain, pdev);
- return msi_info->ops->msi_prepare(domain->parent,
- dev, max(nvec, alias_count), info);
+ /*
+ * Always allocate a power of 2, and special case device 0 for
+ * broken systems where the DevID is not wired (and all devices
+ * appear as DevID 0). For that reason, we generously allocate a
+ * minimum of 32 MSIs for DevID 0. If you want more because all
+ * your devices are aliasing to DevID 0, consider fixing your HW.
+ */
+ nvec = max(nvec, alias_count);
+ if (!info->scratchpad[0].ul)
+ minnvec = 32;
+ nvec = max_t(int, minnvec, roundup_pow_of_two(nvec));
+ return msi_info->ops->msi_prepare(domain->parent, dev, nvec, info);
}
static struct msi_domain_ops its_pci_msi_ops = {
diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 8881a05..7b8e87b 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -73,6 +73,8 @@
/* ITS specific DeviceID, as the core ITS ignores dev. */
info->scratchpad[0].ul = dev_id;
+ /* Allocate at least 32 MSIs, and always as a power of 2 */
+ nvec = max_t(int, 32, roundup_pow_of_two(nvec));
return msi_info->ops->msi_prepare(domain->parent,
dev, nvec, info);
}
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d7842d3..316a575 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -23,6 +23,8 @@
#include <linux/dma-iommu.h>
#include <linux/interrupt.h>
#include <linux/irqdomain.h>
+#include <linux/list.h>
+#include <linux/list_sort.h>
#include <linux/log2.h>
#include <linux/mm.h>
#include <linux/msi.h>
@@ -160,7 +162,7 @@
} vpe_proxy;
static LIST_HEAD(its_nodes);
-static DEFINE_SPINLOCK(its_lock);
+static DEFINE_RAW_SPINLOCK(its_lock);
static struct rdists *gic_rdists;
static struct irq_domain *its_parent;
@@ -1421,112 +1423,176 @@
.irq_set_vcpu_affinity = its_irq_set_vcpu_affinity,
};
+
/*
* How we allocate LPIs:
*
- * The GIC has id_bits bits for interrupt identifiers. From there, we
- * must subtract 8192 which are reserved for SGIs/PPIs/SPIs. Then, as
- * we allocate LPIs by chunks of 32, we can shift the whole thing by 5
- * bits to the right.
+ * lpi_range_list contains ranges of LPIs that are to available to
+ * allocate from. To allocate LPIs, just pick the first range that
+ * fits the required allocation, and reduce it by the required
+ * amount. Once empty, remove the range from the list.
*
- * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations.
+ * To free a range of LPIs, add a free range to the list, sort it and
+ * merge the result if the new range happens to be adjacent to an
+ * already free block.
+ *
+ * The consequence of the above is that allocation is cost is low, but
+ * freeing is expensive. We assumes that freeing rarely occurs.
*/
-#define IRQS_PER_CHUNK_SHIFT 5
-#define IRQS_PER_CHUNK (1UL << IRQS_PER_CHUNK_SHIFT)
-#define ITS_MAX_LPI_NRBITS 16 /* 64K LPIs */
-static unsigned long *lpi_bitmap;
-static u32 lpi_chunks;
-static DEFINE_SPINLOCK(lpi_lock);
+static DEFINE_MUTEX(lpi_range_lock);
+static LIST_HEAD(lpi_range_list);
-static int its_lpi_to_chunk(int lpi)
+struct lpi_range {
+ struct list_head entry;
+ u32 base_id;
+ u32 span;
+};
+
+static struct lpi_range *mk_lpi_range(u32 base, u32 span)
{
- return (lpi - 8192) >> IRQS_PER_CHUNK_SHIFT;
+ struct lpi_range *range;
+
+ range = kzalloc(sizeof(*range), GFP_KERNEL);
+ if (range) {
+ INIT_LIST_HEAD(&range->entry);
+ range->base_id = base;
+ range->span = span;
+ }
+
+ return range;
}
-static int its_chunk_to_lpi(int chunk)
+static int lpi_range_cmp(void *priv, struct list_head *a, struct list_head *b)
{
- return (chunk << IRQS_PER_CHUNK_SHIFT) + 8192;
+ struct lpi_range *ra, *rb;
+
+ ra = container_of(a, struct lpi_range, entry);
+ rb = container_of(b, struct lpi_range, entry);
+
+ return rb->base_id - ra->base_id;
+}
+
+static void merge_lpi_ranges(void)
+{
+ struct lpi_range *range, *tmp;
+
+ list_for_each_entry_safe(range, tmp, &lpi_range_list, entry) {
+ if (!list_is_last(&range->entry, &lpi_range_list) &&
+ (tmp->base_id == (range->base_id + range->span))) {
+ tmp->base_id = range->base_id;
+ tmp->span += range->span;
+ list_del(&range->entry);
+ kfree(range);
+ }
+ }
+}
+
+static int alloc_lpi_range(u32 nr_lpis, u32 *base)
+{
+ struct lpi_range *range, *tmp;
+ int err = -ENOSPC;
+
+ mutex_lock(&lpi_range_lock);
+
+ list_for_each_entry_safe(range, tmp, &lpi_range_list, entry) {
+ if (range->span >= nr_lpis) {
+ *base = range->base_id;
+ range->base_id += nr_lpis;
+ range->span -= nr_lpis;
+
+ if (range->span == 0) {
+ list_del(&range->entry);
+ kfree(range);
+ }
+
+ err = 0;
+ break;
+ }
+ }
+
+ mutex_unlock(&lpi_range_lock);
+
+ pr_debug("ITS: alloc %u:%u\n", *base, nr_lpis);
+ return err;
+}
+
+static int free_lpi_range(u32 base, u32 nr_lpis)
+{
+ struct lpi_range *new;
+ int err = 0;
+
+ mutex_lock(&lpi_range_lock);
+
+ new = mk_lpi_range(base, nr_lpis);
+ if (!new) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ list_add(&new->entry, &lpi_range_list);
+ list_sort(NULL, &lpi_range_list, lpi_range_cmp);
+ merge_lpi_ranges();
+out:
+ mutex_unlock(&lpi_range_lock);
+ return err;
}
static int __init its_lpi_init(u32 id_bits)
{
- lpi_chunks = its_lpi_to_chunk(1UL << id_bits);
+ u32 lpis = (1UL << id_bits) - 8192;
+ u32 numlpis;
+ int err;
- lpi_bitmap = kcalloc(BITS_TO_LONGS(lpi_chunks), sizeof(long),
- GFP_KERNEL);
- if (!lpi_bitmap) {
- lpi_chunks = 0;
- return -ENOMEM;
+ numlpis = 1UL << GICD_TYPER_NUM_LPIS(gic_rdists->gicd_typer);
+
+ if (numlpis > 2 && !WARN_ON(numlpis > lpis)) {
+ lpis = numlpis;
+ pr_info("ITS: Using hypervisor restricted LPI range [%u]\n",
+ lpis);
}
- pr_info("ITS: Allocated %d chunks for LPIs\n", (int)lpi_chunks);
- return 0;
+ /*
+ * Initializing the allocator is just the same as freeing the
+ * full range of LPIs.
+ */
+ err = free_lpi_range(8192, lpis);
+ pr_debug("ITS: Allocator initialized for %u LPIs\n", lpis);
+ return err;
}
-static unsigned long *its_lpi_alloc_chunks(int nr_irqs, int *base, int *nr_ids)
+static unsigned long *its_lpi_alloc(int nr_irqs, u32 *base, int *nr_ids)
{
unsigned long *bitmap = NULL;
- int chunk_id;
- int nr_chunks;
- int i;
-
- nr_chunks = DIV_ROUND_UP(nr_irqs, IRQS_PER_CHUNK);
-
- spin_lock(&lpi_lock);
+ int err = 0;
do {
- chunk_id = bitmap_find_next_zero_area(lpi_bitmap, lpi_chunks,
- 0, nr_chunks, 0);
- if (chunk_id < lpi_chunks)
+ err = alloc_lpi_range(nr_irqs, base);
+ if (!err)
break;
- nr_chunks--;
- } while (nr_chunks > 0);
+ nr_irqs /= 2;
+ } while (nr_irqs > 0);
- if (!nr_chunks)
+ if (err)
goto out;
- bitmap = kcalloc(BITS_TO_LONGS(nr_chunks * IRQS_PER_CHUNK),
- sizeof(long),
- GFP_ATOMIC);
+ bitmap = kcalloc(BITS_TO_LONGS(nr_irqs), sizeof (long), GFP_ATOMIC);
if (!bitmap)
goto out;
- for (i = 0; i < nr_chunks; i++)
- set_bit(chunk_id + i, lpi_bitmap);
-
- *base = its_chunk_to_lpi(chunk_id);
- *nr_ids = nr_chunks * IRQS_PER_CHUNK;
+ *nr_ids = nr_irqs;
out:
- spin_unlock(&lpi_lock);
-
if (!bitmap)
*base = *nr_ids = 0;
return bitmap;
}
-static void its_lpi_free_chunks(unsigned long *bitmap, int base, int nr_ids)
+static void its_lpi_free(unsigned long *bitmap, u32 base, u32 nr_ids)
{
- int lpi;
-
- spin_lock(&lpi_lock);
-
- for (lpi = base; lpi < (base + nr_ids); lpi += IRQS_PER_CHUNK) {
- int chunk = its_lpi_to_chunk(lpi);
-
- BUG_ON(chunk > lpi_chunks);
- if (test_bit(chunk, lpi_bitmap)) {
- clear_bit(chunk, lpi_bitmap);
- } else {
- pr_err("Bad LPI chunk %d\n", chunk);
- }
- }
-
- spin_unlock(&lpi_lock);
-
+ WARN_ON(free_lpi_range(base, nr_ids));
kfree(bitmap);
}
@@ -1559,7 +1625,7 @@
{
phys_addr_t paddr;
- lpi_id_bits = min_t(u32, gic_rdists->id_bits, ITS_MAX_LPI_NRBITS);
+ lpi_id_bits = GICD_TYPER_ID_BITS(gic_rdists->gicd_typer);
gic_rdists->prop_page = its_allocate_prop_table(GFP_NOWAIT);
if (!gic_rdists->prop_page) {
pr_err("Failed to allocate PROPBASE\n");
@@ -1997,12 +2063,12 @@
{
struct its_node *its;
- spin_lock(&its_lock);
+ raw_spin_lock(&its_lock);
list_for_each_entry(its, &its_nodes, entry)
its_cpu_init_collection(its);
- spin_unlock(&its_lock);
+ raw_spin_unlock(&its_lock);
}
static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
@@ -2134,17 +2200,20 @@
if (!its_alloc_device_table(its, dev_id))
return NULL;
+ if (WARN_ON(!is_power_of_2(nvecs)))
+ nvecs = roundup_pow_of_two(nvecs);
+
dev = kzalloc(sizeof(*dev), GFP_KERNEL);
/*
- * We allocate at least one chunk worth of LPIs bet device,
- * and thus that many ITEs. The device may require less though.
+ * Even if the device wants a single LPI, the ITT must be
+ * sized as a power of two (and you need at least one bit...).
*/
- nr_ites = max(IRQS_PER_CHUNK, roundup_pow_of_two(nvecs));
+ nr_ites = max(2, nvecs);
sz = nr_ites * its->ite_size;
sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
itt = kzalloc(sz, GFP_KERNEL);
if (alloc_lpis) {
- lpi_map = its_lpi_alloc_chunks(nvecs, &lpi_base, &nr_lpis);
+ lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
if (lpi_map)
col_map = kcalloc(nr_lpis, sizeof(*col_map),
GFP_KERNEL);
@@ -2379,9 +2448,9 @@
/* If all interrupts have been freed, start mopping the floor */
if (bitmap_empty(its_dev->event_map.lpi_map,
its_dev->event_map.nr_lpis)) {
- its_lpi_free_chunks(its_dev->event_map.lpi_map,
- its_dev->event_map.lpi_base,
- its_dev->event_map.nr_lpis);
+ its_lpi_free(its_dev->event_map.lpi_map,
+ its_dev->event_map.lpi_base,
+ its_dev->event_map.nr_lpis);
kfree(its_dev->event_map.col_map);
/* Unmap device/itt */
@@ -2780,7 +2849,7 @@
}
if (bitmap_empty(vm->db_bitmap, vm->nr_db_lpis)) {
- its_lpi_free_chunks(vm->db_bitmap, vm->db_lpi_base, vm->nr_db_lpis);
+ its_lpi_free(vm->db_bitmap, vm->db_lpi_base, vm->nr_db_lpis);
its_free_prop_table(vm->vprop_page);
}
}
@@ -2795,18 +2864,18 @@
BUG_ON(!vm);
- bitmap = its_lpi_alloc_chunks(nr_irqs, &base, &nr_ids);
+ bitmap = its_lpi_alloc(roundup_pow_of_two(nr_irqs), &base, &nr_ids);
if (!bitmap)
return -ENOMEM;
if (nr_ids < nr_irqs) {
- its_lpi_free_chunks(bitmap, base, nr_ids);
+ its_lpi_free(bitmap, base, nr_ids);
return -ENOMEM;
}
vprop_page = its_allocate_prop_table(GFP_KERNEL);
if (!vprop_page) {
- its_lpi_free_chunks(bitmap, base, nr_ids);
+ its_lpi_free(bitmap, base, nr_ids);
return -ENOMEM;
}
@@ -2833,7 +2902,7 @@
if (i > 0)
its_vpe_irq_domain_free(domain, virq, i - 1);
- its_lpi_free_chunks(bitmap, base, nr_ids);
+ its_lpi_free(bitmap, base, nr_ids);
its_free_prop_table(vprop_page);
}
@@ -3070,7 +3139,7 @@
struct its_node *its;
int err = 0;
- spin_lock(&its_lock);
+ raw_spin_lock(&its_lock);
list_for_each_entry(its, &its_nodes, entry) {
void __iomem *base;
@@ -3102,7 +3171,7 @@
writel_relaxed(its->ctlr_save, base + GITS_CTLR);
}
}
- spin_unlock(&its_lock);
+ raw_spin_unlock(&its_lock);
return err;
}
@@ -3112,7 +3181,7 @@
struct its_node *its;
int ret;
- spin_lock(&its_lock);
+ raw_spin_lock(&its_lock);
list_for_each_entry(its, &its_nodes, entry) {
void __iomem *base;
int i;
@@ -3164,7 +3233,7 @@
GITS_TYPER_HCC(gic_read_typer(base + GITS_TYPER)))
its_cpu_init_collection(its);
}
- spin_unlock(&its_lock);
+ raw_spin_unlock(&its_lock);
}
static struct syscore_ops its_syscore_ops = {
@@ -3398,9 +3467,9 @@
if (err)
goto out_free_tables;
- spin_lock(&its_lock);
+ raw_spin_lock(&its_lock);
list_add(&its->entry, &its_nodes);
- spin_unlock(&its_lock);
+ raw_spin_unlock(&its_lock);
return 0;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 76ea56d..e214181 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -877,7 +877,7 @@
.flags = IRQCHIP_SET_TYPE_MASKED,
};
-#define GIC_ID_NR (1U << gic_data.rdists.id_bits)
+#define GIC_ID_NR (1U << GICD_TYPER_ID_BITS(gic_data.rdists.gicd_typer))
static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
irq_hw_number_t hw)
@@ -1091,7 +1091,7 @@
* The GIC only supports up to 1020 interrupt sources (SGI+PPI+SPI)
*/
typer = readl_relaxed(gic_data.dist_base + GICD_TYPER);
- gic_data.rdists.id_bits = GICD_TYPER_ID_BITS(typer);
+ gic_data.rdists.gicd_typer = typer;
gic_irqs = GICD_TYPER_IRQS(typer);
if (gic_irqs > 1020)
gic_irqs = 1020;
diff --git a/drivers/irqchip/irq-ingenic.c b/drivers/irqchip/irq-ingenic.c
index fc5953d..2ff0898 100644
--- a/drivers/irqchip/irq-ingenic.c
+++ b/drivers/irqchip/irq-ingenic.c
@@ -165,6 +165,7 @@
return ingenic_intc_of_init(node, 1);
}
IRQCHIP_DECLARE(jz4740_intc, "ingenic,jz4740-intc", intc_1chip_of_init);
+IRQCHIP_DECLARE(jz4725b_intc, "ingenic,jz4725b-intc", intc_1chip_of_init);
static int __init intc_2chip_of_init(struct device_node *node,
struct device_node *parent)
diff --git a/drivers/irqchip/irq-stm32-exti.c b/drivers/irqchip/irq-stm32-exti.c
index 3a7e890..3df527f 100644
--- a/drivers/irqchip/irq-stm32-exti.c
+++ b/drivers/irqchip/irq-stm32-exti.c
@@ -159,6 +159,7 @@
};
static const struct stm32_desc_irq stm32mp1_desc_irq[] = {
+ { .exti = 0, .irq_parent = 6 },
{ .exti = 1, .irq_parent = 7 },
{ .exti = 2, .irq_parent = 8 },
{ .exti = 3, .irq_parent = 9 },
diff --git a/drivers/net/ethernet/8390/mac8390.c b/drivers/net/ethernet/8390/mac8390.c
index b6d735b..342ae08 100644
--- a/drivers/net/ethernet/8390/mac8390.c
+++ b/drivers/net/ethernet/8390/mac8390.c
@@ -153,9 +153,6 @@
static void dayna_block_output(struct net_device *dev, int count,
const unsigned char *buf, int start_page);
-#define memcpy_fromio(a, b, c) memcpy((a), (void *)(b), (c))
-#define memcpy_toio(a, b, c) memcpy((void *)(a), (b), (c))
-
#define memcmp_withio(a, b, c) memcmp((a), (void *)(b), (c))
/* Slow Sane (16-bit chunk memory read/write) Cabletron uses this */
@@ -239,7 +236,7 @@
unsigned long outdata = 0xA5A0B5B0;
unsigned long indata = 0x00000000;
/* Try writing 32 bits */
- memcpy_toio(membase, &outdata, 4);
+ memcpy_toio((void __iomem *)membase, &outdata, 4);
/* Now compare them */
if (memcmp_withio(&outdata, membase, 4) == 0)
return ACCESS_32;
@@ -711,7 +708,7 @@
struct e8390_pkt_hdr *hdr, int ring_page)
{
unsigned long hdr_start = (ring_page - WD_START_PG)<<8;
- memcpy_fromio(hdr, dev->mem_start + hdr_start, 4);
+ memcpy_fromio(hdr, (void __iomem *)dev->mem_start + hdr_start, 4);
/* Fix endianness */
hdr->count = swab16(hdr->count);
}
@@ -725,13 +722,16 @@
if (xfer_start + count > ei_status.rmem_end) {
/* We must wrap the input move. */
int semi_count = ei_status.rmem_end - xfer_start;
- memcpy_fromio(skb->data, dev->mem_start + xfer_base,
+ memcpy_fromio(skb->data,
+ (void __iomem *)dev->mem_start + xfer_base,
semi_count);
count -= semi_count;
- memcpy_fromio(skb->data + semi_count, ei_status.rmem_start,
- count);
+ memcpy_fromio(skb->data + semi_count,
+ (void __iomem *)ei_status.rmem_start, count);
} else {
- memcpy_fromio(skb->data, dev->mem_start + xfer_base, count);
+ memcpy_fromio(skb->data,
+ (void __iomem *)dev->mem_start + xfer_base,
+ count);
}
}
@@ -740,7 +740,7 @@
{
long shmem = (start_page - WD_START_PG)<<8;
- memcpy_toio(dev->mem_start + shmem, buf, count);
+ memcpy_toio((void __iomem *)dev->mem_start + shmem, buf, count);
}
/* dayna block input/output */
diff --git a/drivers/nubus/bus.c b/drivers/nubus/bus.c
index a59b6c4..ad3d17c 100644
--- a/drivers/nubus/bus.c
+++ b/drivers/nubus/bus.c
@@ -5,6 +5,7 @@
// Copyright (C) 2017 Finn Thain
#include <linux/device.h>
+#include <linux/dma-mapping.h>
#include <linux/list.h>
#include <linux/nubus.h>
#include <linux/seq_file.h>
@@ -93,6 +94,8 @@
board->dev.release = nubus_device_release;
board->dev.bus = &nubus_bus_type;
dev_set_name(&board->dev, "slot.%X", board->slot);
+ board->dev.dma_mask = &board->dev.coherent_dma_mask;
+ dma_set_mask(&board->dev, DMA_BIT_MASK(32));
return device_register(&board->dev);
}
diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c
index a9f60d0..a23e7d3 100644
--- a/drivers/s390/block/dasd.c
+++ b/drivers/s390/block/dasd.c
@@ -73,8 +73,8 @@
static void dasd_setup_queue(struct dasd_block *);
static void dasd_free_queue(struct dasd_block *);
static int dasd_flush_block_queue(struct dasd_block *);
-static void dasd_device_tasklet(struct dasd_device *);
-static void dasd_block_tasklet(struct dasd_block *);
+static void dasd_device_tasklet(unsigned long);
+static void dasd_block_tasklet(unsigned long);
static void do_kick_device(struct work_struct *);
static void do_restore_device(struct work_struct *);
static void do_reload_device(struct work_struct *);
@@ -125,8 +125,7 @@
dasd_init_chunklist(&device->erp_chunks, device->erp_mem, PAGE_SIZE);
spin_lock_init(&device->mem_lock);
atomic_set(&device->tasklet_scheduled, 0);
- tasklet_init(&device->tasklet,
- (void (*)(unsigned long)) dasd_device_tasklet,
+ tasklet_init(&device->tasklet, dasd_device_tasklet,
(unsigned long) device);
INIT_LIST_HEAD(&device->ccw_queue);
timer_setup(&device->timer, dasd_device_timeout, 0);
@@ -166,8 +165,7 @@
atomic_set(&block->open_count, -1);
atomic_set(&block->tasklet_scheduled, 0);
- tasklet_init(&block->tasklet,
- (void (*)(unsigned long)) dasd_block_tasklet,
+ tasklet_init(&block->tasklet, dasd_block_tasklet,
(unsigned long) block);
INIT_LIST_HEAD(&block->ccw_queue);
spin_lock_init(&block->queue_lock);
@@ -2064,8 +2062,9 @@
/*
* Acquire the device lock and process queues for the device.
*/
-static void dasd_device_tasklet(struct dasd_device *device)
+static void dasd_device_tasklet(unsigned long data)
{
+ struct dasd_device *device = (struct dasd_device *) data;
struct list_head final_queue;
atomic_set (&device->tasklet_scheduled, 0);
@@ -2783,8 +2782,9 @@
* block layer request queue, creates ccw requests, enqueues them on
* a dasd_device and processes ccw requests that have been returned.
*/
-static void dasd_block_tasklet(struct dasd_block *block)
+static void dasd_block_tasklet(unsigned long data)
{
+ struct dasd_block *block = (struct dasd_block *) data;
struct list_head final_queue;
struct list_head *l, *n;
struct dasd_ccw_req *cqr;
@@ -3127,6 +3127,7 @@
block->tag_set.nr_hw_queues = nr_hw_queues;
block->tag_set.queue_depth = queue_depth;
block->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+ block->tag_set.numa_node = NUMA_NO_NODE;
rc = blk_mq_alloc_tag_set(&block->tag_set);
if (rc)
diff --git a/drivers/s390/block/dasd_alias.c b/drivers/s390/block/dasd_alias.c
index e36a114..b9ce93e 100644
--- a/drivers/s390/block/dasd_alias.c
+++ b/drivers/s390/block/dasd_alias.c
@@ -708,7 +708,7 @@
struct ccw1 *ccw;
cqr = lcu->rsu_cqr;
- strncpy((char *) &cqr->magic, "ECKD", 4);
+ memcpy((char *) &cqr->magic, "ECKD", 4);
ASCEBC((char *) &cqr->magic, 4);
ccw = cqr->cpaddr;
ccw->cmd_code = DASD_ECKD_CCW_RSCK;
diff --git a/drivers/s390/block/dasd_devmap.c b/drivers/s390/block/dasd_devmap.c
index b9ebb56..fab35c6 100644
--- a/drivers/s390/block/dasd_devmap.c
+++ b/drivers/s390/block/dasd_devmap.c
@@ -426,7 +426,7 @@
if (!devmap) {
/* This bus_id is new. */
new->devindex = dasd_max_devindex++;
- strncpy(new->bus_id, bus_id, DASD_BUS_ID_SIZE);
+ strlcpy(new->bus_id, bus_id, DASD_BUS_ID_SIZE);
new->features = features;
new->device = NULL;
list_add(&new->list, &dasd_hashlists[hash]);
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index bbf95b7..4e7b55a 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -1780,6 +1780,9 @@
struct dasd_eckd_private *private = device->private;
int i;
+ if (!private)
+ return;
+
dasd_alias_disconnect_device_from_lcu(device);
private->ned = NULL;
private->sneq = NULL;
@@ -2035,8 +2038,11 @@
static int dasd_eckd_online_to_ready(struct dasd_device *device)
{
- cancel_work_sync(&device->reload_device);
- cancel_work_sync(&device->kick_validate);
+ if (cancel_work_sync(&device->reload_device))
+ dasd_put_device(device);
+ if (cancel_work_sync(&device->kick_validate))
+ dasd_put_device(device);
+
return 0;
};
@@ -3535,7 +3541,7 @@
dcw = itcw_add_dcw(itcw, pfx_cmd, 0,
&pfxdata, sizeof(pfxdata), total_data_size);
- return PTR_RET(dcw);
+ return PTR_ERR_OR_ZERO(dcw);
}
static struct dasd_ccw_req *dasd_eckd_build_cp_tpm_track(
diff --git a/drivers/s390/block/dasd_eer.c b/drivers/s390/block/dasd_eer.c
index 6ef8714..93bb09d 100644
--- a/drivers/s390/block/dasd_eer.c
+++ b/drivers/s390/block/dasd_eer.c
@@ -313,7 +313,7 @@
ktime_get_real_ts64(&ts);
header.tv_sec = ts.tv_sec;
header.tv_usec = ts.tv_nsec / NSEC_PER_USEC;
- strncpy(header.busid, dev_name(&device->cdev->dev),
+ strlcpy(header.busid, dev_name(&device->cdev->dev),
DASD_EER_BUSID_SIZE);
spin_lock_irqsave(&bufferlock, flags);
@@ -356,7 +356,7 @@
ktime_get_real_ts64(&ts);
header.tv_sec = ts.tv_sec;
header.tv_usec = ts.tv_nsec / NSEC_PER_USEC;
- strncpy(header.busid, dev_name(&device->cdev->dev),
+ strlcpy(header.busid, dev_name(&device->cdev->dev),
DASD_EER_BUSID_SIZE);
spin_lock_irqsave(&bufferlock, flags);
diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
index b1fcb76..98f66b7 100644
--- a/drivers/s390/block/scm_blk.c
+++ b/drivers/s390/block/scm_blk.c
@@ -455,6 +455,7 @@
bdev->tag_set.nr_hw_queues = nr_requests;
bdev->tag_set.queue_depth = nr_requests_per_io * nr_requests;
bdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+ bdev->tag_set.numa_node = NUMA_NO_NODE;
ret = blk_mq_alloc_tag_set(&bdev->tag_set);
if (ret)
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index 0a4c13e..c6ab34f 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -12,11 +12,6 @@
KCOV_INSTRUMENT_sclp_early_core.o := n
UBSAN_SANITIZE_sclp_early_core.o := n
-ifneq ($(CC_FLAGS_MARCH),-march=z900)
-CFLAGS_REMOVE_sclp_early_core.o += $(CC_FLAGS_MARCH)
-CFLAGS_sclp_early_core.o += -march=z900
-endif
-
CFLAGS_sclp_early_core.o += -D__NO_FORTIFY
CFLAGS_REMOVE_sclp_early_core.o += $(CC_FLAGS_EXPOLINE)
diff --git a/drivers/s390/char/keyboard.c b/drivers/s390/char/keyboard.c
index 79eb609..bbb3001 100644
--- a/drivers/s390/char/keyboard.c
+++ b/drivers/s390/char/keyboard.c
@@ -334,37 +334,41 @@
int cmd, int perm)
{
struct kbentry tmp;
+ unsigned long kb_index, kb_table;
ushort *key_map, val, ov;
if (copy_from_user(&tmp, user_kbe, sizeof(struct kbentry)))
return -EFAULT;
+ kb_index = (unsigned long) tmp.kb_index;
#if NR_KEYS < 256
- if (tmp.kb_index >= NR_KEYS)
+ if (kb_index >= NR_KEYS)
return -EINVAL;
#endif
+ kb_table = (unsigned long) tmp.kb_table;
#if MAX_NR_KEYMAPS < 256
- if (tmp.kb_table >= MAX_NR_KEYMAPS)
+ if (kb_table >= MAX_NR_KEYMAPS)
return -EINVAL;
+ kb_table = array_index_nospec(kb_table , MAX_NR_KEYMAPS);
#endif
switch (cmd) {
case KDGKBENT:
- key_map = kbd->key_maps[tmp.kb_table];
+ key_map = kbd->key_maps[kb_table];
if (key_map) {
- val = U(key_map[tmp.kb_index]);
+ val = U(key_map[kb_index]);
if (KTYP(val) >= KBD_NR_TYPES)
val = K_HOLE;
} else
- val = (tmp.kb_index ? K_HOLE : K_NOSUCHMAP);
+ val = (kb_index ? K_HOLE : K_NOSUCHMAP);
return put_user(val, &user_kbe->kb_value);
case KDSKBENT:
if (!perm)
return -EPERM;
- if (!tmp.kb_index && tmp.kb_value == K_NOSUCHMAP) {
+ if (!kb_index && tmp.kb_value == K_NOSUCHMAP) {
/* disallocate map */
- key_map = kbd->key_maps[tmp.kb_table];
+ key_map = kbd->key_maps[kb_table];
if (key_map) {
- kbd->key_maps[tmp.kb_table] = NULL;
+ kbd->key_maps[kb_table] = NULL;
kfree(key_map);
}
break;
@@ -375,18 +379,18 @@
if (KVAL(tmp.kb_value) > kbd_max_vals[KTYP(tmp.kb_value)])
return -EINVAL;
- if (!(key_map = kbd->key_maps[tmp.kb_table])) {
+ if (!(key_map = kbd->key_maps[kb_table])) {
int j;
key_map = kmalloc(sizeof(plain_map),
GFP_KERNEL);
if (!key_map)
return -ENOMEM;
- kbd->key_maps[tmp.kb_table] = key_map;
+ kbd->key_maps[kb_table] = key_map;
for (j = 0; j < NR_KEYS; j++)
key_map[j] = U(K_HOLE);
}
- ov = U(key_map[tmp.kb_index]);
+ ov = U(key_map[kb_index]);
if (tmp.kb_value == ov)
break; /* nothing to do */
/*
@@ -395,7 +399,7 @@
if (((ov == K_SAK) || (tmp.kb_value == K_SAK)) &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
- key_map[tmp.kb_index] = U(tmp.kb_value);
+ key_map[kb_index] = U(tmp.kb_value);
break;
}
return 0;
diff --git a/drivers/s390/char/monwriter.c b/drivers/s390/char/monwriter.c
index 76c158c..4f1a69c 100644
--- a/drivers/s390/char/monwriter.c
+++ b/drivers/s390/char/monwriter.c
@@ -61,7 +61,7 @@
struct appldata_product_id id;
int rc;
- strncpy(id.prod_nr, "LNXAPPL", 7);
+ memcpy(id.prod_nr, "LNXAPPL", 7);
id.prod_fn = myhdr->applid;
id.record_nr = myhdr->record_num;
id.version_nr = myhdr->version;
diff --git a/drivers/s390/char/sclp_async.c b/drivers/s390/char/sclp_async.c
index ee6f3b5..e69b12a 100644
--- a/drivers/s390/char/sclp_async.c
+++ b/drivers/s390/char/sclp_async.c
@@ -64,42 +64,18 @@
.priority = INT_MAX,
};
-static int proc_handler_callhome(struct ctl_table *ctl, int write,
- void __user *buffer, size_t *count,
- loff_t *ppos)
-{
- unsigned long val;
- int len, rc;
- char buf[3];
-
- if (!*count || (*ppos && !write)) {
- *count = 0;
- return 0;
- }
- if (!write) {
- len = snprintf(buf, sizeof(buf), "%d\n", callhome_enabled);
- rc = copy_to_user(buffer, buf, sizeof(buf));
- if (rc != 0)
- return -EFAULT;
- } else {
- len = *count;
- rc = kstrtoul_from_user(buffer, len, 0, &val);
- if (rc)
- return rc;
- if (val != 0 && val != 1)
- return -EINVAL;
- callhome_enabled = val;
- }
- *count = len;
- *ppos += len;
- return 0;
-}
+static int zero;
+static int one = 1;
static struct ctl_table callhome_table[] = {
{
.procname = "callhome",
+ .data = &callhome_enabled,
+ .maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_handler_callhome,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
},
{}
};
diff --git a/drivers/s390/char/tape_3590.c b/drivers/s390/char/tape_3590.c
index 37e65a0..cdcde18 100644
--- a/drivers/s390/char/tape_3590.c
+++ b/drivers/s390/char/tape_3590.c
@@ -113,16 +113,16 @@
static void ext_to_int_kekl(struct tape390_kekl *in,
struct tape3592_kekl *out)
{
- int i;
+ int len;
memset(out, 0, sizeof(*out));
if (in->type == TAPE390_KEKL_TYPE_HASH)
out->flags |= 0x40;
if (in->type_on_tape == TAPE390_KEKL_TYPE_HASH)
out->flags |= 0x80;
- strncpy(out->label, in->label, 64);
- for (i = strlen(in->label); i < sizeof(out->label); i++)
- out->label[i] = ' ';
+ len = min(sizeof(out->label), strlen(in->label));
+ memcpy(out->label, in->label, len);
+ memset(out->label + len, ' ', sizeof(out->label) - len);
ASCEBC(out->label, sizeof(out->label));
}
diff --git a/drivers/s390/char/tape_class.c b/drivers/s390/char/tape_class.c
index a071024..b58df0d 100644
--- a/drivers/s390/char/tape_class.c
+++ b/drivers/s390/char/tape_class.c
@@ -54,10 +54,10 @@
if (!tcd)
return ERR_PTR(-ENOMEM);
- strncpy(tcd->device_name, device_name, TAPECLASS_NAME_LEN);
+ strlcpy(tcd->device_name, device_name, TAPECLASS_NAME_LEN);
for (s = strchr(tcd->device_name, '/'); s; s = strchr(s, '/'))
*s = '!';
- strncpy(tcd->mode_name, mode_name, TAPECLASS_NAME_LEN);
+ strlcpy(tcd->mode_name, mode_name, TAPECLASS_NAME_LEN);
for (s = strchr(tcd->mode_name, '/'); s; s = strchr(s, '/'))
*s = '!';
@@ -77,7 +77,7 @@
tcd->class_device = device_create(tape_class, device,
tcd->char_device->dev, NULL,
"%s", tcd->device_name);
- rc = PTR_RET(tcd->class_device);
+ rc = PTR_ERR_OR_ZERO(tcd->class_device);
if (rc)
goto fail_with_cdev;
rc = sysfs_create_link(
diff --git a/drivers/s390/cio/chp.c b/drivers/s390/cio/chp.c
index afbdee7..51038ec 100644
--- a/drivers/s390/cio/chp.c
+++ b/drivers/s390/cio/chp.c
@@ -471,14 +471,17 @@
{
struct channel_subsystem *css = css_by_id(chpid.cssid);
struct channel_path *chp;
- int ret;
+ int ret = 0;
+ mutex_lock(&css->mutex);
if (chp_is_registered(chpid))
- return 0;
- chp = kzalloc(sizeof(struct channel_path), GFP_KERNEL);
- if (!chp)
- return -ENOMEM;
+ goto out;
+ chp = kzalloc(sizeof(struct channel_path), GFP_KERNEL);
+ if (!chp) {
+ ret = -ENOMEM;
+ goto out;
+ }
/* fill in status, etc. */
chp->chpid = chpid;
chp->state = 1;
@@ -505,21 +508,20 @@
put_device(&chp->dev);
goto out;
}
- mutex_lock(&css->mutex);
+
if (css->cm_enabled) {
ret = chp_add_cmg_attr(chp);
if (ret) {
device_unregister(&chp->dev);
- mutex_unlock(&css->mutex);
goto out;
}
}
css->chps[chpid.id] = chp;
- mutex_unlock(&css->mutex);
goto out;
out_free:
kfree(chp);
out:
+ mutex_unlock(&css->mutex);
return ret;
}
@@ -585,8 +587,7 @@
switch (crw0->erc) {
case CRW_ERC_IPARM: /* Path has come. */
case CRW_ERC_INIT:
- if (!chp_is_registered(chpid))
- chp_new(chpid);
+ chp_new(chpid);
chsc_chp_online(chpid);
break;
case CRW_ERC_PERRI: /* Path has gone. */
diff --git a/drivers/s390/cio/chsc.c b/drivers/s390/cio/chsc.c
index 9029804..a0baee2 100644
--- a/drivers/s390/cio/chsc.c
+++ b/drivers/s390/cio/chsc.c
@@ -91,7 +91,7 @@
u16 sch; /* subchannel */
u8 chpid[8]; /* chpids 0-7 */
u16 fla[8]; /* full link addresses 0-7 */
-} __attribute__ ((packed));
+} __packed __aligned(PAGE_SIZE);
int chsc_get_ssd_info(struct subchannel_id schid, struct chsc_ssd_info *ssd)
{
@@ -319,7 +319,7 @@
struct chsc_sei_nt2_area nt2_area;
u8 nt_area[PAGE_SIZE - 24];
} u;
-} __packed;
+} __packed __aligned(PAGE_SIZE);
/*
* Node Descriptor as defined in SA22-7204, "Common I/O-Device Commands"
@@ -841,7 +841,7 @@
u32 : 4;
u32 fmt : 4;
u32 : 16;
- } __attribute__ ((packed)) *secm_area;
+ } *secm_area;
unsigned long flags;
int ret, ccode;
@@ -1014,7 +1014,7 @@
u32 cmg : 8;
u32 zeroes3;
u32 data[NR_MEASUREMENT_CHARS];
- } __attribute__ ((packed)) *scmc_area;
+ } *scmc_area;
chp->shared = -1;
chp->cmg = -1;
@@ -1142,7 +1142,7 @@
u8 cssid;
u32 : 24;
} list[0];
- } __packed *sdcal_area;
+ } *sdcal_area;
int ret;
spin_lock_irq(&chsc_page_lock);
@@ -1192,7 +1192,7 @@
u32 reserved4;
u32 general_char[510];
u32 chsc_char[508];
- } __attribute__ ((packed)) *scsc_area;
+ } *scsc_area;
spin_lock_irqsave(&chsc_page_lock, flags);
memset(chsc_page, 0, PAGE_SIZE);
@@ -1236,7 +1236,7 @@
unsigned int rsvd3[3];
u64 clock_delta;
unsigned int rsvd4[2];
- } __attribute__ ((packed)) *rr;
+ } *rr;
int rc;
memset(page, 0, PAGE_SIZE);
@@ -1261,7 +1261,7 @@
unsigned int rsvd0[3];
struct chsc_header response;
char data[];
- } __attribute__ ((packed)) *rr;
+ } *rr;
int rc;
memset(page, 0, PAGE_SIZE);
@@ -1284,7 +1284,7 @@
u32 word3;
struct chsc_header response;
u32 word[11];
- } __attribute__ ((packed)) *siosl_area;
+ } *siosl_area;
unsigned long flags;
int ccode;
int rc;
diff --git a/drivers/s390/cio/chsc.h b/drivers/s390/cio/chsc.h
index 5c9f0dd..78aba8d 100644
--- a/drivers/s390/cio/chsc.h
+++ b/drivers/s390/cio/chsc.h
@@ -15,12 +15,12 @@
#define NR_MEASUREMENT_CHARS 5
struct cmg_chars {
u32 values[NR_MEASUREMENT_CHARS];
-} __attribute__ ((packed));
+};
#define NR_MEASUREMENT_ENTRIES 8
struct cmg_entry {
u32 values[NR_MEASUREMENT_ENTRIES];
-} __attribute__ ((packed));
+};
struct channel_path_desc_fmt1 {
u8 flags;
@@ -38,7 +38,7 @@
u8 s:1;
u8 f:1;
u32 zeros[2];
-} __attribute__ ((packed));
+};
struct channel_path_desc_fmt3 {
struct channel_path_desc_fmt1 fmt1_desc;
@@ -59,7 +59,7 @@
u32:7;
u32 pnso:1; /* bit 116 */
u32:11;
-}__attribute__((packed));
+} __packed;
extern struct css_chsc_char css_chsc_characteristics;
@@ -82,7 +82,7 @@
struct chsc_header response;
u32:32;
struct qdio_ssqd_desc qdio_ssqd;
-} __packed;
+} __packed __aligned(PAGE_SIZE);
struct chsc_scssc_area {
struct chsc_header request;
@@ -102,7 +102,7 @@
u32 reserved[1004];
struct chsc_header response;
u32:32;
-} __packed;
+} __packed __aligned(PAGE_SIZE);
struct chsc_scpd {
struct chsc_header request;
@@ -120,7 +120,7 @@
struct chsc_header response;
u32:32;
u8 data[0];
-} __packed;
+} __packed __aligned(PAGE_SIZE);
struct chsc_sda_area {
struct chsc_header request;
@@ -199,7 +199,7 @@
u32 reserved2[10];
u64 restok;
struct sale scmal[248];
-} __packed;
+} __packed __aligned(PAGE_SIZE);
int chsc_scm_info(struct chsc_scm_info *scm_area, u64 token);
@@ -243,7 +243,7 @@
struct qdio_brinfo_entry_l3_ipv4 l3_ipv4[0];
struct qdio_brinfo_entry_l2 l2[0];
} entries;
-} __packed;
+} __packed __aligned(PAGE_SIZE);
int chsc_pnso_brinfo(struct subchannel_id schid,
struct chsc_pnso_area *brinfo_area,
diff --git a/drivers/s390/cio/cio.c b/drivers/s390/cio/cio.c
index 5130d7c..de744ca 100644
--- a/drivers/s390/cio/cio.c
+++ b/drivers/s390/cio/cio.c
@@ -526,76 +526,6 @@
}
EXPORT_SYMBOL_GPL(cio_disable_subchannel);
-static int cio_check_devno_blacklisted(struct subchannel *sch)
-{
- if (is_blacklisted(sch->schid.ssid, sch->schib.pmcw.dev)) {
- /*
- * This device must not be known to Linux. So we simply
- * say that there is no device and return ENODEV.
- */
- CIO_MSG_EVENT(6, "Blacklisted device detected "
- "at devno %04X, subchannel set %x\n",
- sch->schib.pmcw.dev, sch->schid.ssid);
- return -ENODEV;
- }
- return 0;
-}
-
-/**
- * cio_validate_subchannel - basic validation of subchannel
- * @sch: subchannel structure to be filled out
- * @schid: subchannel id
- *
- * Find out subchannel type and initialize struct subchannel.
- * Return codes:
- * 0 on success
- * -ENXIO for non-defined subchannels
- * -ENODEV for invalid subchannels or blacklisted devices
- * -EIO for subchannels in an invalid subchannel set
- */
-int cio_validate_subchannel(struct subchannel *sch, struct subchannel_id schid)
-{
- char dbf_txt[15];
- int ccode;
- int err;
-
- sprintf(dbf_txt, "valsch%x", schid.sch_no);
- CIO_TRACE_EVENT(4, dbf_txt);
-
- /*
- * The first subchannel that is not-operational (ccode==3)
- * indicates that there aren't any more devices available.
- * If stsch gets an exception, it means the current subchannel set
- * is not valid.
- */
- ccode = stsch(schid, &sch->schib);
- if (ccode) {
- err = (ccode == 3) ? -ENXIO : ccode;
- goto out;
- }
- sch->st = sch->schib.pmcw.st;
- sch->schid = schid;
-
- switch (sch->st) {
- case SUBCHANNEL_TYPE_IO:
- case SUBCHANNEL_TYPE_MSG:
- if (!css_sch_is_valid(&sch->schib))
- err = -ENODEV;
- else
- err = cio_check_devno_blacklisted(sch);
- break;
- default:
- err = 0;
- }
- if (err)
- goto out;
-
- CIO_MSG_EVENT(4, "Subchannel 0.%x.%04x reports subchannel type %04X\n",
- sch->schid.ssid, sch->schid.sch_no, sch->st);
-out:
- return err;
-}
-
/*
* do_cio_interrupt() handles all normal I/O device IRQ's
*/
@@ -719,6 +649,7 @@
{
struct subchannel_id schid;
struct subchannel *sch;
+ struct schib schib;
int sch_no, ret;
sch_no = cio_get_console_sch_no();
@@ -728,7 +659,11 @@
}
init_subchannel_id(&schid);
schid.sch_no = sch_no;
- sch = css_alloc_subchannel(schid);
+ ret = stsch(schid, &schib);
+ if (ret)
+ return ERR_PTR(-ENODEV);
+
+ sch = css_alloc_subchannel(schid, &schib);
if (IS_ERR(sch))
return sch;
diff --git a/drivers/s390/cio/cio.h b/drivers/s390/cio/cio.h
index 94cd813..9811fd8 100644
--- a/drivers/s390/cio/cio.h
+++ b/drivers/s390/cio/cio.h
@@ -119,7 +119,6 @@
#define to_subchannel(n) container_of(n, struct subchannel, dev)
-extern int cio_validate_subchannel (struct subchannel *, struct subchannel_id);
extern int cio_enable_subchannel(struct subchannel *, u32);
extern int cio_disable_subchannel (struct subchannel *);
extern int cio_cancel (struct subchannel *);
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c
index 9263a0f..aea5029 100644
--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -25,6 +25,7 @@
#include "css.h"
#include "cio.h"
+#include "blacklist.h"
#include "cio_debug.h"
#include "ioasm.h"
#include "chsc.h"
@@ -168,18 +169,53 @@
kfree(sch);
}
-struct subchannel *css_alloc_subchannel(struct subchannel_id schid)
+static int css_validate_subchannel(struct subchannel_id schid,
+ struct schib *schib)
+{
+ int err;
+
+ switch (schib->pmcw.st) {
+ case SUBCHANNEL_TYPE_IO:
+ case SUBCHANNEL_TYPE_MSG:
+ if (!css_sch_is_valid(schib))
+ err = -ENODEV;
+ else if (is_blacklisted(schid.ssid, schib->pmcw.dev)) {
+ CIO_MSG_EVENT(6, "Blacklisted device detected "
+ "at devno %04X, subchannel set %x\n",
+ schib->pmcw.dev, schid.ssid);
+ err = -ENODEV;
+ } else
+ err = 0;
+ break;
+ default:
+ err = 0;
+ }
+ if (err)
+ goto out;
+
+ CIO_MSG_EVENT(4, "Subchannel 0.%x.%04x reports subchannel type %04X\n",
+ schid.ssid, schid.sch_no, schib->pmcw.st);
+out:
+ return err;
+}
+
+struct subchannel *css_alloc_subchannel(struct subchannel_id schid,
+ struct schib *schib)
{
struct subchannel *sch;
int ret;
+ ret = css_validate_subchannel(schid, schib);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
sch = kzalloc(sizeof(*sch), GFP_KERNEL | GFP_DMA);
if (!sch)
return ERR_PTR(-ENOMEM);
- ret = cio_validate_subchannel(sch, schid);
- if (ret < 0)
- goto err;
+ sch->schid = schid;
+ sch->schib = *schib;
+ sch->st = schib->pmcw.st;
ret = css_sch_create_locks(sch);
if (ret)
@@ -244,8 +280,7 @@
for (i = 0; i < 8; i++) {
mask = 0x80 >> i;
if (ssd->path_mask & mask)
- if (!chp_is_registered(ssd->chpid[i]))
- chp_new(ssd->chpid[i]);
+ chp_new(ssd->chpid[i]);
}
}
@@ -382,12 +417,12 @@
return ret;
}
-static int css_probe_device(struct subchannel_id schid)
+static int css_probe_device(struct subchannel_id schid, struct schib *schib)
{
struct subchannel *sch;
int ret;
- sch = css_alloc_subchannel(schid);
+ sch = css_alloc_subchannel(schid, schib);
if (IS_ERR(sch))
return PTR_ERR(sch);
@@ -436,23 +471,23 @@
static int css_evaluate_new_subchannel(struct subchannel_id schid, int slow)
{
struct schib schib;
+ int ccode;
if (!slow) {
/* Will be done on the slow path. */
return -EAGAIN;
}
- if (stsch(schid, &schib)) {
- /* Subchannel is not provided. */
- return -ENXIO;
- }
- if (!css_sch_is_valid(&schib)) {
- /* Unusable - ignore. */
- return 0;
- }
- CIO_MSG_EVENT(4, "event: sch 0.%x.%04x, new\n", schid.ssid,
- schid.sch_no);
+ /*
+ * The first subchannel that is not-operational (ccode==3)
+ * indicates that there aren't any more devices available.
+ * If stsch gets an exception, it means the current subchannel set
+ * is not valid.
+ */
+ ccode = stsch(schid, &schib);
+ if (ccode)
+ return (ccode == 3) ? -ENXIO : ccode;
- return css_probe_device(schid);
+ return css_probe_device(schid, &schib);
}
static int css_evaluate_known_subchannel(struct subchannel *sch, int slow)
@@ -1081,6 +1116,11 @@
if (ret)
goto out_wq;
+ /* Register subchannels which are already in use. */
+ cio_register_early_subchannels();
+ /* Start initial subchannel evaluation. */
+ css_schedule_eval_all();
+
return ret;
out_wq:
destroy_workqueue(cio_work_q);
@@ -1120,10 +1160,6 @@
*/
static int __init channel_subsystem_init_sync(void)
{
- /* Register subchannels which are already in use. */
- cio_register_early_subchannels();
- /* Start initial subchannel evaluation. */
- css_schedule_eval_all();
css_complete_work();
return 0;
}
diff --git a/drivers/s390/cio/css.h b/drivers/s390/cio/css.h
index 30357cb..8d83290 100644
--- a/drivers/s390/cio/css.h
+++ b/drivers/s390/cio/css.h
@@ -103,7 +103,8 @@
extern void css_sch_device_unregister(struct subchannel *);
extern int css_register_subchannel(struct subchannel *);
-extern struct subchannel *css_alloc_subchannel(struct subchannel_id);
+extern struct subchannel *css_alloc_subchannel(struct subchannel_id,
+ struct schib *schib);
extern struct subchannel *get_subchannel_by_schid(struct subchannel_id);
extern int css_init_done;
extern int max_ssid;
diff --git a/drivers/s390/cio/qdio_main.c b/drivers/s390/cio/qdio_main.c
index f4ca72d..9c7d9da 100644
--- a/drivers/s390/cio/qdio_main.c
+++ b/drivers/s390/cio/qdio_main.c
@@ -631,21 +631,20 @@
unsigned long phys_aob = 0;
if (!q->use_cq)
- goto out;
+ return 0;
if (!q->aobs[bufnr]) {
struct qaob *aob = qdio_allocate_aob();
q->aobs[bufnr] = aob;
}
if (q->aobs[bufnr]) {
- q->sbal_state[bufnr].flags = QDIO_OUTBUF_STATE_FLAG_NONE;
q->sbal_state[bufnr].aob = q->aobs[bufnr];
q->aobs[bufnr]->user1 = (u64) q->sbal_state[bufnr].user;
phys_aob = virt_to_phys(q->aobs[bufnr]);
WARN_ON_ONCE(phys_aob & 0xFF);
}
-out:
+ q->sbal_state[bufnr].flags = 0;
return phys_aob;
}
diff --git a/drivers/s390/cio/trace.h b/drivers/s390/cio/trace.h
index 1f8d1c1..0ebb29b 100644
--- a/drivers/s390/cio/trace.h
+++ b/drivers/s390/cio/trace.h
@@ -30,6 +30,17 @@
__field(u16, schno)
__field(u16, devno)
__field_struct(struct schib, schib)
+ __field(u8, pmcw_ena)
+ __field(u8, pmcw_st)
+ __field(u8, pmcw_dnv)
+ __field(u16, pmcw_dev)
+ __field(u8, pmcw_lpm)
+ __field(u8, pmcw_pnom)
+ __field(u8, pmcw_lpum)
+ __field(u8, pmcw_pim)
+ __field(u8, pmcw_pam)
+ __field(u8, pmcw_pom)
+ __field(u64, pmcw_chpid)
__field(int, cc)
),
TP_fast_assign(
@@ -38,18 +49,29 @@
__entry->schno = schid.sch_no;
__entry->devno = schib->pmcw.dev;
__entry->schib = *schib;
+ __entry->pmcw_ena = schib->pmcw.ena;
+ __entry->pmcw_st = schib->pmcw.ena;
+ __entry->pmcw_dnv = schib->pmcw.dnv;
+ __entry->pmcw_dev = schib->pmcw.dev;
+ __entry->pmcw_lpm = schib->pmcw.lpm;
+ __entry->pmcw_pnom = schib->pmcw.pnom;
+ __entry->pmcw_lpum = schib->pmcw.lpum;
+ __entry->pmcw_pim = schib->pmcw.pim;
+ __entry->pmcw_pam = schib->pmcw.pam;
+ __entry->pmcw_pom = schib->pmcw.pom;
+ memcpy(&__entry->pmcw_chpid, &schib->pmcw.chpid, 8);
__entry->cc = cc;
),
TP_printk("schid=%x.%x.%04x cc=%d ena=%d st=%d dnv=%d dev=%04x "
"lpm=0x%02x pnom=0x%02x lpum=0x%02x pim=0x%02x pam=0x%02x "
"pom=0x%02x chpids=%016llx",
__entry->cssid, __entry->ssid, __entry->schno, __entry->cc,
- __entry->schib.pmcw.ena, __entry->schib.pmcw.st,
- __entry->schib.pmcw.dnv, __entry->schib.pmcw.dev,
- __entry->schib.pmcw.lpm, __entry->schib.pmcw.pnom,
- __entry->schib.pmcw.lpum, __entry->schib.pmcw.pim,
- __entry->schib.pmcw.pam, __entry->schib.pmcw.pom,
- *((u64 *) __entry->schib.pmcw.chpid)
+ __entry->pmcw_ena, __entry->pmcw_st,
+ __entry->pmcw_dnv, __entry->pmcw_dev,
+ __entry->pmcw_lpm, __entry->pmcw_pnom,
+ __entry->pmcw_lpum, __entry->pmcw_pim,
+ __entry->pmcw_pam, __entry->pmcw_pom,
+ __entry->pmcw_chpid
)
);
@@ -89,6 +111,13 @@
__field(u8, ssid)
__field(u16, schno)
__field_struct(struct irb, irb)
+ __field(u8, scsw_dcc)
+ __field(u8, scsw_pno)
+ __field(u8, scsw_fctl)
+ __field(u8, scsw_actl)
+ __field(u8, scsw_stctl)
+ __field(u8, scsw_dstat)
+ __field(u8, scsw_cstat)
__field(int, cc)
),
TP_fast_assign(
@@ -96,15 +125,22 @@
__entry->ssid = schid.ssid;
__entry->schno = schid.sch_no;
__entry->irb = *irb;
+ __entry->scsw_dcc = scsw_cc(&irb->scsw);
+ __entry->scsw_pno = scsw_pno(&irb->scsw);
+ __entry->scsw_fctl = scsw_fctl(&irb->scsw);
+ __entry->scsw_actl = scsw_actl(&irb->scsw);
+ __entry->scsw_stctl = scsw_stctl(&irb->scsw);
+ __entry->scsw_dstat = scsw_dstat(&irb->scsw);
+ __entry->scsw_cstat = scsw_cstat(&irb->scsw);
__entry->cc = cc;
),
TP_printk("schid=%x.%x.%04x cc=%d dcc=%d pno=%d fctl=0x%x actl=0x%x "
"stctl=0x%x dstat=0x%x cstat=0x%x",
__entry->cssid, __entry->ssid, __entry->schno, __entry->cc,
- scsw_cc(&__entry->irb.scsw), scsw_pno(&__entry->irb.scsw),
- scsw_fctl(&__entry->irb.scsw), scsw_actl(&__entry->irb.scsw),
- scsw_stctl(&__entry->irb.scsw),
- scsw_dstat(&__entry->irb.scsw), scsw_cstat(&__entry->irb.scsw)
+ __entry->scsw_dcc, __entry->scsw_pno,
+ __entry->scsw_fctl, __entry->scsw_actl,
+ __entry->scsw_stctl,
+ __entry->scsw_dstat, __entry->scsw_cstat
)
);
@@ -122,6 +158,9 @@
__field(u8, cssid)
__field(u8, ssid)
__field(u16, schno)
+ __field(u8, adapter_IO)
+ __field(u8, isc)
+ __field(u8, type)
),
TP_fast_assign(
__entry->cc = cc;
@@ -136,11 +175,14 @@
__entry->cssid = __entry->tpi_info.schid.cssid;
__entry->ssid = __entry->tpi_info.schid.ssid;
__entry->schno = __entry->tpi_info.schid.sch_no;
+ __entry->adapter_IO = __entry->tpi_info.adapter_IO;
+ __entry->isc = __entry->tpi_info.isc;
+ __entry->type = __entry->tpi_info.type;
),
TP_printk("schid=%x.%x.%04x cc=%d a=%d isc=%d type=%d",
__entry->cssid, __entry->ssid, __entry->schno, __entry->cc,
- __entry->tpi_info.adapter_IO, __entry->tpi_info.isc,
- __entry->tpi_info.type
+ __entry->adapter_IO, __entry->isc,
+ __entry->type
)
);
@@ -299,16 +341,20 @@
__field(u8, cssid)
__field(u8, ssid)
__field(u16, schno)
+ __field(u8, isc)
+ __field(u8, type)
),
TP_fast_assign(
__entry->tpi_info = *tpi_info;
- __entry->cssid = __entry->tpi_info.schid.cssid;
- __entry->ssid = __entry->tpi_info.schid.ssid;
- __entry->schno = __entry->tpi_info.schid.sch_no;
+ __entry->cssid = tpi_info->schid.cssid;
+ __entry->ssid = tpi_info->schid.ssid;
+ __entry->schno = tpi_info->schid.sch_no;
+ __entry->isc = tpi_info->isc;
+ __entry->type = tpi_info->type;
),
TP_printk("schid=%x.%x.%04x isc=%d type=%d",
__entry->cssid, __entry->ssid, __entry->schno,
- __entry->tpi_info.isc, __entry->tpi_info.type
+ __entry->isc, __entry->type
)
);
@@ -321,11 +367,13 @@
TP_ARGS(tpi_info),
TP_STRUCT__entry(
__field_struct(struct tpi_info, tpi_info)
+ __field(u8, isc)
),
TP_fast_assign(
__entry->tpi_info = *tpi_info;
+ __entry->isc = tpi_info->isc;
),
- TP_printk("isc=%d", __entry->tpi_info.isc)
+ TP_printk("isc=%d", __entry->isc)
);
/**
@@ -339,16 +387,30 @@
TP_STRUCT__entry(
__field_struct(struct crw, crw)
__field(int, cc)
+ __field(u8, slct)
+ __field(u8, oflw)
+ __field(u8, chn)
+ __field(u8, rsc)
+ __field(u8, anc)
+ __field(u8, erc)
+ __field(u16, rsid)
),
TP_fast_assign(
__entry->crw = *crw;
__entry->cc = cc;
+ __entry->slct = crw->slct;
+ __entry->oflw = crw->oflw;
+ __entry->chn = crw->chn;
+ __entry->rsc = crw->rsc;
+ __entry->anc = crw->anc;
+ __entry->erc = crw->erc;
+ __entry->rsid = crw->rsid;
),
TP_printk("cc=%d slct=%d oflw=%d chn=%d rsc=%d anc=%d erc=0x%x "
"rsid=0x%x",
- __entry->cc, __entry->crw.slct, __entry->crw.oflw,
- __entry->crw.chn, __entry->crw.rsc, __entry->crw.anc,
- __entry->crw.erc, __entry->crw.rsid
+ __entry->cc, __entry->slct, __entry->oflw,
+ __entry->chn, __entry->rsc, __entry->anc,
+ __entry->erc, __entry->rsid
)
);
diff --git a/drivers/s390/crypto/ap_asm.h b/drivers/s390/crypto/ap_asm.h
deleted file mode 100644
index 16b59ce..0000000
--- a/drivers/s390/crypto/ap_asm.h
+++ /dev/null
@@ -1,236 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright IBM Corp. 2016
- * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
- *
- * Adjunct processor bus inline assemblies.
- */
-
-#ifndef _AP_ASM_H_
-#define _AP_ASM_H_
-
-#include <asm/isc.h>
-
-/**
- * ap_intructions_available() - Test if AP instructions are available.
- *
- * Returns 0 if the AP instructions are installed.
- */
-static inline int ap_instructions_available(void)
-{
- register unsigned long reg0 asm ("0") = AP_MKQID(0, 0);
- register unsigned long reg1 asm ("1") = -ENODEV;
- register unsigned long reg2 asm ("2") = 0UL;
-
- asm volatile(
- " .long 0xb2af0000\n" /* PQAP(TAPQ) */
- "0: la %1,0\n"
- "1:\n"
- EX_TABLE(0b, 1b)
- : "+d" (reg0), "+d" (reg1), "+d" (reg2) : : "cc");
- return reg1;
-}
-
-/**
- * ap_tapq(): Test adjunct processor queue.
- * @qid: The AP queue number
- * @info: Pointer to queue descriptor
- *
- * Returns AP queue status structure.
- */
-static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info)
-{
- register unsigned long reg0 asm ("0") = qid;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2") = 0UL;
-
- asm volatile(".long 0xb2af0000" /* PQAP(TAPQ) */
- : "+d" (reg0), "=d" (reg1), "+d" (reg2) : : "cc");
- if (info)
- *info = reg2;
- return reg1;
-}
-
-/**
- * ap_pqap_rapq(): Reset adjunct processor queue.
- * @qid: The AP queue number
- *
- * Returns AP queue status structure.
- */
-static inline struct ap_queue_status ap_rapq(ap_qid_t qid)
-{
- register unsigned long reg0 asm ("0") = qid | 0x01000000UL;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2") = 0UL;
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(RAPQ) */
- : "+d" (reg0), "=d" (reg1), "+d" (reg2) : : "cc");
- return reg1;
-}
-
-/**
- * ap_aqic(): Control interruption for a specific AP.
- * @qid: The AP queue number
- * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
- * @ind: The notification indicator byte
- *
- * Returns AP queue status.
- */
-static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
- struct ap_qirq_ctrl qirqctrl,
- void *ind)
-{
- register unsigned long reg0 asm ("0") = qid | (3UL << 24);
- register struct ap_qirq_ctrl reg1_in asm ("1") = qirqctrl;
- register struct ap_queue_status reg1_out asm ("1");
- register void *reg2 asm ("2") = ind;
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(AQIC) */
- : "+d" (reg0), "+d" (reg1_in), "=d" (reg1_out), "+d" (reg2)
- :
- : "cc");
- return reg1_out;
-}
-
-/**
- * ap_qci(): Get AP configuration data
- *
- * Returns 0 on success, or -EOPNOTSUPP.
- */
-static inline int ap_qci(void *config)
-{
- register unsigned long reg0 asm ("0") = 0x04000000UL;
- register unsigned long reg1 asm ("1") = -EINVAL;
- register void *reg2 asm ("2") = (void *) config;
-
- asm volatile(
- ".long 0xb2af0000\n" /* PQAP(QCI) */
- "0: la %1,0\n"
- "1:\n"
- EX_TABLE(0b, 1b)
- : "+d" (reg0), "+d" (reg1), "+d" (reg2)
- :
- : "cc", "memory");
-
- return reg1;
-}
-
-/*
- * union ap_qact_ap_info - used together with the
- * ap_aqic() function to provide a convenient way
- * to handle the ap info needed by the qact function.
- */
-union ap_qact_ap_info {
- unsigned long val;
- struct {
- unsigned int : 3;
- unsigned int mode : 3;
- unsigned int : 26;
- unsigned int cat : 8;
- unsigned int : 8;
- unsigned char ver[2];
- };
-};
-
-/**
- * ap_qact(): Query AP combatibility type.
- * @qid: The AP queue number
- * @apinfo: On input the info about the AP queue. On output the
- * alternate AP queue info provided by the qact function
- * in GR2 is stored in.
- *
- * Returns AP queue status. Check response_code field for failures.
- */
-static inline struct ap_queue_status ap_qact(ap_qid_t qid, int ifbit,
- union ap_qact_ap_info *apinfo)
-{
- register unsigned long reg0 asm ("0") = qid | (5UL << 24)
- | ((ifbit & 0x01) << 22);
- register unsigned long reg1_in asm ("1") = apinfo->val;
- register struct ap_queue_status reg1_out asm ("1");
- register unsigned long reg2 asm ("2") = 0;
-
- asm volatile(
- ".long 0xb2af0000" /* PQAP(QACT) */
- : "+d" (reg0), "+d" (reg1_in), "=d" (reg1_out), "+d" (reg2)
- : : "cc");
- apinfo->val = reg2;
- return reg1_out;
-}
-
-/**
- * ap_nqap(): Send message to adjunct processor queue.
- * @qid: The AP queue number
- * @psmid: The program supplied message identifier
- * @msg: The message text
- * @length: The message length
- *
- * Returns AP queue status structure.
- * Condition code 1 on NQAP can't happen because the L bit is 1.
- * Condition code 2 on NQAP also means the send is incomplete,
- * because a segment boundary was reached. The NQAP is repeated.
- */
-static inline struct ap_queue_status ap_nqap(ap_qid_t qid,
- unsigned long long psmid,
- void *msg, size_t length)
-{
- register unsigned long reg0 asm ("0") = qid | 0x40000000UL;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm ("2") = (unsigned long) msg;
- register unsigned long reg3 asm ("3") = (unsigned long) length;
- register unsigned long reg4 asm ("4") = (unsigned int) (psmid >> 32);
- register unsigned long reg5 asm ("5") = psmid & 0xffffffff;
-
- asm volatile (
- "0: .long 0xb2ad0042\n" /* NQAP */
- " brc 2,0b"
- : "+d" (reg0), "=d" (reg1), "+d" (reg2), "+d" (reg3)
- : "d" (reg4), "d" (reg5)
- : "cc", "memory");
- return reg1;
-}
-
-/**
- * ap_dqap(): Receive message from adjunct processor queue.
- * @qid: The AP queue number
- * @psmid: Pointer to program supplied message identifier
- * @msg: The message text
- * @length: The message length
- *
- * Returns AP queue status structure.
- * Condition code 1 on DQAP means the receive has taken place
- * but only partially. The response is incomplete, hence the
- * DQAP is repeated.
- * Condition code 2 on DQAP also means the receive is incomplete,
- * this time because a segment boundary was reached. Again, the
- * DQAP is repeated.
- * Note that gpr2 is used by the DQAP instruction to keep track of
- * any 'residual' length, in case the instruction gets interrupted.
- * Hence it gets zeroed before the instruction.
- */
-static inline struct ap_queue_status ap_dqap(ap_qid_t qid,
- unsigned long long *psmid,
- void *msg, size_t length)
-{
- register unsigned long reg0 asm("0") = qid | 0x80000000UL;
- register struct ap_queue_status reg1 asm ("1");
- register unsigned long reg2 asm("2") = 0UL;
- register unsigned long reg4 asm("4") = (unsigned long) msg;
- register unsigned long reg5 asm("5") = (unsigned long) length;
- register unsigned long reg6 asm("6") = 0UL;
- register unsigned long reg7 asm("7") = 0UL;
-
-
- asm volatile(
- "0: .long 0xb2ae0064\n" /* DQAP */
- " brc 6,0b\n"
- : "+d" (reg0), "=d" (reg1), "+d" (reg2),
- "+d" (reg4), "+d" (reg5), "+d" (reg6), "+d" (reg7)
- : : "cc", "memory");
- *psmid = (((unsigned long long) reg6) << 32) + reg7;
- return reg1;
-}
-
-#endif /* _AP_ASM_H_ */
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 35a0c2b..bf27fc4 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,7 +36,6 @@
#include <linux/debugfs.h>
#include "ap_bus.h"
-#include "ap_asm.h"
#include "ap_debug.h"
/*
@@ -174,24 +173,6 @@
return 0;
}
-/**
- * ap_test_queue(): Test adjunct processor queue.
- * @qid: The AP queue number
- * @tbit: Test facilities bit
- * @info: Pointer to queue descriptor
- *
- * Returns AP queue status structure.
- */
-struct ap_queue_status ap_test_queue(ap_qid_t qid,
- int tbit,
- unsigned long *info)
-{
- if (tbit)
- qid |= 1UL << 23; /* set T bit*/
- return ap_tapq(qid, info);
-}
-EXPORT_SYMBOL(ap_test_queue);
-
/*
* ap_query_configuration(): Fetch cryptographic config info
*
@@ -200,7 +181,7 @@
* is returned, e.g. if the PQAP(QCI) instruction is not
* available, the return value will be -EOPNOTSUPP.
*/
-int ap_query_configuration(struct ap_config_info *info)
+static inline int ap_query_configuration(struct ap_config_info *info)
{
if (!ap_configuration_available())
return -EOPNOTSUPP;
@@ -493,7 +474,7 @@
return 0;
mutex_lock(&ap_poll_thread_mutex);
ap_poll_kthread = kthread_run(ap_poll_thread, NULL, "appoll");
- rc = PTR_RET(ap_poll_kthread);
+ rc = PTR_ERR_OR_ZERO(ap_poll_kthread);
if (rc)
ap_poll_kthread = NULL;
mutex_unlock(&ap_poll_thread_mutex);
@@ -1261,7 +1242,7 @@
/* Create /sys/devices/ap. */
ap_root_device = root_device_register("ap");
- rc = PTR_RET(ap_root_device);
+ rc = PTR_ERR_OR_ZERO(ap_root_device);
if (rc)
goto out_bus;
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 6a273c5..9365419 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -15,6 +15,7 @@
#include <linux/device.h>
#include <linux/types.h>
+#include <asm/isc.h>
#include <asm/ap.h>
#define AP_DEVICES 256 /* Number of AP devices. */
diff --git a/drivers/s390/crypto/ap_card.c b/drivers/s390/crypto/ap_card.c
index 2c726df..c13e432 100644
--- a/drivers/s390/crypto/ap_card.c
+++ b/drivers/s390/crypto/ap_card.c
@@ -14,7 +14,6 @@
#include <asm/facility.h>
#include "ap_bus.h"
-#include "ap_asm.h"
/*
* AP card related attributes.
diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c
index ba3a2e1..e365171 100644
--- a/drivers/s390/crypto/ap_queue.c
+++ b/drivers/s390/crypto/ap_queue.c
@@ -14,26 +14,6 @@
#include <asm/facility.h>
#include "ap_bus.h"
-#include "ap_asm.h"
-
-/**
- * ap_queue_irq_ctrl(): Control interruption on a AP queue.
- * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
- * @ind: The notification indicator byte
- *
- * Returns AP queue status.
- *
- * Control interruption on the given AP queue.
- * Just a simple wrapper function for the low level PQAP(AQIC)
- * instruction available for other kernel modules.
- */
-struct ap_queue_status ap_queue_irq_ctrl(ap_qid_t qid,
- struct ap_qirq_ctrl qirqctrl,
- void *ind)
-{
- return ap_aqic(qid, qirqctrl, ind);
-}
-EXPORT_SYMBOL(ap_queue_irq_ctrl);
/**
* ap_queue_enable_interruption(): Enable interruption on an AP queue.
diff --git a/drivers/s390/crypto/pkey_api.c b/drivers/s390/crypto/pkey_api.c
index 3929c8b..e663432 100644
--- a/drivers/s390/crypto/pkey_api.c
+++ b/drivers/s390/crypto/pkey_api.c
@@ -699,7 +699,7 @@
/* fill request cprb param block with FQ request */
preqparm = (struct fqreqparm *) preqcblk->req_parmb;
memcpy(preqparm->subfunc_code, "FQ", 2);
- strncpy(preqparm->rule_array, keyword, sizeof(preqparm->rule_array));
+ memcpy(preqparm->rule_array, keyword, sizeof(preqparm->rule_array));
preqparm->rule_array_len =
sizeof(preqparm->rule_array_len) + sizeof(preqparm->rule_array);
preqparm->lv1.len = sizeof(preqparm->lv1);
diff --git a/drivers/s390/crypto/zcrypt_card.c b/drivers/s390/crypto/zcrypt_card.c
index 233e1e6..da2c8df 100644
--- a/drivers/s390/crypto/zcrypt_card.c
+++ b/drivers/s390/crypto/zcrypt_card.c
@@ -83,9 +83,21 @@
static DEVICE_ATTR(online, 0644, zcrypt_card_online_show,
zcrypt_card_online_store);
+static ssize_t zcrypt_card_load_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct zcrypt_card *zc = to_ap_card(dev)->private;
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&zc->load));
+}
+
+static DEVICE_ATTR(load, 0444, zcrypt_card_load_show, NULL);
+
static struct attribute *zcrypt_card_attrs[] = {
&dev_attr_type.attr,
&dev_attr_online.attr,
+ &dev_attr_load.attr,
NULL,
};
diff --git a/drivers/s390/crypto/zcrypt_cca_key.h b/drivers/s390/crypto/zcrypt_cca_key.h
index 011d61d..1752622 100644
--- a/drivers/s390/crypto/zcrypt_cca_key.h
+++ b/drivers/s390/crypto/zcrypt_cca_key.h
@@ -99,7 +99,7 @@
* @mex: pointer to user input data
* @p: pointer to memory area for the key
*
- * Returns the size of the key area or -EFAULT
+ * Returns the size of the key area or negative errno value.
*/
static inline int zcrypt_type6_mex_key_en(struct ica_rsa_modexpo *mex, void *p)
{
@@ -118,6 +118,15 @@
unsigned char *temp;
int i;
+ /*
+ * The inputdatalength was a selection criteria in the dispatching
+ * function zcrypt_rsa_modexpo(). However, do a plausibility check
+ * here to make sure the following copy_from_user() can't be utilized
+ * to compromise the system.
+ */
+ if (WARN_ON_ONCE(mex->inputdatalength > 512))
+ return -EINVAL;
+
memset(key, 0, sizeof(*key));
key->pubHdr = static_pub_hdr;
@@ -178,6 +187,15 @@
struct cca_public_sec *pub;
int short_len, long_len, pad_len, key_len, size;
+ /*
+ * The inputdatalength was a selection criteria in the dispatching
+ * function zcrypt_rsa_crt(). However, do a plausibility check
+ * here to make sure the following copy_from_user() can't be utilized
+ * to compromise the system.
+ */
+ if (WARN_ON_ONCE(crt->inputdatalength > 512))
+ return -EINVAL;
+
memset(key, 0, sizeof(*key));
short_len = (crt->inputdatalength + 1) / 2;
diff --git a/drivers/s390/crypto/zcrypt_msgtype6.c b/drivers/s390/crypto/zcrypt_msgtype6.c
index 97d4bac..e70ae07 100644
--- a/drivers/s390/crypto/zcrypt_msgtype6.c
+++ b/drivers/s390/crypto/zcrypt_msgtype6.c
@@ -246,7 +246,7 @@
* @ap_msg: pointer to AP message
* @mex: pointer to user input data
*
- * Returns 0 on success or -EFAULT.
+ * Returns 0 on success or negative errno value.
*/
static int ICAMEX_msg_to_type6MEX_msgX(struct zcrypt_queue *zq,
struct ap_message *ap_msg,
@@ -272,6 +272,14 @@
} __packed * msg = ap_msg->message;
int size;
+ /*
+ * The inputdatalength was a selection criteria in the dispatching
+ * function zcrypt_rsa_modexpo(). However, make sure the following
+ * copy_from_user() never exceeds the allocated buffer space.
+ */
+ if (WARN_ON_ONCE(mex->inputdatalength > PAGE_SIZE))
+ return -EINVAL;
+
/* VUD.ciphertext */
msg->length = mex->inputdatalength + 2;
if (copy_from_user(msg->text, mex->inputdata, mex->inputdatalength))
@@ -307,7 +315,7 @@
* @ap_msg: pointer to AP message
* @crt: pointer to user input data
*
- * Returns 0 on success or -EFAULT.
+ * Returns 0 on success or negative errno value.
*/
static int ICACRT_msg_to_type6CRT_msgX(struct zcrypt_queue *zq,
struct ap_message *ap_msg,
@@ -334,6 +342,14 @@
} __packed * msg = ap_msg->message;
int size;
+ /*
+ * The inputdatalength was a selection criteria in the dispatching
+ * function zcrypt_rsa_crt(). However, make sure the following
+ * copy_from_user() never exceeds the allocated buffer space.
+ */
+ if (WARN_ON_ONCE(crt->inputdatalength > PAGE_SIZE))
+ return -EINVAL;
+
/* VUD.ciphertext */
msg->length = crt->inputdatalength + 2;
if (copy_from_user(msg->text, crt->inputdata, crt->inputdatalength))
diff --git a/drivers/s390/crypto/zcrypt_queue.c b/drivers/s390/crypto/zcrypt_queue.c
index 720434e..91a52f2 100644
--- a/drivers/s390/crypto/zcrypt_queue.c
+++ b/drivers/s390/crypto/zcrypt_queue.c
@@ -75,8 +75,20 @@
static DEVICE_ATTR(online, 0644, zcrypt_queue_online_show,
zcrypt_queue_online_store);
+static ssize_t zcrypt_queue_load_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct zcrypt_queue *zq = to_ap_queue(dev)->private;
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&zq->load));
+}
+
+static DEVICE_ATTR(load, 0444, zcrypt_queue_load_show, NULL);
+
static struct attribute *zcrypt_queue_attrs[] = {
&dev_attr_online.attr,
+ &dev_attr_load.attr,
NULL,
};
diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index a3a8c8d..94f4d8f 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -101,7 +101,7 @@
token = strsep(&str, ",");
if (!token || strlen(token) >= ZFCP_BUS_ID_SIZE)
goto err_out;
- strncpy(busid, token, ZFCP_BUS_ID_SIZE);
+ strlcpy(busid, token, ZFCP_BUS_ID_SIZE);
token = strsep(&str, ",");
if (!token || kstrtoull(token, 0, (unsigned long long *) &wwpn))
diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
index 46a4484..c6f78d2 100644
--- a/drivers/video/fbdev/efifb.c
+++ b/drivers/video/fbdev/efifb.c
@@ -20,7 +20,7 @@
#include <drm/drm_connector.h> /* For DRM_MODE_PANEL_ORIENTATION_* */
static bool request_mem_succeeded = false;
-static bool nowc = false;
+static u64 mem_flags = EFI_MEMORY_WC | EFI_MEMORY_UC;
static struct fb_var_screeninfo efifb_defined = {
.activate = FB_ACTIVATE_NOW,
@@ -68,8 +68,12 @@
static void efifb_destroy(struct fb_info *info)
{
- if (info->screen_base)
- iounmap(info->screen_base);
+ if (info->screen_base) {
+ if (mem_flags & (EFI_MEMORY_UC | EFI_MEMORY_WC))
+ iounmap(info->screen_base);
+ else
+ memunmap(info->screen_base);
+ }
if (request_mem_succeeded)
release_mem_region(info->apertures->ranges[0].base,
info->apertures->ranges[0].size);
@@ -104,7 +108,7 @@
else if (!strncmp(this_opt, "width:", 6))
screen_info.lfb_width = simple_strtoul(this_opt+6, NULL, 0);
else if (!strcmp(this_opt, "nowc"))
- nowc = true;
+ mem_flags &= ~EFI_MEMORY_WC;
}
}
@@ -164,6 +168,7 @@
unsigned int size_remap;
unsigned int size_total;
char *option = NULL;
+ efi_memory_desc_t md;
if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI || pci_dev_disabled)
return -ENODEV;
@@ -272,12 +277,35 @@
info->apertures->ranges[0].base = efifb_fix.smem_start;
info->apertures->ranges[0].size = size_remap;
- if (nowc)
- info->screen_base = ioremap(efifb_fix.smem_start, efifb_fix.smem_len);
- else
- info->screen_base = ioremap_wc(efifb_fix.smem_start, efifb_fix.smem_len);
+ if (!efi_mem_desc_lookup(efifb_fix.smem_start, &md)) {
+ if ((efifb_fix.smem_start + efifb_fix.smem_len) >
+ (md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT))) {
+ pr_err("efifb: video memory @ 0x%lx spans multiple EFI memory regions\n",
+ efifb_fix.smem_start);
+ err = -EIO;
+ goto err_release_fb;
+ }
+ /*
+ * If the UEFI memory map covers the efifb region, we may only
+ * remap it using the attributes the memory map prescribes.
+ */
+ mem_flags |= EFI_MEMORY_WT | EFI_MEMORY_WB;
+ mem_flags &= md.attribute;
+ }
+ if (mem_flags & EFI_MEMORY_WC)
+ info->screen_base = ioremap_wc(efifb_fix.smem_start,
+ efifb_fix.smem_len);
+ else if (mem_flags & EFI_MEMORY_UC)
+ info->screen_base = ioremap(efifb_fix.smem_start,
+ efifb_fix.smem_len);
+ else if (mem_flags & EFI_MEMORY_WT)
+ info->screen_base = memremap(efifb_fix.smem_start,
+ efifb_fix.smem_len, MEMREMAP_WT);
+ else if (mem_flags & EFI_MEMORY_WB)
+ info->screen_base = memremap(efifb_fix.smem_start,
+ efifb_fix.smem_len, MEMREMAP_WB);
if (!info->screen_base) {
- pr_err("efifb: abort, cannot ioremap video memory 0x%x @ 0x%lx\n",
+ pr_err("efifb: abort, cannot remap video memory 0x%x @ 0x%lx\n",
efifb_fix.smem_len, efifb_fix.smem_start);
err = -EIO;
goto err_release_fb;
@@ -371,7 +399,10 @@
err_groups:
sysfs_remove_groups(&dev->dev.kobj, efifb_groups);
err_unmap:
- iounmap(info->screen_base);
+ if (mem_flags & (EFI_MEMORY_UC | EFI_MEMORY_WC))
+ iounmap(info->screen_base);
+ else
+ memunmap(info->screen_base);
err_release_fb:
framebuffer_release(info);
err_release_mem:
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index a1b1808..183cc54 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -648,7 +648,7 @@
trace_afs_notify_call(rxcall, call);
call->need_attention = true;
- u = __atomic_add_unless(&call->usage, 1, 0);
+ u = atomic_fetch_add_unless(&call->usage, 1, 0);
if (u != 0) {
trace_afs_call(call, afs_call_trace_wake, u,
atomic_read(&call->net->nr_outstanding_calls),
diff --git a/fs/efivarfs/inode.c b/fs/efivarfs/inode.c
index 71fcccc..8c6ab6c 100644
--- a/fs/efivarfs/inode.c
+++ b/fs/efivarfs/inode.c
@@ -86,7 +86,9 @@
/* length of the variable name itself: remove GUID and separator */
namelen = dentry->d_name.len - EFI_VARIABLE_GUID_LEN - 1;
- uuid_le_to_bin(dentry->d_name.name + namelen + 1, &var->var.VendorGuid);
+ err = guid_parse(dentry->d_name.name + namelen + 1, &var->var.VendorGuid);
+ if (err)
+ goto out;
if (efivar_variable_is_removable(var->var.VendorGuid,
dentry->d_name.name, namelen))
diff --git a/fs/timerfd.c b/fs/timerfd.c
index cdad49d..38c695c 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -533,8 +533,8 @@
}
SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
- const struct itimerspec __user *, utmr,
- struct itimerspec __user *, otmr)
+ const struct __kernel_itimerspec __user *, utmr,
+ struct __kernel_itimerspec __user *, otmr)
{
struct itimerspec64 new, old;
int ret;
@@ -550,7 +550,7 @@
return ret;
}
-SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
+SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct __kernel_itimerspec __user *, otmr)
{
struct itimerspec64 kotmr;
int ret = do_timerfd_gettime(ufd, &kotmr);
@@ -559,7 +559,7 @@
return put_itimerspec64(&kotmr, otmr) ? -EFAULT : 0;
}
-#ifdef CONFIG_COMPAT
+#ifdef CONFIG_COMPAT_32BIT_TIME
COMPAT_SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
const struct compat_itimerspec __user *, utmr,
struct compat_itimerspec __user *, otmr)
diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index ec07f23..0d4b1d3 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -84,42 +84,59 @@
}
#endif
-static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
+#ifdef arch_atomic_fetch_add_unless
+#define atomic_fetch_add_unless atomic_fetch_add_unless
+static __always_inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
{
kasan_check_write(v, sizeof(*v));
- return __arch_atomic_add_unless(v, a, u);
+ return arch_atomic_fetch_add_unless(v, a, u);
}
+#endif
-
-static __always_inline bool atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
+#ifdef arch_atomic64_fetch_add_unless
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
+static __always_inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
{
kasan_check_write(v, sizeof(*v));
- return arch_atomic64_add_unless(v, a, u);
+ return arch_atomic64_fetch_add_unless(v, a, u);
}
+#endif
+#ifdef arch_atomic_inc
+#define atomic_inc atomic_inc
static __always_inline void atomic_inc(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
arch_atomic_inc(v);
}
+#endif
+#ifdef arch_atomic64_inc
+#define atomic64_inc atomic64_inc
static __always_inline void atomic64_inc(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
arch_atomic64_inc(v);
}
+#endif
+#ifdef arch_atomic_dec
+#define atomic_dec atomic_dec
static __always_inline void atomic_dec(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
arch_atomic_dec(v);
}
+#endif
+#ifdef atch_atomic64_dec
+#define atomic64_dec
static __always_inline void atomic64_dec(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
arch_atomic64_dec(v);
}
+#endif
static __always_inline void atomic_add(int i, atomic_t *v)
{
@@ -181,65 +198,95 @@
arch_atomic64_xor(i, v);
}
+#ifdef arch_atomic_inc_return
+#define atomic_inc_return atomic_inc_return
static __always_inline int atomic_inc_return(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic_inc_return(v);
}
+#endif
+#ifdef arch_atomic64_in_return
+#define atomic64_inc_return atomic64_inc_return
static __always_inline s64 atomic64_inc_return(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_inc_return(v);
}
+#endif
+#ifdef arch_atomic_dec_return
+#define atomic_dec_return atomic_dec_return
static __always_inline int atomic_dec_return(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic_dec_return(v);
}
+#endif
+#ifdef arch_atomic64_dec_return
+#define atomic64_dec_return atomic64_dec_return
static __always_inline s64 atomic64_dec_return(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_dec_return(v);
}
+#endif
-static __always_inline s64 atomic64_inc_not_zero(atomic64_t *v)
+#ifdef arch_atomic64_inc_not_zero
+#define atomic64_inc_not_zero atomic64_inc_not_zero
+static __always_inline bool atomic64_inc_not_zero(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_inc_not_zero(v);
}
+#endif
+#ifdef arch_atomic64_dec_if_positive
+#define atomic64_dec_if_positive atomic64_dec_if_positive
static __always_inline s64 atomic64_dec_if_positive(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_dec_if_positive(v);
}
+#endif
+#ifdef arch_atomic_dec_and_test
+#define atomic_dec_and_test atomic_dec_and_test
static __always_inline bool atomic_dec_and_test(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic_dec_and_test(v);
}
+#endif
+#ifdef arch_atomic64_dec_and_test
+#define atomic64_dec_and_test atomic64_dec_and_test
static __always_inline bool atomic64_dec_and_test(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_dec_and_test(v);
}
+#endif
+#ifdef arch_atomic_inc_and_test
+#define atomic_inc_and_test atomic_inc_and_test
static __always_inline bool atomic_inc_and_test(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic_inc_and_test(v);
}
+#endif
+#ifdef arch_atomic64_inc_and_test
+#define atomic64_inc_and_test atomic64_inc_and_test
static __always_inline bool atomic64_inc_and_test(atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_inc_and_test(v);
}
+#endif
static __always_inline int atomic_add_return(int i, atomic_t *v)
{
@@ -325,152 +372,96 @@
return arch_atomic64_fetch_xor(i, v);
}
+#ifdef arch_atomic_sub_and_test
+#define atomic_sub_and_test atomic_sub_and_test
static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic_sub_and_test(i, v);
}
+#endif
+#ifdef arch_atomic64_sub_and_test
+#define atomic64_sub_and_test atomic64_sub_and_test
static __always_inline bool atomic64_sub_and_test(s64 i, atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_sub_and_test(i, v);
}
+#endif
+#ifdef arch_atomic_add_negative
+#define atomic_add_negative atomic_add_negative
static __always_inline bool atomic_add_negative(int i, atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic_add_negative(i, v);
}
+#endif
+#ifdef arch_atomic64_add_negative
+#define atomic64_add_negative atomic64_add_negative
static __always_inline bool atomic64_add_negative(s64 i, atomic64_t *v)
{
kasan_check_write(v, sizeof(*v));
return arch_atomic64_add_negative(i, v);
}
+#endif
-static __always_inline unsigned long
-cmpxchg_size(volatile void *ptr, unsigned long old, unsigned long new, int size)
-{
- kasan_check_write(ptr, size);
- switch (size) {
- case 1:
- return arch_cmpxchg((u8 *)ptr, (u8)old, (u8)new);
- case 2:
- return arch_cmpxchg((u16 *)ptr, (u16)old, (u16)new);
- case 4:
- return arch_cmpxchg((u32 *)ptr, (u32)old, (u32)new);
- case 8:
- BUILD_BUG_ON(sizeof(unsigned long) != 8);
- return arch_cmpxchg((u64 *)ptr, (u64)old, (u64)new);
- }
- BUILD_BUG();
- return 0;
-}
+#define xchg(ptr, new) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ kasan_check_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg(__ai_ptr, (new)); \
+})
#define cmpxchg(ptr, old, new) \
({ \
- ((__typeof__(*(ptr)))cmpxchg_size((ptr), (unsigned long)(old), \
- (unsigned long)(new), sizeof(*(ptr)))); \
+ typeof(ptr) __ai_ptr = (ptr); \
+ kasan_check_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg(__ai_ptr, (old), (new)); \
})
-static __always_inline unsigned long
-sync_cmpxchg_size(volatile void *ptr, unsigned long old, unsigned long new,
- int size)
-{
- kasan_check_write(ptr, size);
- switch (size) {
- case 1:
- return arch_sync_cmpxchg((u8 *)ptr, (u8)old, (u8)new);
- case 2:
- return arch_sync_cmpxchg((u16 *)ptr, (u16)old, (u16)new);
- case 4:
- return arch_sync_cmpxchg((u32 *)ptr, (u32)old, (u32)new);
- case 8:
- BUILD_BUG_ON(sizeof(unsigned long) != 8);
- return arch_sync_cmpxchg((u64 *)ptr, (u64)old, (u64)new);
- }
- BUILD_BUG();
- return 0;
-}
-
#define sync_cmpxchg(ptr, old, new) \
({ \
- ((__typeof__(*(ptr)))sync_cmpxchg_size((ptr), \
- (unsigned long)(old), (unsigned long)(new), \
- sizeof(*(ptr)))); \
+ typeof(ptr) __ai_ptr = (ptr); \
+ kasan_check_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_sync_cmpxchg(__ai_ptr, (old), (new)); \
})
-static __always_inline unsigned long
-cmpxchg_local_size(volatile void *ptr, unsigned long old, unsigned long new,
- int size)
-{
- kasan_check_write(ptr, size);
- switch (size) {
- case 1:
- return arch_cmpxchg_local((u8 *)ptr, (u8)old, (u8)new);
- case 2:
- return arch_cmpxchg_local((u16 *)ptr, (u16)old, (u16)new);
- case 4:
- return arch_cmpxchg_local((u32 *)ptr, (u32)old, (u32)new);
- case 8:
- BUILD_BUG_ON(sizeof(unsigned long) != 8);
- return arch_cmpxchg_local((u64 *)ptr, (u64)old, (u64)new);
- }
- BUILD_BUG();
- return 0;
-}
-
#define cmpxchg_local(ptr, old, new) \
({ \
- ((__typeof__(*(ptr)))cmpxchg_local_size((ptr), \
- (unsigned long)(old), (unsigned long)(new), \
- sizeof(*(ptr)))); \
+ typeof(ptr) __ai_ptr = (ptr); \
+ kasan_check_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_local(__ai_ptr, (old), (new)); \
})
-static __always_inline u64
-cmpxchg64_size(volatile u64 *ptr, u64 old, u64 new)
-{
- kasan_check_write(ptr, sizeof(*ptr));
- return arch_cmpxchg64(ptr, old, new);
-}
-
#define cmpxchg64(ptr, old, new) \
({ \
- ((__typeof__(*(ptr)))cmpxchg64_size((ptr), (u64)(old), \
- (u64)(new))); \
+ typeof(ptr) __ai_ptr = (ptr); \
+ kasan_check_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64(__ai_ptr, (old), (new)); \
})
-static __always_inline u64
-cmpxchg64_local_size(volatile u64 *ptr, u64 old, u64 new)
-{
- kasan_check_write(ptr, sizeof(*ptr));
- return arch_cmpxchg64_local(ptr, old, new);
-}
-
#define cmpxchg64_local(ptr, old, new) \
({ \
- ((__typeof__(*(ptr)))cmpxchg64_local_size((ptr), (u64)(old), \
- (u64)(new))); \
+ typeof(ptr) __ai_ptr = (ptr); \
+ kasan_check_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_local(__ai_ptr, (old), (new)); \
})
-/*
- * Originally we had the following code here:
- * __typeof__(p1) ____p1 = (p1);
- * kasan_check_write(____p1, 2 * sizeof(*____p1));
- * arch_cmpxchg_double(____p1, (p2), (o1), (o2), (n1), (n2));
- * But it leads to compilation failures (see gcc issue 72873).
- * So for now it's left non-instrumented.
- * There are few callers of cmpxchg_double(), so it's not critical.
- */
#define cmpxchg_double(p1, p2, o1, o2, n1, n2) \
({ \
- arch_cmpxchg_double((p1), (p2), (o1), (o2), (n1), (n2)); \
+ typeof(p1) __ai_p1 = (p1); \
+ kasan_check_write(__ai_p1, 2 * sizeof(*__ai_p1)); \
+ arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \
})
-#define cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
-({ \
- arch_cmpxchg_double_local((p1), (p2), (o1), (o2), (n1), (n2)); \
+#define cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
+({ \
+ typeof(p1) __ai_p1 = (p1); \
+ kasan_check_write(__ai_p1, 2 * sizeof(*__ai_p1)); \
+ arch_cmpxchg_double_local(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \
})
#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index abe6dd9..13324aa 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -186,11 +186,6 @@
#include <linux/irqflags.h>
-static inline int atomic_add_negative(int i, atomic_t *v)
-{
- return atomic_add_return(i, v) < 0;
-}
-
static inline void atomic_add(int i, atomic_t *v)
{
atomic_add_return(i, v);
@@ -201,35 +196,7 @@
atomic_sub_return(i, v);
}
-static inline void atomic_inc(atomic_t *v)
-{
- atomic_add_return(1, v);
-}
-
-static inline void atomic_dec(atomic_t *v)
-{
- atomic_sub_return(1, v);
-}
-
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-
-#define atomic_sub_and_test(i, v) (atomic_sub_return((i), (v)) == 0)
-#define atomic_dec_and_test(v) (atomic_dec_return(v) == 0)
-#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
-
#define atomic_xchg(ptr, v) (xchg(&(ptr)->counter, (v)))
#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-#ifndef __atomic_add_unless
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- while (c != u && (old = atomic_cmpxchg(v, c, c + a)) != c)
- c = old;
- return c;
-}
-#endif
-
#endif /* __ASM_GENERIC_ATOMIC_H */
diff --git a/include/asm-generic/atomic64.h b/include/asm-generic/atomic64.h
index 8d28eb0..97b28b7 100644
--- a/include/asm-generic/atomic64.h
+++ b/include/asm-generic/atomic64.h
@@ -11,6 +11,7 @@
*/
#ifndef _ASM_GENERIC_ATOMIC64_H
#define _ASM_GENERIC_ATOMIC64_H
+#include <linux/types.h>
typedef struct {
long long counter;
@@ -50,18 +51,10 @@
#undef ATOMIC64_OP
extern long long atomic64_dec_if_positive(atomic64_t *v);
+#define atomic64_dec_if_positive atomic64_dec_if_positive
extern long long atomic64_cmpxchg(atomic64_t *v, long long o, long long n);
extern long long atomic64_xchg(atomic64_t *v, long long new);
-extern int atomic64_add_unless(atomic64_t *v, long long a, long long u);
-
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-#define atomic64_inc(v) atomic64_add(1LL, (v))
-#define atomic64_inc_return(v) atomic64_add_return(1LL, (v))
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-#define atomic64_sub_and_test(a, v) (atomic64_sub_return((a), (v)) == 0)
-#define atomic64_dec(v) atomic64_sub(1LL, (v))
-#define atomic64_dec_return(v) atomic64_sub_return(1LL, (v))
-#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1LL, 0LL)
+extern long long atomic64_fetch_add_unless(atomic64_t *v, long long a, long long u);
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
#endif /* _ASM_GENERIC_ATOMIC64_H */
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index 04deffa..dd90c97 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -2,189 +2,67 @@
#ifndef _ASM_GENERIC_BITOPS_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_ATOMIC_H_
-#include <asm/types.h>
-#include <linux/irqflags.h>
-
-#ifdef CONFIG_SMP
-#include <asm/spinlock.h>
-#include <asm/cache.h> /* we use L1_CACHE_BYTES */
-
-/* Use an array of spinlocks for our atomic_ts.
- * Hash function to index into a different SPINLOCK.
- * Since "a" is usually an address, use one spinlock per cacheline.
- */
-# define ATOMIC_HASH_SIZE 4
-# define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
-
-extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
-
-/* Can't use raw_spin_lock_irq because of #include problems, so
- * this is the substitute */
-#define _atomic_spin_lock_irqsave(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- local_irq_save(f); \
- arch_spin_lock(s); \
-} while(0)
-
-#define _atomic_spin_unlock_irqrestore(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- arch_spin_unlock(s); \
- local_irq_restore(f); \
-} while(0)
-
-
-#else
-# define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
-# define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
-#endif
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>
/*
- * NMI events can occur at any time, including when interrupts have been
- * disabled by *_irqsave(). So you can get NMI events occurring while a
- * *_bit function is holding a spin lock. If the NMI handler also wants
- * to do bit manipulation (and they do) then you can get a deadlock
- * between the original caller of *_bit() and the NMI handler.
- *
- * by Keith Owens
+ * Implementation of atomic bitops using atomic-fetch ops.
+ * See Documentation/atomic_bitops.txt for details.
*/
-/**
- * set_bit - Atomically set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * This function is atomic and may not be reordered. See __set_bit()
- * if you do not require the atomic guarantees.
- *
- * Note: there are no guarantees that this function will not be reordered
- * on non x86 architectures, so if you are writing portable code,
- * make sure not to rely on its reordering guarantees.
- *
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void set_bit(int nr, volatile unsigned long *addr)
+static inline void set_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p |= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_or(BIT_MASK(nr), (atomic_long_t *)p);
}
-/**
- * clear_bit - Clears a bit in memory
- * @nr: Bit to clear
- * @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered. However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
- * in order to ensure changes are visible on other processors.
- */
-static inline void clear_bit(int nr, volatile unsigned long *addr)
+static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p &= ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_andnot(BIT_MASK(nr), (atomic_long_t *)p);
}
-/**
- * change_bit - Toggle a bit in memory
- * @nr: Bit to change
- * @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered. It may be
- * reordered on other architectures than x86.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void change_bit(int nr, volatile unsigned long *addr)
+static inline void change_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p ^= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_xor(BIT_MASK(nr), (atomic_long_t *)p);
}
-/**
- * test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It may be reordered on other architectures than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old | mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
- return (old & mask) != 0;
+ old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It can be reorderdered on other architectures other than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old & ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (!(READ_ONCE(*p) & mask))
+ return 0;
- return (old & mask) != 0;
+ old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
-/**
- * test_and_change_bit - Change a bit and return its old value
- * @nr: Bit to change
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
- */
-static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_change_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old ^ mask;
- _atomic_spin_unlock_irqrestore(p, flags);
-
- return (old & mask) != 0;
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_xor(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}
#endif /* _ASM_GENERIC_BITOPS_ATOMIC_H */
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 67ab280..3ae0213 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -2,6 +2,10 @@
#ifndef _ASM_GENERIC_BITOPS_LOCK_H_
#define _ASM_GENERIC_BITOPS_LOCK_H_
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>
+
/**
* test_and_set_bit_lock - Set a bit and return its old value, for lock
* @nr: Bit to set
@@ -11,7 +15,20 @@
* the returned value is 0.
* It can be used to implement bit locks.
*/
-#define test_and_set_bit_lock(nr, addr) test_and_set_bit(nr, addr)
+static inline int test_and_set_bit_lock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ long old;
+ unsigned long mask = BIT_MASK(nr);
+
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
+
+ old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
+ return !!(old & mask);
+}
+
/**
* clear_bit_unlock - Clear a bit in memory, for unlock
@@ -20,11 +37,11 @@
*
* This operation is atomic and provides release barrier semantics.
*/
-#define clear_bit_unlock(nr, addr) \
-do { \
- smp_mb__before_atomic(); \
- clear_bit(nr, addr); \
-} while (0)
+static inline void clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
+{
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
+}
/**
* __clear_bit_unlock - Clear a bit in memory, for unlock
@@ -37,11 +54,38 @@
*
* See for example x86's implementation.
*/
-#define __clear_bit_unlock(nr, addr) \
-do { \
- smp_mb__before_atomic(); \
- clear_bit(nr, addr); \
-} while (0)
+static inline void __clear_bit_unlock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ unsigned long old;
+
+ p += BIT_WORD(nr);
+ old = READ_ONCE(*p);
+ old &= ~BIT_MASK(nr);
+ atomic_long_set_release((atomic_long_t *)p, old);
+}
+
+/**
+ * clear_bit_unlock_is_negative_byte - Clear a bit in memory and test if bottom
+ * byte is negative, for unlock.
+ * @nr: the bit to clear
+ * @addr: the address to start counting from
+ *
+ * This is a bit of a one-trick-pony for the filemap code, which clears
+ * PG_locked and tests PG_waiters,
+ */
+#ifndef clear_bit_unlock_is_negative_byte
+static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
+ volatile unsigned long *p)
+{
+ long old;
+ unsigned long mask = BIT_MASK(nr);
+
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_andnot_release(mask, (atomic_long_t *)p);
+ return !!(old & BIT(7));
+}
+#define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
+#endif
#endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
-
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639a..b081794 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
int pud_clear_huge(pud_t *pud);
int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
{
@@ -1046,11 +1046,11 @@
{
return 0;
}
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return 0;
}
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return 0;
}
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 3063125..e811ef7 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -303,4 +303,14 @@
#define tlb_migrate_finish(mm) do {} while (0)
+/*
+ * Used to flush the TLB when page tables are removed, when lazy
+ * TLB mode may cause a CPU to retain intermediate translations
+ * pointing to about-to-be-freed page table memory.
+ */
+#ifndef HAVE_TLB_FLUSH_REMOVE_TABLES
+#define tlb_flush_remove_tables(mm) do {} while (0)
+#define tlb_flush_remove_tables_local(mm) do {} while (0)
+#endif
+
#endif /* _ASM_GENERIC__TLB_H */
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 01ce399..1e8e88b 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -2,6 +2,8 @@
/* Atomic operations usable in machine independent code */
#ifndef _LINUX_ATOMIC_H
#define _LINUX_ATOMIC_H
+#include <linux/types.h>
+
#include <asm/atomic.h>
#include <asm/barrier.h>
@@ -36,40 +38,46 @@
* barriers on top of the relaxed variant. In the case where the relaxed
* variant is already fully ordered, no additional barriers are needed.
*
- * Besides, if an arch has a special barrier for acquire/release, it could
- * implement its own __atomic_op_* and use the same framework for building
- * variants
- *
- * If an architecture overrides __atomic_op_acquire() it will probably want
- * to define smp_mb__after_spinlock().
+ * If an architecture overrides __atomic_acquire_fence() it will probably
+ * want to define smp_mb__after_spinlock().
*/
-#ifndef __atomic_op_acquire
+#ifndef __atomic_acquire_fence
+#define __atomic_acquire_fence smp_mb__after_atomic
+#endif
+
+#ifndef __atomic_release_fence
+#define __atomic_release_fence smp_mb__before_atomic
+#endif
+
+#ifndef __atomic_pre_full_fence
+#define __atomic_pre_full_fence smp_mb__before_atomic
+#endif
+
+#ifndef __atomic_post_full_fence
+#define __atomic_post_full_fence smp_mb__after_atomic
+#endif
+
#define __atomic_op_acquire(op, args...) \
({ \
typeof(op##_relaxed(args)) __ret = op##_relaxed(args); \
- smp_mb__after_atomic(); \
+ __atomic_acquire_fence(); \
__ret; \
})
-#endif
-#ifndef __atomic_op_release
#define __atomic_op_release(op, args...) \
({ \
- smp_mb__before_atomic(); \
+ __atomic_release_fence(); \
op##_relaxed(args); \
})
-#endif
-#ifndef __atomic_op_fence
#define __atomic_op_fence(op, args...) \
({ \
typeof(op##_relaxed(args)) __ret; \
- smp_mb__before_atomic(); \
+ __atomic_pre_full_fence(); \
__ret = op##_relaxed(args); \
- smp_mb__after_atomic(); \
+ __atomic_post_full_fence(); \
__ret; \
})
-#endif
/* atomic_add_return_relaxed */
#ifndef atomic_add_return_relaxed
@@ -95,11 +103,23 @@
#endif
#endif /* atomic_add_return_relaxed */
+#ifndef atomic_inc
+#define atomic_inc(v) atomic_add(1, (v))
+#endif
+
/* atomic_inc_return_relaxed */
#ifndef atomic_inc_return_relaxed
+
+#ifndef atomic_inc_return
+#define atomic_inc_return(v) atomic_add_return(1, (v))
+#define atomic_inc_return_relaxed(v) atomic_add_return_relaxed(1, (v))
+#define atomic_inc_return_acquire(v) atomic_add_return_acquire(1, (v))
+#define atomic_inc_return_release(v) atomic_add_return_release(1, (v))
+#else /* atomic_inc_return */
#define atomic_inc_return_relaxed atomic_inc_return
#define atomic_inc_return_acquire atomic_inc_return
#define atomic_inc_return_release atomic_inc_return
+#endif /* atomic_inc_return */
#else /* atomic_inc_return_relaxed */
@@ -143,11 +163,23 @@
#endif
#endif /* atomic_sub_return_relaxed */
+#ifndef atomic_dec
+#define atomic_dec(v) atomic_sub(1, (v))
+#endif
+
/* atomic_dec_return_relaxed */
#ifndef atomic_dec_return_relaxed
+
+#ifndef atomic_dec_return
+#define atomic_dec_return(v) atomic_sub_return(1, (v))
+#define atomic_dec_return_relaxed(v) atomic_sub_return_relaxed(1, (v))
+#define atomic_dec_return_acquire(v) atomic_sub_return_acquire(1, (v))
+#define atomic_dec_return_release(v) atomic_sub_return_release(1, (v))
+#else /* atomic_dec_return */
#define atomic_dec_return_relaxed atomic_dec_return
#define atomic_dec_return_acquire atomic_dec_return
#define atomic_dec_return_release atomic_dec_return
+#endif /* atomic_dec_return */
#else /* atomic_dec_return_relaxed */
@@ -328,12 +360,22 @@
#endif
#endif /* atomic_fetch_and_relaxed */
-#ifdef atomic_andnot
-/* atomic_fetch_andnot_relaxed */
+#ifndef atomic_andnot
+#define atomic_andnot(i, v) atomic_and(~(int)(i), (v))
+#endif
+
#ifndef atomic_fetch_andnot_relaxed
-#define atomic_fetch_andnot_relaxed atomic_fetch_andnot
-#define atomic_fetch_andnot_acquire atomic_fetch_andnot
-#define atomic_fetch_andnot_release atomic_fetch_andnot
+
+#ifndef atomic_fetch_andnot
+#define atomic_fetch_andnot(i, v) atomic_fetch_and(~(int)(i), (v))
+#define atomic_fetch_andnot_relaxed(i, v) atomic_fetch_and_relaxed(~(int)(i), (v))
+#define atomic_fetch_andnot_acquire(i, v) atomic_fetch_and_acquire(~(int)(i), (v))
+#define atomic_fetch_andnot_release(i, v) atomic_fetch_and_release(~(int)(i), (v))
+#else /* atomic_fetch_andnot */
+#define atomic_fetch_andnot_relaxed atomic_fetch_andnot
+#define atomic_fetch_andnot_acquire atomic_fetch_andnot
+#define atomic_fetch_andnot_release atomic_fetch_andnot
+#endif /* atomic_fetch_andnot */
#else /* atomic_fetch_andnot_relaxed */
@@ -352,7 +394,6 @@
__atomic_op_fence(atomic_fetch_andnot, __VA_ARGS__)
#endif
#endif /* atomic_fetch_andnot_relaxed */
-#endif /* atomic_andnot */
/* atomic_fetch_xor_relaxed */
#ifndef atomic_fetch_xor_relaxed
@@ -520,112 +561,140 @@
#endif /* xchg_relaxed */
/**
+ * atomic_fetch_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if @v was not already @u.
+ * Returns the original value of @v.
+ */
+#ifndef atomic_fetch_add_unless
+static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
+{
+ int c = atomic_read(v);
+
+ do {
+ if (unlikely(c == u))
+ break;
+ } while (!atomic_try_cmpxchg(v, &c, c + a));
+
+ return c;
+}
+#endif
+
+/**
* atomic_add_unless - add unless the number is already a given value
* @v: pointer of type atomic_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
*
- * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns non-zero if @v was not @u, and zero otherwise.
+ * Atomically adds @a to @v, if @v was not already @u.
+ * Returns true if the addition was done.
*/
-static inline int atomic_add_unless(atomic_t *v, int a, int u)
+static inline bool atomic_add_unless(atomic_t *v, int a, int u)
{
- return __atomic_add_unless(v, a, u) != u;
+ return atomic_fetch_add_unless(v, a, u) != u;
}
/**
* atomic_inc_not_zero - increment unless the number is zero
* @v: pointer of type atomic_t
*
- * Atomically increments @v by 1, so long as @v is non-zero.
- * Returns non-zero if @v was non-zero, and zero otherwise.
+ * Atomically increments @v by 1, if @v is non-zero.
+ * Returns true if the increment was done.
*/
#ifndef atomic_inc_not_zero
#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
#endif
-#ifndef atomic_andnot
-static inline void atomic_andnot(int i, atomic_t *v)
+/**
+ * atomic_inc_and_test - increment and test
+ * @v: pointer of type atomic_t
+ *
+ * Atomically increments @v by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+#ifndef atomic_inc_and_test
+static inline bool atomic_inc_and_test(atomic_t *v)
{
- atomic_and(~i, v);
-}
-
-static inline int atomic_fetch_andnot(int i, atomic_t *v)
-{
- return atomic_fetch_and(~i, v);
-}
-
-static inline int atomic_fetch_andnot_relaxed(int i, atomic_t *v)
-{
- return atomic_fetch_and_relaxed(~i, v);
-}
-
-static inline int atomic_fetch_andnot_acquire(int i, atomic_t *v)
-{
- return atomic_fetch_and_acquire(~i, v);
-}
-
-static inline int atomic_fetch_andnot_release(int i, atomic_t *v)
-{
- return atomic_fetch_and_release(~i, v);
+ return atomic_inc_return(v) == 0;
}
#endif
/**
- * atomic_inc_not_zero_hint - increment if not null
+ * atomic_dec_and_test - decrement and test
* @v: pointer of type atomic_t
- * @hint: probable value of the atomic before the increment
*
- * This version of atomic_inc_not_zero() gives a hint of probable
- * value of the atomic. This helps processor to not read the memory
- * before doing the atomic read/modify/write cycle, lowering
- * number of bus transactions on some arches.
- *
- * Returns: 0 if increment was not done, 1 otherwise.
+ * Atomically decrements @v by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
*/
-#ifndef atomic_inc_not_zero_hint
-static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
+#ifndef atomic_dec_and_test
+static inline bool atomic_dec_and_test(atomic_t *v)
{
- int val, c = hint;
+ return atomic_dec_return(v) == 0;
+}
+#endif
- /* sanity test, should be removed by compiler if hint is a constant */
- if (!hint)
- return atomic_inc_not_zero(v);
+/**
+ * atomic_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @v: pointer of type atomic_t
+ *
+ * Atomically subtracts @i from @v and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+#ifndef atomic_sub_and_test
+static inline bool atomic_sub_and_test(int i, atomic_t *v)
+{
+ return atomic_sub_return(i, v) == 0;
+}
+#endif
- do {
- val = atomic_cmpxchg(v, c, c + 1);
- if (val == c)
- return 1;
- c = val;
- } while (c);
-
- return 0;
+/**
+ * atomic_add_negative - add and test if negative
+ * @i: integer value to add
+ * @v: pointer of type atomic_t
+ *
+ * Atomically adds @i to @v and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+#ifndef atomic_add_negative
+static inline bool atomic_add_negative(int i, atomic_t *v)
+{
+ return atomic_add_return(i, v) < 0;
}
#endif
#ifndef atomic_inc_unless_negative
-static inline int atomic_inc_unless_negative(atomic_t *p)
+static inline bool atomic_inc_unless_negative(atomic_t *v)
{
- int v, v1;
- for (v = 0; v >= 0; v = v1) {
- v1 = atomic_cmpxchg(p, v, v + 1);
- if (likely(v1 == v))
- return 1;
- }
- return 0;
+ int c = atomic_read(v);
+
+ do {
+ if (unlikely(c < 0))
+ return false;
+ } while (!atomic_try_cmpxchg(v, &c, c + 1));
+
+ return true;
}
#endif
#ifndef atomic_dec_unless_positive
-static inline int atomic_dec_unless_positive(atomic_t *p)
+static inline bool atomic_dec_unless_positive(atomic_t *v)
{
- int v, v1;
- for (v = 0; v <= 0; v = v1) {
- v1 = atomic_cmpxchg(p, v, v - 1);
- if (likely(v1 == v))
- return 1;
- }
- return 0;
+ int c = atomic_read(v);
+
+ do {
+ if (unlikely(c > 0))
+ return false;
+ } while (!atomic_try_cmpxchg(v, &c, c - 1));
+
+ return true;
}
#endif
@@ -639,17 +708,14 @@
#ifndef atomic_dec_if_positive
static inline int atomic_dec_if_positive(atomic_t *v)
{
- int c, old, dec;
- c = atomic_read(v);
- for (;;) {
+ int dec, c = atomic_read(v);
+
+ do {
dec = c - 1;
if (unlikely(dec < 0))
break;
- old = atomic_cmpxchg((v), c, dec);
- if (likely(old == c))
- break;
- c = old;
- }
+ } while (!atomic_try_cmpxchg(v, &c, dec));
+
return dec;
}
#endif
@@ -693,11 +759,23 @@
#endif
#endif /* atomic64_add_return_relaxed */
+#ifndef atomic64_inc
+#define atomic64_inc(v) atomic64_add(1, (v))
+#endif
+
/* atomic64_inc_return_relaxed */
#ifndef atomic64_inc_return_relaxed
+
+#ifndef atomic64_inc_return
+#define atomic64_inc_return(v) atomic64_add_return(1, (v))
+#define atomic64_inc_return_relaxed(v) atomic64_add_return_relaxed(1, (v))
+#define atomic64_inc_return_acquire(v) atomic64_add_return_acquire(1, (v))
+#define atomic64_inc_return_release(v) atomic64_add_return_release(1, (v))
+#else /* atomic64_inc_return */
#define atomic64_inc_return_relaxed atomic64_inc_return
#define atomic64_inc_return_acquire atomic64_inc_return
#define atomic64_inc_return_release atomic64_inc_return
+#endif /* atomic64_inc_return */
#else /* atomic64_inc_return_relaxed */
@@ -742,11 +820,23 @@
#endif
#endif /* atomic64_sub_return_relaxed */
+#ifndef atomic64_dec
+#define atomic64_dec(v) atomic64_sub(1, (v))
+#endif
+
/* atomic64_dec_return_relaxed */
#ifndef atomic64_dec_return_relaxed
+
+#ifndef atomic64_dec_return
+#define atomic64_dec_return(v) atomic64_sub_return(1, (v))
+#define atomic64_dec_return_relaxed(v) atomic64_sub_return_relaxed(1, (v))
+#define atomic64_dec_return_acquire(v) atomic64_sub_return_acquire(1, (v))
+#define atomic64_dec_return_release(v) atomic64_sub_return_release(1, (v))
+#else /* atomic64_dec_return */
#define atomic64_dec_return_relaxed atomic64_dec_return
#define atomic64_dec_return_acquire atomic64_dec_return
#define atomic64_dec_return_release atomic64_dec_return
+#endif /* atomic64_dec_return */
#else /* atomic64_dec_return_relaxed */
@@ -927,12 +1017,22 @@
#endif
#endif /* atomic64_fetch_and_relaxed */
-#ifdef atomic64_andnot
-/* atomic64_fetch_andnot_relaxed */
+#ifndef atomic64_andnot
+#define atomic64_andnot(i, v) atomic64_and(~(long long)(i), (v))
+#endif
+
#ifndef atomic64_fetch_andnot_relaxed
-#define atomic64_fetch_andnot_relaxed atomic64_fetch_andnot
-#define atomic64_fetch_andnot_acquire atomic64_fetch_andnot
-#define atomic64_fetch_andnot_release atomic64_fetch_andnot
+
+#ifndef atomic64_fetch_andnot
+#define atomic64_fetch_andnot(i, v) atomic64_fetch_and(~(long long)(i), (v))
+#define atomic64_fetch_andnot_relaxed(i, v) atomic64_fetch_and_relaxed(~(long long)(i), (v))
+#define atomic64_fetch_andnot_acquire(i, v) atomic64_fetch_and_acquire(~(long long)(i), (v))
+#define atomic64_fetch_andnot_release(i, v) atomic64_fetch_and_release(~(long long)(i), (v))
+#else /* atomic64_fetch_andnot */
+#define atomic64_fetch_andnot_relaxed atomic64_fetch_andnot
+#define atomic64_fetch_andnot_acquire atomic64_fetch_andnot
+#define atomic64_fetch_andnot_release atomic64_fetch_andnot
+#endif /* atomic64_fetch_andnot */
#else /* atomic64_fetch_andnot_relaxed */
@@ -951,7 +1051,6 @@
__atomic_op_fence(atomic64_fetch_andnot, __VA_ARGS__)
#endif
#endif /* atomic64_fetch_andnot_relaxed */
-#endif /* atomic64_andnot */
/* atomic64_fetch_xor_relaxed */
#ifndef atomic64_fetch_xor_relaxed
@@ -1049,30 +1148,164 @@
#define atomic64_try_cmpxchg_release atomic64_try_cmpxchg
#endif /* atomic64_try_cmpxchg */
-#ifndef atomic64_andnot
-static inline void atomic64_andnot(long long i, atomic64_t *v)
+/**
+ * atomic64_fetch_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic64_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if @v was not already @u.
+ * Returns the original value of @v.
+ */
+#ifndef atomic64_fetch_add_unless
+static inline long long atomic64_fetch_add_unless(atomic64_t *v, long long a,
+ long long u)
{
- atomic64_and(~i, v);
+ long long c = atomic64_read(v);
+
+ do {
+ if (unlikely(c == u))
+ break;
+ } while (!atomic64_try_cmpxchg(v, &c, c + a));
+
+ return c;
+}
+#endif
+
+/**
+ * atomic64_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if @v was not already @u.
+ * Returns true if the addition was done.
+ */
+static inline bool atomic64_add_unless(atomic64_t *v, long long a, long long u)
+{
+ return atomic64_fetch_add_unless(v, a, u) != u;
}
-static inline long long atomic64_fetch_andnot(long long i, atomic64_t *v)
-{
- return atomic64_fetch_and(~i, v);
-}
+/**
+ * atomic64_inc_not_zero - increment unless the number is zero
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically increments @v by 1, if @v is non-zero.
+ * Returns true if the increment was done.
+ */
+#ifndef atomic64_inc_not_zero
+#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
+#endif
-static inline long long atomic64_fetch_andnot_relaxed(long long i, atomic64_t *v)
+/**
+ * atomic64_inc_and_test - increment and test
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically increments @v by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+#ifndef atomic64_inc_and_test
+static inline bool atomic64_inc_and_test(atomic64_t *v)
{
- return atomic64_fetch_and_relaxed(~i, v);
+ return atomic64_inc_return(v) == 0;
}
+#endif
-static inline long long atomic64_fetch_andnot_acquire(long long i, atomic64_t *v)
+/**
+ * atomic64_dec_and_test - decrement and test
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically decrements @v by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+#ifndef atomic64_dec_and_test
+static inline bool atomic64_dec_and_test(atomic64_t *v)
{
- return atomic64_fetch_and_acquire(~i, v);
+ return atomic64_dec_return(v) == 0;
}
+#endif
-static inline long long atomic64_fetch_andnot_release(long long i, atomic64_t *v)
+/**
+ * atomic64_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically subtracts @i from @v and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+#ifndef atomic64_sub_and_test
+static inline bool atomic64_sub_and_test(long long i, atomic64_t *v)
{
- return atomic64_fetch_and_release(~i, v);
+ return atomic64_sub_return(i, v) == 0;
+}
+#endif
+
+/**
+ * atomic64_add_negative - add and test if negative
+ * @i: integer value to add
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically adds @i to @v and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+#ifndef atomic64_add_negative
+static inline bool atomic64_add_negative(long long i, atomic64_t *v)
+{
+ return atomic64_add_return(i, v) < 0;
+}
+#endif
+
+#ifndef atomic64_inc_unless_negative
+static inline bool atomic64_inc_unless_negative(atomic64_t *v)
+{
+ long long c = atomic64_read(v);
+
+ do {
+ if (unlikely(c < 0))
+ return false;
+ } while (!atomic64_try_cmpxchg(v, &c, c + 1));
+
+ return true;
+}
+#endif
+
+#ifndef atomic64_dec_unless_positive
+static inline bool atomic64_dec_unless_positive(atomic64_t *v)
+{
+ long long c = atomic64_read(v);
+
+ do {
+ if (unlikely(c > 0))
+ return false;
+ } while (!atomic64_try_cmpxchg(v, &c, c - 1));
+
+ return true;
+}
+#endif
+
+/*
+ * atomic64_dec_if_positive - decrement by 1 if old value positive
+ * @v: pointer of type atomic64_t
+ *
+ * The function returns the old value of *v minus 1, even if
+ * the atomic64 variable, v, was not decremented.
+ */
+#ifndef atomic64_dec_if_positive
+static inline long long atomic64_dec_if_positive(atomic64_t *v)
+{
+ long long dec, c = atomic64_read(v);
+
+ do {
+ dec = c - 1;
+ if (unlikely(dec < 0))
+ break;
+ } while (!atomic64_try_cmpxchg(v, &c, dec));
+
+ return dec;
}
#endif
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 4cac4e1..af41901 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -2,29 +2,9 @@
#ifndef _LINUX_BITOPS_H
#define _LINUX_BITOPS_H
#include <asm/types.h>
+#include <linux/bits.h>
-#ifdef __KERNEL__
-#define BIT(nr) (1UL << (nr))
-#define BIT_ULL(nr) (1ULL << (nr))
-#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
-#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
-#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
-#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
-#define BITS_PER_BYTE 8
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
-#endif
-
-/*
- * Create a contiguous bitmask starting at bit position @l and ending at
- * position @h. For example
- * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- */
-#define GENMASK(h, l) \
- (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
-
-#define GENMASK_ULL(h, l) \
- (((~0ULL) - (1ULL << (l)) + 1) & \
- (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
extern unsigned int __sw_hweight8(unsigned int w);
extern unsigned int __sw_hweight16(unsigned int w);
diff --git a/include/linux/bits.h b/include/linux/bits.h
new file mode 100644
index 0000000..2b7b532
--- /dev/null
+++ b/include/linux/bits.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_BITS_H
+#define __LINUX_BITS_H
+#include <asm/bitsperlong.h>
+
+#define BIT(nr) (1UL << (nr))
+#define BIT_ULL(nr) (1ULL << (nr))
+#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
+#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
+#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
+#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
+#define BITS_PER_BYTE 8
+
+/*
+ * Create a contiguous bitmask starting at bit position @l and ending at
+ * position @h. For example
+ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
+ */
+#define GENMASK(h, l) \
+ (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define GENMASK_ULL(h, l) \
+ (((~0ULL) - (1ULL << (l)) + 1) & \
+ (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
+
+#endif /* __LINUX_BITS_H */
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 7dff196..3089189 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -194,6 +194,9 @@
extern void clocksource_resume(void);
extern struct clocksource * __init clocksource_default_clock(void);
extern void clocksource_mark_unstable(struct clocksource *cs);
+extern void
+clocksource_start_suspend_timing(struct clocksource *cs, u64 start_cycles);
+extern u64 clocksource_stop_suspend_timing(struct clocksource *cs, u64 now);
extern u64
clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cycles);
diff --git a/include/linux/compat.h b/include/linux/compat.h
index c68acc4..df45ee8 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -115,11 +115,6 @@
struct compat_sel_arg_struct;
struct rusage;
-struct compat_itimerspec {
- struct compat_timespec it_interval;
- struct compat_timespec it_value;
-};
-
struct compat_utimbuf {
compat_time_t actime;
compat_time_t modtime;
@@ -300,10 +295,6 @@
extern int compat_put_timespec(const struct timespec *, void __user *);
extern int compat_get_timeval(struct timeval *, const void __user *);
extern int compat_put_timeval(const struct timeval *, void __user *);
-extern int get_compat_itimerspec64(struct itimerspec64 *its,
- const struct compat_itimerspec __user *uits);
-extern int put_compat_itimerspec64(const struct itimerspec64 *its,
- struct compat_itimerspec __user *uits);
struct compat_iovec {
compat_uptr_t iov_base;
diff --git a/include/linux/compat_time.h b/include/linux/compat_time.h
index 31f2774..e70bfd1 100644
--- a/include/linux/compat_time.h
+++ b/include/linux/compat_time.h
@@ -17,7 +17,16 @@
s32 tv_usec;
};
+struct compat_itimerspec {
+ struct compat_timespec it_interval;
+ struct compat_timespec it_value;
+};
+
extern int compat_get_timespec64(struct timespec64 *, const void __user *);
extern int compat_put_timespec64(const struct timespec64 *, void __user *);
+extern int get_compat_itimerspec64(struct itimerspec64 *its,
+ const struct compat_itimerspec __user *uits);
+extern int put_compat_itimerspec64(const struct itimerspec64 *its,
+ struct compat_itimerspec __user *uits);
#endif /* _LINUX_COMPAT_TIME_H */
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 8796ba3..4cf06a6 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -164,6 +164,7 @@
CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
+ CPUHP_AP_WATCHDOG_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
CPUHP_AP_ONLINE_DYN,
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 56add82..401e4b2 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -894,6 +894,16 @@
void *flush;
} efi_file_handle_t;
+typedef struct {
+ u64 revision;
+ u32 open_volume;
+} efi_file_io_interface_32_t;
+
+typedef struct {
+ u64 revision;
+ u64 open_volume;
+} efi_file_io_interface_64_t;
+
typedef struct _efi_file_io_interface {
u64 revision;
int (*open_volume)(struct _efi_file_io_interface *,
@@ -988,14 +998,12 @@
extern void efi_gettimeofday (struct timespec64 *ts);
extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
#ifdef CONFIG_X86
-extern void efi_late_init(void);
extern void efi_free_boot_services(void);
extern efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
bool nonblocking);
extern void efi_find_mirror(void);
#else
-static inline void efi_late_init(void) {}
static inline void efi_free_boot_services(void) {}
static inline efi_status_t efi_query_variable_store(u32 attributes,
@@ -1651,4 +1659,7 @@
extern int efi_tpm_eventlog_init(void);
+/* Workqueue to queue EFI Runtime Services */
+extern struct workqueue_struct *efi_rts_wq;
+
#endif /* _LINUX_EFI_H */
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index cbb872c..9d2ea3e 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -73,6 +73,7 @@
#define GICD_TYPER_MBIS (1U << 16)
#define GICD_TYPER_ID_BITS(typer) ((((typer) >> 19) & 0x1f) + 1)
+#define GICD_TYPER_NUM_LPIS(typer) ((((typer) >> 11) & 0x1f) + 1)
#define GICD_TYPER_IRQS(typer) ((((typer) & 0x1f) + 1) * 32)
#define GICD_IROUTER_SPI_MODE_ONE (0U << 31)
@@ -576,8 +577,8 @@
phys_addr_t phys_base;
} __percpu *rdist;
struct page *prop_page;
- int id_bits;
u64 flags;
+ u32 gicd_typer;
bool has_vlpis;
bool has_direct_lpi;
};
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9440a2f..e909413 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -63,7 +63,6 @@
struct kretprobe;
struct kretprobe_instance;
typedef int (*kprobe_pre_handler_t) (struct kprobe *, struct pt_regs *);
-typedef int (*kprobe_break_handler_t) (struct kprobe *, struct pt_regs *);
typedef void (*kprobe_post_handler_t) (struct kprobe *, struct pt_regs *,
unsigned long flags);
typedef int (*kprobe_fault_handler_t) (struct kprobe *, struct pt_regs *,
@@ -101,12 +100,6 @@
*/
kprobe_fault_handler_t fault_handler;
- /*
- * ... called if breakpoint trap occurs in probe handler.
- * Return 1 if it handled break, otherwise kernel will see it.
- */
- kprobe_break_handler_t break_handler;
-
/* Saved opcode (which has been replaced with breakpoint) */
kprobe_opcode_t opcode;
@@ -155,24 +148,6 @@
}
/*
- * Special probe type that uses setjmp-longjmp type tricks to resume
- * execution at a specified entry with a matching prototype corresponding
- * to the probed function - a trick to enable arguments to become
- * accessible seamlessly by probe handling logic.
- * Note:
- * Because of the way compilers allocate stack space for local variables
- * etc upfront, regardless of sub-scopes within a function, this mirroring
- * principle currently works only for probes placed on function entry points.
- */
-struct jprobe {
- struct kprobe kp;
- void *entry; /* probe handling code to jump to */
-};
-
-/* For backward compatibility with old code using JPROBE_ENTRY() */
-#define JPROBE_ENTRY(handler) (handler)
-
-/*
* Function-return probe -
* Note:
* User needs to provide a handler function, and initialize maxactive.
@@ -389,9 +364,6 @@
void unregister_kprobe(struct kprobe *p);
int register_kprobes(struct kprobe **kps, int num);
void unregister_kprobes(struct kprobe **kps, int num);
-int setjmp_pre_handler(struct kprobe *, struct pt_regs *);
-int longjmp_break_handler(struct kprobe *, struct pt_regs *);
-void jprobe_return(void);
unsigned long arch_deref_entry_point(void *);
int register_kretprobe(struct kretprobe *rp);
@@ -439,9 +411,6 @@
static inline void unregister_kprobes(struct kprobe **kps, int num)
{
}
-static inline void jprobe_return(void)
-{
-}
static inline int register_kretprobe(struct kretprobe *rp)
{
return -ENOSYS;
@@ -468,20 +437,6 @@
return -ENOSYS;
}
#endif /* CONFIG_KPROBES */
-static inline int register_jprobe(struct jprobe *p)
-{
- return -ENOSYS;
-}
-static inline int register_jprobes(struct jprobe **jps, int num)
-{
- return -ENOSYS;
-}
-static inline void unregister_jprobe(struct jprobe *p)
-{
-}
-static inline void unregister_jprobes(struct jprobe **jps, int num)
-{
-}
static inline int disable_kretprobe(struct kretprobe *rp)
{
return disable_kprobe(&rp->kp);
@@ -490,14 +445,6 @@
{
return enable_kprobe(&rp->kp);
}
-static inline int disable_jprobe(struct jprobe *jp)
-{
- return -ENOSYS;
-}
-static inline int enable_jprobe(struct jprobe *jp)
-{
- return -ENOSYS;
-}
#ifndef CONFIG_KPROBES
static inline bool is_kprobe_insn_slot(unsigned long addr)
diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 5b9fddb..b2bb44f 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -93,8 +93,11 @@
/* Map the ktime_t to timeval conversion to ns_to_timeval function */
#define ktime_to_timeval(kt) ns_to_timeval((kt))
-/* Convert ktime_t to nanoseconds - NOP in the scalar storage format: */
-#define ktime_to_ns(kt) (kt)
+/* Convert ktime_t to nanoseconds */
+static inline s64 ktime_to_ns(const ktime_t kt)
+{
+ return kt;
+}
/**
* ktime_compare - Compares two ktime_t variables for less, greater or equal
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070..efdc24d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -335,176 +335,183 @@
struct kioctx_table;
struct mm_struct {
- struct vm_area_struct *mmap; /* list of VMAs */
- struct rb_root mm_rb;
- u32 vmacache_seqnum; /* per-thread vmacache */
+ struct {
+ struct vm_area_struct *mmap; /* list of VMAs */
+ struct rb_root mm_rb;
+ u32 vmacache_seqnum; /* per-thread vmacache */
#ifdef CONFIG_MMU
- unsigned long (*get_unmapped_area) (struct file *filp,
+ unsigned long (*get_unmapped_area) (struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
#endif
- unsigned long mmap_base; /* base of mmap area */
- unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
+ unsigned long mmap_base; /* base of mmap area */
+ unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
- /* Base adresses for compatible mmap() */
- unsigned long mmap_compat_base;
- unsigned long mmap_compat_legacy_base;
+ /* Base adresses for compatible mmap() */
+ unsigned long mmap_compat_base;
+ unsigned long mmap_compat_legacy_base;
#endif
- unsigned long task_size; /* size of task vm space */
- unsigned long highest_vm_end; /* highest vma end address */
- pgd_t * pgd;
+ unsigned long task_size; /* size of task vm space */
+ unsigned long highest_vm_end; /* highest vma end address */
+ pgd_t * pgd;
- /**
- * @mm_users: The number of users including userspace.
- *
- * Use mmget()/mmget_not_zero()/mmput() to modify. When this drops
- * to 0 (i.e. when the task exits and there are no other temporary
- * reference holders), we also release a reference on @mm_count
- * (which may then free the &struct mm_struct if @mm_count also
- * drops to 0).
- */
- atomic_t mm_users;
+ /**
+ * @mm_users: The number of users including userspace.
+ *
+ * Use mmget()/mmget_not_zero()/mmput() to modify. When this
+ * drops to 0 (i.e. when the task exits and there are no other
+ * temporary reference holders), we also release a reference on
+ * @mm_count (which may then free the &struct mm_struct if
+ * @mm_count also drops to 0).
+ */
+ atomic_t mm_users;
- /**
- * @mm_count: The number of references to &struct mm_struct
- * (@mm_users count as 1).
- *
- * Use mmgrab()/mmdrop() to modify. When this drops to 0, the
- * &struct mm_struct is freed.
- */
- atomic_t mm_count;
+ /**
+ * @mm_count: The number of references to &struct mm_struct
+ * (@mm_users count as 1).
+ *
+ * Use mmgrab()/mmdrop() to modify. When this drops to 0, the
+ * &struct mm_struct is freed.
+ */
+ atomic_t mm_count;
#ifdef CONFIG_MMU
- atomic_long_t pgtables_bytes; /* PTE page table pages */
+ atomic_long_t pgtables_bytes; /* PTE page table pages */
#endif
- int map_count; /* number of VMAs */
+ int map_count; /* number of VMAs */
- spinlock_t page_table_lock; /* Protects page tables and some counters */
- struct rw_semaphore mmap_sem;
+ spinlock_t page_table_lock; /* Protects page tables and some
+ * counters
+ */
+ struct rw_semaphore mmap_sem;
- struct list_head mmlist; /* List of maybe swapped mm's. These are globally strung
- * together off init_mm.mmlist, and are protected
- * by mmlist_lock
- */
+ struct list_head mmlist; /* List of maybe swapped mm's. These
+ * are globally strung together off
+ * init_mm.mmlist, and are protected
+ * by mmlist_lock
+ */
- unsigned long hiwater_rss; /* High-watermark of RSS usage */
- unsigned long hiwater_vm; /* High-water virtual memory usage */
+ unsigned long hiwater_rss; /* High-watermark of RSS usage */
+ unsigned long hiwater_vm; /* High-water virtual memory usage */
- unsigned long total_vm; /* Total pages mapped */
- unsigned long locked_vm; /* Pages that have PG_mlocked set */
- unsigned long pinned_vm; /* Refcount permanently increased */
- unsigned long data_vm; /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
- unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
- unsigned long stack_vm; /* VM_STACK */
- unsigned long def_flags;
+ unsigned long total_vm; /* Total pages mapped */
+ unsigned long locked_vm; /* Pages that have PG_mlocked set */
+ unsigned long pinned_vm; /* Refcount permanently increased */
+ unsigned long data_vm; /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
+ unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
+ unsigned long stack_vm; /* VM_STACK */
+ unsigned long def_flags;
- spinlock_t arg_lock; /* protect the below fields */
- unsigned long start_code, end_code, start_data, end_data;
- unsigned long start_brk, brk, start_stack;
- unsigned long arg_start, arg_end, env_start, env_end;
+ spinlock_t arg_lock; /* protect the below fields */
+ unsigned long start_code, end_code, start_data, end_data;
+ unsigned long start_brk, brk, start_stack;
+ unsigned long arg_start, arg_end, env_start, env_end;
- unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
+ unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
- /*
- * Special counters, in some configurations protected by the
- * page_table_lock, in other configurations by being atomic.
- */
- struct mm_rss_stat rss_stat;
+ /*
+ * Special counters, in some configurations protected by the
+ * page_table_lock, in other configurations by being atomic.
+ */
+ struct mm_rss_stat rss_stat;
- struct linux_binfmt *binfmt;
+ struct linux_binfmt *binfmt;
- cpumask_var_t cpu_vm_mask_var;
+ /* Architecture-specific MM context */
+ mm_context_t context;
- /* Architecture-specific MM context */
- mm_context_t context;
+ unsigned long flags; /* Must use atomic bitops to access */
- unsigned long flags; /* Must use atomic bitops to access the bits */
-
- struct core_state *core_state; /* coredumping support */
+ struct core_state *core_state; /* coredumping support */
#ifdef CONFIG_MEMBARRIER
- atomic_t membarrier_state;
+ atomic_t membarrier_state;
#endif
#ifdef CONFIG_AIO
- spinlock_t ioctx_lock;
- struct kioctx_table __rcu *ioctx_table;
+ spinlock_t ioctx_lock;
+ struct kioctx_table __rcu *ioctx_table;
#endif
#ifdef CONFIG_MEMCG
- /*
- * "owner" points to a task that is regarded as the canonical
- * user/owner of this mm. All of the following must be true in
- * order for it to be changed:
- *
- * current == mm->owner
- * current->mm != mm
- * new_owner->mm == mm
- * new_owner->alloc_lock is held
- */
- struct task_struct __rcu *owner;
+ /*
+ * "owner" points to a task that is regarded as the canonical
+ * user/owner of this mm. All of the following must be true in
+ * order for it to be changed:
+ *
+ * current == mm->owner
+ * current->mm != mm
+ * new_owner->mm == mm
+ * new_owner->alloc_lock is held
+ */
+ struct task_struct __rcu *owner;
#endif
- struct user_namespace *user_ns;
+ struct user_namespace *user_ns;
- /* store ref to file /proc/<pid>/exe symlink points to */
- struct file __rcu *exe_file;
+ /* store ref to file /proc/<pid>/exe symlink points to */
+ struct file __rcu *exe_file;
#ifdef CONFIG_MMU_NOTIFIER
- struct mmu_notifier_mm *mmu_notifier_mm;
+ struct mmu_notifier_mm *mmu_notifier_mm;
#endif
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
- pgtable_t pmd_huge_pte; /* protected by page_table_lock */
-#endif
-#ifdef CONFIG_CPUMASK_OFFSTACK
- struct cpumask cpumask_allocation;
+ pgtable_t pmd_huge_pte; /* protected by page_table_lock */
#endif
#ifdef CONFIG_NUMA_BALANCING
- /*
- * numa_next_scan is the next time that the PTEs will be marked
- * pte_numa. NUMA hinting faults will gather statistics and migrate
- * pages to new nodes if necessary.
- */
- unsigned long numa_next_scan;
+ /*
+ * numa_next_scan is the next time that the PTEs will be marked
+ * pte_numa. NUMA hinting faults will gather statistics and
+ * migrate pages to new nodes if necessary.
+ */
+ unsigned long numa_next_scan;
- /* Restart point for scanning and setting pte_numa */
- unsigned long numa_scan_offset;
+ /* Restart point for scanning and setting pte_numa */
+ unsigned long numa_scan_offset;
- /* numa_scan_seq prevents two threads setting pte_numa */
- int numa_scan_seq;
+ /* numa_scan_seq prevents two threads setting pte_numa */
+ int numa_scan_seq;
#endif
- /*
- * An operation with batched TLB flushing is going on. Anything that
- * can move process memory needs to flush the TLB when moving a
- * PROT_NONE or PROT_NUMA mapped page.
- */
- atomic_t tlb_flush_pending;
+ /*
+ * An operation with batched TLB flushing is going on. Anything
+ * that can move process memory needs to flush the TLB when
+ * moving a PROT_NONE or PROT_NUMA mapped page.
+ */
+ atomic_t tlb_flush_pending;
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
- /* See flush_tlb_batched_pending() */
- bool tlb_flush_batched;
+ /* See flush_tlb_batched_pending() */
+ bool tlb_flush_batched;
#endif
- struct uprobes_state uprobes_state;
+ struct uprobes_state uprobes_state;
#ifdef CONFIG_HUGETLB_PAGE
- atomic_long_t hugetlb_usage;
+ atomic_long_t hugetlb_usage;
#endif
- struct work_struct async_put_work;
+ struct work_struct async_put_work;
#if IS_ENABLED(CONFIG_HMM)
- /* HMM needs to track a few things per mm */
- struct hmm *hmm;
+ /* HMM needs to track a few things per mm */
+ struct hmm *hmm;
#endif
-} __randomize_layout;
+ } __randomize_layout;
+
+ /*
+ * The mm_cpumask needs to be at the end of mm_struct, because it
+ * is dynamically sized based on nr_cpu_ids.
+ */
+ unsigned long cpu_bitmap[];
+};
extern struct mm_struct init_mm;
+/* Pointer magic because the dynamic array size confuses some compilers. */
static inline void mm_init_cpumask(struct mm_struct *mm)
{
-#ifdef CONFIG_CPUMASK_OFFSTACK
- mm->cpu_vm_mask_var = &mm->cpumask_allocation;
-#endif
- cpumask_clear(mm->cpu_vm_mask_var);
+ unsigned long cpu_bitmap = (unsigned long)mm;
+
+ cpu_bitmap += offsetof(struct mm_struct, cpu_bitmap);
+ cpumask_clear((struct cpumask *)cpu_bitmap);
}
/* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
{
- return mm->cpu_vm_mask_var;
+ return (struct cpumask *)&mm->cpu_bitmap;
}
struct mmu_gather;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b8d868d..08f9247 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -45,12 +45,18 @@
extern void touch_softlockup_watchdog_sync(void);
extern void touch_all_softlockup_watchdogs(void);
extern unsigned int softlockup_panic;
-#else
+
+extern int lockup_detector_online_cpu(unsigned int cpu);
+extern int lockup_detector_offline_cpu(unsigned int cpu);
+#else /* CONFIG_SOFTLOCKUP_DETECTOR */
static inline void touch_softlockup_watchdog_sched(void) { }
static inline void touch_softlockup_watchdog(void) { }
static inline void touch_softlockup_watchdog_sync(void) { }
static inline void touch_all_softlockup_watchdogs(void) { }
-#endif
+
+#define lockup_detector_online_cpu NULL
+#define lockup_detector_offline_cpu NULL
+#endif /* CONFIG_SOFTLOCKUP_DETECTOR */
#ifdef CONFIG_DETECT_HUNG_TASK
void reset_hung_task_detector(void);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 87f6db4..53c500f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -490,7 +490,7 @@
};
/**
- * enum perf_event_state - the states of a event
+ * enum perf_event_state - the states of an event:
*/
enum perf_event_state {
PERF_EVENT_STATE_DEAD = -4,
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index c85704f..ee7e987e 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -95,8 +95,8 @@
clockid_t it_clock;
timer_t it_id;
int it_active;
- int it_overrun;
- int it_overrun_last;
+ s64 it_overrun;
+ s64 it_overrun_last;
int it_requeue_pending;
int it_sigev_notify;
ktime_t it_interval;
diff --git a/include/linux/pti.h b/include/linux/pti.h
index 0174883..1a941ef 100644
--- a/include/linux/pti.h
+++ b/include/linux/pti.h
@@ -6,6 +6,7 @@
#include <asm/pti.h>
#else
static inline void pti_init(void) { }
+static inline void pti_finalize(void) { }
#endif
#endif
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 36df6cc..4786c22 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -396,7 +396,16 @@
* @member: the name of the list_head within the struct.
*
* Continue to iterate over list of given type, continuing after
- * the current position.
+ * the current position which must have been in the list when the RCU read
+ * lock was taken.
+ * This would typically require either that you obtained the node from a
+ * previous walk of the list in the same RCU read-side critical section, or
+ * that you held some sort of non-RCU reference (such as a reference count)
+ * to keep the node alive *and* in the list.
+ *
+ * This iterator is similar to list_for_each_entry_from_rcu() except
+ * this starts after the given position and that one starts at the given
+ * position.
*/
#define list_for_each_entry_continue_rcu(pos, head, member) \
for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
@@ -411,6 +420,14 @@
*
* Iterate over the tail of a list starting from a given position,
* which must have been in the list when the RCU read lock was taken.
+ * This would typically require either that you obtained the node from a
+ * previous walk of the list in the same RCU read-side critical section, or
+ * that you held some sort of non-RCU reference (such as a reference count)
+ * to keep the node alive *and* in the list.
+ *
+ * This iterator is similar to list_for_each_entry_continue_rcu() except
+ * this starts from the given position and that one starts from the position
+ * after the given position.
*/
#define list_for_each_entry_from_rcu(pos, head, member) \
for (; &(pos)->member != (head); \
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 65163aa..75e5b39 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -64,7 +64,6 @@
void __rcu_read_lock(void);
void __rcu_read_unlock(void);
-void rcu_read_unlock_special(struct task_struct *t);
void synchronize_rcu(void);
/*
@@ -159,11 +158,11 @@
} while (0)
/*
- * Note a voluntary context switch for RCU-tasks benefit. This is a
- * macro rather than an inline function to avoid #include hell.
+ * Note a quasi-voluntary context switch for RCU-tasks's benefit.
+ * This is a macro rather than an inline function to avoid #include hell.
*/
#ifdef CONFIG_TASKS_RCU
-#define rcu_note_voluntary_context_switch_lite(t) \
+#define rcu_tasks_qs(t) \
do { \
if (READ_ONCE((t)->rcu_tasks_holdout)) \
WRITE_ONCE((t)->rcu_tasks_holdout, false); \
@@ -171,14 +170,14 @@
#define rcu_note_voluntary_context_switch(t) \
do { \
rcu_all_qs(); \
- rcu_note_voluntary_context_switch_lite(t); \
+ rcu_tasks_qs(t); \
} while (0)
void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
void synchronize_rcu_tasks(void);
void exit_tasks_rcu_start(void);
void exit_tasks_rcu_finish(void);
#else /* #ifdef CONFIG_TASKS_RCU */
-#define rcu_note_voluntary_context_switch_lite(t) do { } while (0)
+#define rcu_tasks_qs(t) do { } while (0)
#define rcu_note_voluntary_context_switch(t) rcu_all_qs()
#define call_rcu_tasks call_rcu_sched
#define synchronize_rcu_tasks synchronize_sched
@@ -195,8 +194,8 @@
*/
#define cond_resched_tasks_rcu_qs() \
do { \
- if (!cond_resched()) \
- rcu_note_voluntary_context_switch_lite(current); \
+ rcu_tasks_qs(current); \
+ cond_resched(); \
} while (0)
/*
@@ -567,8 +566,8 @@
* This is simply an identity function, but it documents where a pointer
* is handed off from RCU to some other synchronization mechanism, for
* example, reference counting or locking. In C11, it would map to
- * kill_dependency(). It could be used as follows:
- * ``
+ * kill_dependency(). It could be used as follows::
+ *
* rcu_read_lock();
* p = rcu_dereference(gp);
* long_lived = is_long_lived(p);
@@ -579,7 +578,6 @@
* p = rcu_pointer_handoff(p);
* }
* rcu_read_unlock();
- *``
*/
#define rcu_pointer_handoff(p) (p)
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 7b3c82e..8d9a0ea 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -93,7 +93,7 @@
#define rcu_note_context_switch(preempt) \
do { \
rcu_sched_qs(); \
- rcu_note_voluntary_context_switch_lite(current); \
+ rcu_tasks_qs(current); \
} while (0)
static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index a685da2..e28cce2 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -3,9 +3,10 @@
#define _LINUX_REFCOUNT_H
#include <linux/atomic.h>
-#include <linux/mutex.h>
-#include <linux/spinlock.h>
-#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/spinlock_types.h>
+
+struct mutex;
/**
* struct refcount_t - variant of atomic_t specialized for reference counts
@@ -42,17 +43,30 @@
return atomic_read(&r->refs);
}
+extern __must_check bool refcount_add_not_zero_checked(unsigned int i, refcount_t *r);
+extern void refcount_add_checked(unsigned int i, refcount_t *r);
+
+extern __must_check bool refcount_inc_not_zero_checked(refcount_t *r);
+extern void refcount_inc_checked(refcount_t *r);
+
+extern __must_check bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r);
+
+extern __must_check bool refcount_dec_and_test_checked(refcount_t *r);
+extern void refcount_dec_checked(refcount_t *r);
+
#ifdef CONFIG_REFCOUNT_FULL
-extern __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r);
-extern void refcount_add(unsigned int i, refcount_t *r);
-extern __must_check bool refcount_inc_not_zero(refcount_t *r);
-extern void refcount_inc(refcount_t *r);
+#define refcount_add_not_zero refcount_add_not_zero_checked
+#define refcount_add refcount_add_checked
-extern __must_check bool refcount_sub_and_test(unsigned int i, refcount_t *r);
+#define refcount_inc_not_zero refcount_inc_not_zero_checked
+#define refcount_inc refcount_inc_checked
-extern __must_check bool refcount_dec_and_test(refcount_t *r);
-extern void refcount_dec(refcount_t *r);
+#define refcount_sub_and_test refcount_sub_and_test_checked
+
+#define refcount_dec_and_test refcount_dec_and_test_checked
+#define refcount_dec refcount_dec_checked
+
#else
# ifdef CONFIG_ARCH_HAS_REFCOUNT
# include <asm/refcount.h>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 43731fe..dac5086 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -167,8 +167,8 @@
* need_sleep = false;
* wake_up_state(p, TASK_UNINTERRUPTIBLE);
*
- * Where wake_up_state() (and all other wakeup primitives) imply enough
- * barriers to order the store of the variable against wakeup.
+ * where wake_up_state() executes a full memory barrier before accessing the
+ * task state.
*
* Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
* once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
@@ -1017,7 +1017,6 @@
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct list_head numa_entry;
struct numa_group *numa_group;
/*
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 1c1a151..913488d 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -40,7 +40,6 @@
#ifdef CONFIG_SCHED_DEBUG
extern __read_mostly unsigned int sysctl_sched_migration_cost;
extern __read_mostly unsigned int sysctl_sched_nr_migrate;
-extern __read_mostly unsigned int sysctl_sched_time_avg;
int sched_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length,
diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h
index 411b52e..abe28d5 100644
--- a/include/linux/sched_clock.h
+++ b/include/linux/sched_clock.h
@@ -9,17 +9,16 @@
#define LINUX_SCHED_CLOCK
#ifdef CONFIG_GENERIC_SCHED_CLOCK
-extern void sched_clock_postinit(void);
+extern void generic_sched_clock_init(void);
extern void sched_clock_register(u64 (*read)(void), int bits,
unsigned long rate);
#else
-static inline void sched_clock_postinit(void) { }
+static inline void generic_sched_clock_init(void) { }
static inline void sched_clock_register(u64 (*read)(void), int bits,
unsigned long rate)
{
- ;
}
#endif
diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h
index c174844..d0884b5 100644
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -25,8 +25,6 @@
* parked (cpu offline)
* @unpark: Optional unpark function, called when the thread is
* unparked (cpu online)
- * @cpumask: Internal state. To update which threads are unparked,
- * call smpboot_update_cpumask_percpu_thread().
* @selfparking: Thread is not parked by the park function.
* @thread_comm: The base name of the thread
*/
@@ -40,23 +38,12 @@
void (*cleanup)(unsigned int cpu, bool online);
void (*park)(unsigned int cpu);
void (*unpark)(unsigned int cpu);
- cpumask_var_t cpumask;
bool selfparking;
const char *thread_comm;
};
-int smpboot_register_percpu_thread_cpumask(struct smp_hotplug_thread *plug_thread,
- const struct cpumask *cpumask);
-
-static inline int
-smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
-{
- return smpboot_register_percpu_thread_cpumask(plug_thread,
- cpu_possible_mask);
-}
+int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread);
void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread);
-void smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
- const struct cpumask *);
#endif
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index fd57888..3190997 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -114,29 +114,48 @@
#endif /*arch_spin_is_contended*/
/*
- * This barrier must provide two things:
+ * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
+ * between program-order earlier lock acquisitions and program-order later
+ * memory accesses.
*
- * - it must guarantee a STORE before the spin_lock() is ordered against a
- * LOAD after it, see the comments at its two usage sites.
+ * This guarantees that the following two properties hold:
*
- * - it must ensure the critical section is RCsc.
+ * 1) Given the snippet:
*
- * The latter is important for cases where we observe values written by other
- * CPUs in spin-loops, without barriers, while being subject to scheduling.
+ * { X = 0; Y = 0; }
*
- * CPU0 CPU1 CPU2
+ * CPU0 CPU1
*
- * for (;;) {
- * if (READ_ONCE(X))
- * break;
- * }
- * X=1
- * <sched-out>
- * <sched-in>
- * r = X;
+ * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
+ * spin_lock(S); smp_mb();
+ * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
+ * r0 = READ_ONCE(Y);
+ * spin_unlock(S);
*
- * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
- * we get migrated and CPU2 sees X==0.
+ * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
+ * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
+ * preceding the call to smp_mb__after_spinlock() in __schedule() and in
+ * try_to_wake_up().
+ *
+ * 2) Given the snippet:
+ *
+ * { X = 0; Y = 0; }
+ *
+ * CPU0 CPU1 CPU2
+ *
+ * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y);
+ * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
+ * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X);
+ * WRITE_ONCE(Y, 1);
+ * spin_unlock(S);
+ *
+ * it is forbidden that CPU0's critical section executes before CPU1's
+ * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
+ * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
+ * preceding the calls to smp_rmb() in try_to_wake_up() for similar
+ * snippets but "projected" onto two CPUs.
+ *
+ * Property (2) upgrades the lock to an RCsc lock.
*
* Since most load-store architectures implement ACQUIRE with an smp_mb() after
* the LL/SC loop, they need no further barriers. Similarly all our TSO
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7..3e72a29 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@
return retval;
}
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+ int retval;
+
+ retval = __srcu_read_lock(sp);
+ return retval;
+}
+
/**
* srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
* @sp: srcu_struct in which to unregister the old reader.
@@ -209,6 +219,13 @@
__srcu_read_unlock(sp, idx);
}
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+ __srcu_read_unlock(sp, idx);
+}
+
/**
* smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
*
diff --git a/include/linux/swait.h b/include/linux/swait.h
index bf8cb0d..73e06e9 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -16,7 +16,7 @@
* wait-queues, but the semantics are actually completely different, and
* every single user we have ever had has been buggy (or pointless).
*
- * A "swake_up()" only wakes up _one_ waiter, which is not at all what
+ * A "swake_up_one()" only wakes up _one_ waiter, which is not at all what
* "wake_up()" does, and has led to problems. In other cases, it has
* been fine, because there's only ever one waiter (kvm), but in that
* case gthe whole "simple" wait-queue is just pointless to begin with,
@@ -38,8 +38,8 @@
* all wakeups are TASK_NORMAL in order to avoid O(n) lookups for the right
* sleeper state.
*
- * - the exclusive mode; because this requires preserving the list order
- * and this is hard.
+ * - the !exclusive mode; because that leads to O(n) wakeups, everything is
+ * exclusive.
*
* - custom wake callback functions; because you cannot give any guarantees
* about random code. This also allows swait to be used in RT, such that
@@ -115,7 +115,7 @@
* CPU0 - waker CPU1 - waiter
*
* for (;;) {
- * @cond = true; prepare_to_swait(&wq_head, &wait, state);
+ * @cond = true; prepare_to_swait_exclusive(&wq_head, &wait, state);
* smp_mb(); // smp_mb() from set_current_state()
* if (swait_active(wq_head)) if (@cond)
* wake_up(wq_head); break;
@@ -157,20 +157,20 @@
return swait_active(wq);
}
-extern void swake_up(struct swait_queue_head *q);
+extern void swake_up_one(struct swait_queue_head *q);
extern void swake_up_all(struct swait_queue_head *q);
extern void swake_up_locked(struct swait_queue_head *q);
-extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
-extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
+extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state);
extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
extern void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
-/* as per ___wait_event() but for swait, therefore "exclusive == 0" */
+/* as per ___wait_event() but for swait, therefore "exclusive == 1" */
#define ___swait_event(wq, condition, state, ret, cmd) \
({ \
+ __label__ __out; \
struct swait_queue __wait; \
long __ret = ret; \
\
@@ -183,20 +183,20 @@
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
- break; \
+ goto __out; \
} \
\
cmd; \
} \
finish_swait(&wq, &__wait); \
- __ret; \
+__out: __ret; \
})
#define __swait_event(wq, condition) \
(void)___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \
schedule())
-#define swait_event(wq, condition) \
+#define swait_event_exclusive(wq, condition) \
do { \
if (condition) \
break; \
@@ -208,7 +208,7 @@
TASK_UNINTERRUPTIBLE, timeout, \
__ret = schedule_timeout(__ret))
-#define swait_event_timeout(wq, condition, timeout) \
+#define swait_event_timeout_exclusive(wq, condition, timeout) \
({ \
long __ret = timeout; \
if (!___wait_cond_timeout(condition)) \
@@ -220,7 +220,7 @@
___swait_event(wq, condition, TASK_INTERRUPTIBLE, 0, \
schedule())
-#define swait_event_interruptible(wq, condition) \
+#define swait_event_interruptible_exclusive(wq, condition) \
({ \
int __ret = 0; \
if (!(condition)) \
@@ -233,7 +233,7 @@
TASK_INTERRUPTIBLE, timeout, \
__ret = schedule_timeout(__ret))
-#define swait_event_interruptible_timeout(wq, condition, timeout) \
+#define swait_event_interruptible_timeout_exclusive(wq, condition, timeout)\
({ \
long __ret = timeout; \
if (!___wait_cond_timeout(condition)) \
@@ -246,7 +246,7 @@
(void)___swait_event(wq, condition, TASK_IDLE, 0, schedule())
/**
- * swait_event_idle - wait without system load contribution
+ * swait_event_idle_exclusive - wait without system load contribution
* @wq: the waitqueue to wait on
* @condition: a C expression for the event to wait for
*
@@ -257,7 +257,7 @@
* condition and doesn't want to contribute to system load. Signals are
* ignored.
*/
-#define swait_event_idle(wq, condition) \
+#define swait_event_idle_exclusive(wq, condition) \
do { \
if (condition) \
break; \
@@ -270,7 +270,7 @@
__ret = schedule_timeout(__ret))
/**
- * swait_event_idle_timeout - wait up to timeout without load contribution
+ * swait_event_idle_timeout_exclusive - wait up to timeout without load contribution
* @wq: the waitqueue to wait on
* @condition: a C expression for the event to wait for
* @timeout: timeout at which we'll give up in jiffies
@@ -288,7 +288,7 @@
* or the remaining jiffies (at least 1) if the @condition evaluated
* to %true before the @timeout elapsed.
*/
-#define swait_event_idle_timeout(wq, condition, timeout) \
+#define swait_event_idle_timeout_exclusive(wq, condition, timeout) \
({ \
long __ret = timeout; \
if (!___wait_cond_timeout(condition)) \
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 5c1a093..ebb2f24 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -506,9 +506,9 @@
/* fs/timerfd.c */
asmlinkage long sys_timerfd_create(int clockid, int flags);
asmlinkage long sys_timerfd_settime(int ufd, int flags,
- const struct itimerspec __user *utmr,
- struct itimerspec __user *otmr);
-asmlinkage long sys_timerfd_gettime(int ufd, struct itimerspec __user *otmr);
+ const struct __kernel_itimerspec __user *utmr,
+ struct __kernel_itimerspec __user *otmr);
+asmlinkage long sys_timerfd_gettime(int ufd, struct __kernel_itimerspec __user *otmr);
/* fs/utimes.c */
asmlinkage long sys_utimensat(int dfd, const char __user *filename,
@@ -573,10 +573,10 @@
struct sigevent __user *timer_event_spec,
timer_t __user * created_timer_id);
asmlinkage long sys_timer_gettime(timer_t timer_id,
- struct itimerspec __user *setting);
+ struct __kernel_itimerspec __user *setting);
asmlinkage long sys_timer_getoverrun(timer_t timer_id);
asmlinkage long sys_timer_settime(timer_t timer_id, int flags,
- const struct itimerspec __user *new_setting,
+ const struct __kernel_itimerspec __user *new_setting,
struct itimerspec __user *old_setting);
asmlinkage long sys_timer_delete(timer_t timer_id);
asmlinkage long sys_clock_settime(clockid_t which_clock,
diff --git a/include/linux/time.h b/include/linux/time.h
index aed7446..27d83fd 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -14,9 +14,9 @@
int put_timespec64(const struct timespec64 *ts,
struct __kernel_timespec __user *uts);
int get_itimerspec64(struct itimerspec64 *it,
- const struct itimerspec __user *uit);
+ const struct __kernel_itimerspec __user *uit);
int put_itimerspec64(const struct itimerspec64 *it,
- struct itimerspec __user *uit);
+ struct __kernel_itimerspec __user *uit);
extern time64_t mktime64(const unsigned int year, const unsigned int mon,
const unsigned int day, const unsigned int hour,
diff --git a/include/linux/time64.h b/include/linux/time64.h
index 0a7b2f7..05634af 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -12,6 +12,7 @@
*/
#ifndef CONFIG_64BIT_TIME
#define __kernel_timespec timespec
+#define __kernel_itimerspec itimerspec
#endif
#include <uapi/linux/time.h>
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 86bc202..e798614 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -177,7 +177,7 @@
extern bool timekeeping_rtc_skipsuspend(void);
extern bool timekeeping_rtc_skipresume(void);
-extern void timekeeping_inject_sleeptime64(struct timespec64 *delta);
+extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
/*
* struct system_time_snapshot - simultaneous raw/real time capture with
@@ -243,7 +243,8 @@
extern int persistent_clock_is_local;
extern void read_persistent_clock64(struct timespec64 *ts);
-extern void read_boot_clock64(struct timespec64 *ts);
+void read_persistent_clock_and_boot_offset(struct timespec64 *wall_clock,
+ struct timespec64 *boot_offset);
extern int update_persistent_clock64(struct timespec64 now);
/*
diff --git a/include/linux/torture.h b/include/linux/torture.h
index 6627286..61dfd93 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -64,6 +64,8 @@
long trs_count;
};
#define DEFINE_TORTURE_RANDOM(name) struct torture_random_state name = { 0, 0 }
+#define DEFINE_TORTURE_RANDOM_PERCPU(name) \
+ DEFINE_PER_CPU(struct torture_random_state, name)
unsigned long torture_random(struct torture_random_state *trsp);
/* Task shuffler, which causes CPUs to occasionally go idle. */
@@ -79,7 +81,7 @@
int torture_stutter_init(int s);
/* Initialization and cleanup. */
-bool torture_init_begin(char *ttype, bool v);
+bool torture_init_begin(char *ttype, int v);
void torture_init_end(void);
bool torture_cleanup_begin(void);
void torture_cleanup_end(void);
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5936aac..a8d07fe 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -52,6 +52,7 @@
* "cpuqs": CPU passes through a quiescent state.
* "cpuonl": CPU comes online.
* "cpuofl": CPU goes offline.
+ * "cpuofl-bgp": CPU goes offline while blocking a grace period.
* "reqwait": GP kthread sleeps waiting for grace-period request.
* "reqwaitsig": GP kthread awakened by signal from reqwait state.
* "fqswait": GP kthread waiting until time to force quiescent states.
@@ -63,24 +64,24 @@
*/
TRACE_EVENT(rcu_grace_period,
- TP_PROTO(const char *rcuname, unsigned long gpnum, const char *gpevent),
+ TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
- TP_ARGS(rcuname, gpnum, gpevent),
+ TP_ARGS(rcuname, gp_seq, gpevent),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
+ __field(unsigned long, gp_seq)
__field(const char *, gpevent)
),
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
+ __entry->gp_seq = gp_seq;
__entry->gpevent = gpevent;
),
TP_printk("%s %lu %s",
- __entry->rcuname, __entry->gpnum, __entry->gpevent)
+ __entry->rcuname, __entry->gp_seq, __entry->gpevent)
);
/*
@@ -90,8 +91,8 @@
*
* "Startleaf": Request a grace period based on leaf-node data.
* "Prestarted": Someone beat us to the request
- * "Startedleaf": Leaf-node start proved sufficient.
- * "Startedleafroot": Leaf-node start proved sufficient after checking root.
+ * "Startedleaf": Leaf node marked for future GP.
+ * "Startedleafroot": All nodes from leaf to root marked for future GP.
* "Startedroot": Requested a nocb grace period based on root-node data.
* "NoGPkthread": The RCU grace-period kthread has not yet started.
* "StartWait": Start waiting for the requested grace period.
@@ -102,17 +103,16 @@
*/
TRACE_EVENT(rcu_future_grace_period,
- TP_PROTO(const char *rcuname, unsigned long gpnum, unsigned long completed,
- unsigned long c, u8 level, int grplo, int grphi,
+ TP_PROTO(const char *rcuname, unsigned long gp_seq,
+ unsigned long gp_seq_req, u8 level, int grplo, int grphi,
const char *gpevent),
- TP_ARGS(rcuname, gpnum, completed, c, level, grplo, grphi, gpevent),
+ TP_ARGS(rcuname, gp_seq, gp_seq_req, level, grplo, grphi, gpevent),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
- __field(unsigned long, completed)
- __field(unsigned long, c)
+ __field(unsigned long, gp_seq)
+ __field(unsigned long, gp_seq_req)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
@@ -121,19 +121,17 @@
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
- __entry->completed = completed;
- __entry->c = c;
+ __entry->gp_seq = gp_seq;
+ __entry->gp_seq_req = gp_seq_req;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
__entry->gpevent = gpevent;
),
- TP_printk("%s %lu %lu %lu %u %d %d %s",
- __entry->rcuname, __entry->gpnum, __entry->completed,
- __entry->c, __entry->level, __entry->grplo, __entry->grphi,
- __entry->gpevent)
+ TP_printk("%s %lu %lu %u %d %d %s",
+ __entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level,
+ __entry->grplo, __entry->grphi, __entry->gpevent)
);
/*
@@ -145,14 +143,14 @@
*/
TRACE_EVENT(rcu_grace_period_init,
- TP_PROTO(const char *rcuname, unsigned long gpnum, u8 level,
+ TP_PROTO(const char *rcuname, unsigned long gp_seq, u8 level,
int grplo, int grphi, unsigned long qsmask),
- TP_ARGS(rcuname, gpnum, level, grplo, grphi, qsmask),
+ TP_ARGS(rcuname, gp_seq, level, grplo, grphi, qsmask),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
+ __field(unsigned long, gp_seq)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
@@ -161,7 +159,7 @@
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
+ __entry->gp_seq = gp_seq;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
@@ -169,7 +167,7 @@
),
TP_printk("%s %lu %u %d %d %lx",
- __entry->rcuname, __entry->gpnum, __entry->level,
+ __entry->rcuname, __entry->gp_seq, __entry->level,
__entry->grplo, __entry->grphi, __entry->qsmask)
);
@@ -301,24 +299,24 @@
*/
TRACE_EVENT(rcu_preempt_task,
- TP_PROTO(const char *rcuname, int pid, unsigned long gpnum),
+ TP_PROTO(const char *rcuname, int pid, unsigned long gp_seq),
- TP_ARGS(rcuname, pid, gpnum),
+ TP_ARGS(rcuname, pid, gp_seq),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
+ __field(unsigned long, gp_seq)
__field(int, pid)
),
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
+ __entry->gp_seq = gp_seq;
__entry->pid = pid;
),
TP_printk("%s %lu %d",
- __entry->rcuname, __entry->gpnum, __entry->pid)
+ __entry->rcuname, __entry->gp_seq, __entry->pid)
);
/*
@@ -328,23 +326,23 @@
*/
TRACE_EVENT(rcu_unlock_preempted_task,
- TP_PROTO(const char *rcuname, unsigned long gpnum, int pid),
+ TP_PROTO(const char *rcuname, unsigned long gp_seq, int pid),
- TP_ARGS(rcuname, gpnum, pid),
+ TP_ARGS(rcuname, gp_seq, pid),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
+ __field(unsigned long, gp_seq)
__field(int, pid)
),
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
+ __entry->gp_seq = gp_seq;
__entry->pid = pid;
),
- TP_printk("%s %lu %d", __entry->rcuname, __entry->gpnum, __entry->pid)
+ TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
);
/*
@@ -357,15 +355,15 @@
*/
TRACE_EVENT(rcu_quiescent_state_report,
- TP_PROTO(const char *rcuname, unsigned long gpnum,
+ TP_PROTO(const char *rcuname, unsigned long gp_seq,
unsigned long mask, unsigned long qsmask,
u8 level, int grplo, int grphi, int gp_tasks),
- TP_ARGS(rcuname, gpnum, mask, qsmask, level, grplo, grphi, gp_tasks),
+ TP_ARGS(rcuname, gp_seq, mask, qsmask, level, grplo, grphi, gp_tasks),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
+ __field(unsigned long, gp_seq)
__field(unsigned long, mask)
__field(unsigned long, qsmask)
__field(u8, level)
@@ -376,7 +374,7 @@
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
+ __entry->gp_seq = gp_seq;
__entry->mask = mask;
__entry->qsmask = qsmask;
__entry->level = level;
@@ -386,41 +384,41 @@
),
TP_printk("%s %lu %lx>%lx %u %d %d %u",
- __entry->rcuname, __entry->gpnum,
+ __entry->rcuname, __entry->gp_seq,
__entry->mask, __entry->qsmask, __entry->level,
__entry->grplo, __entry->grphi, __entry->gp_tasks)
);
/*
* Tracepoint for quiescent states detected by force_quiescent_state().
- * These trace events include the type of RCU, the grace-period number that
- * was blocked by the CPU, the CPU itself, and the type of quiescent state,
- * which can be "dti" for dyntick-idle mode, "ofl" for CPU offline, "kick"
- * when kicking a CPU that has been in dyntick-idle mode for too long, or
- * "rqc" if the CPU got a quiescent state via its rcu_qs_ctr.
+ * These trace events include the type of RCU, the grace-period number
+ * that was blocked by the CPU, the CPU itself, and the type of quiescent
+ * state, which can be "dti" for dyntick-idle mode, "kick" when kicking
+ * a CPU that has been in dyntick-idle mode for too long, or "rqc" if the
+ * CPU got a quiescent state via its rcu_qs_ctr.
*/
TRACE_EVENT(rcu_fqs,
- TP_PROTO(const char *rcuname, unsigned long gpnum, int cpu, const char *qsevent),
+ TP_PROTO(const char *rcuname, unsigned long gp_seq, int cpu, const char *qsevent),
- TP_ARGS(rcuname, gpnum, cpu, qsevent),
+ TP_ARGS(rcuname, gp_seq, cpu, qsevent),
TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpnum)
+ __field(unsigned long, gp_seq)
__field(int, cpu)
__field(const char *, qsevent)
),
TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpnum = gpnum;
+ __entry->gp_seq = gp_seq;
__entry->cpu = cpu;
__entry->qsevent = qsevent;
),
TP_printk("%s %lu %d %s",
- __entry->rcuname, __entry->gpnum,
+ __entry->rcuname, __entry->gp_seq,
__entry->cpu, __entry->qsevent)
);
@@ -753,23 +751,23 @@
#else /* #ifdef CONFIG_RCU_TRACE */
-#define trace_rcu_grace_period(rcuname, gpnum, gpevent) do { } while (0)
-#define trace_rcu_future_grace_period(rcuname, gpnum, completed, c, \
+#define trace_rcu_grace_period(rcuname, gp_seq, gpevent) do { } while (0)
+#define trace_rcu_future_grace_period(rcuname, gp_seq, gp_seq_req, \
level, grplo, grphi, event) \
do { } while (0)
-#define trace_rcu_grace_period_init(rcuname, gpnum, level, grplo, grphi, \
+#define trace_rcu_grace_period_init(rcuname, gp_seq, level, grplo, grphi, \
qsmask) do { } while (0)
#define trace_rcu_exp_grace_period(rcuname, gqseq, gpevent) \
do { } while (0)
#define trace_rcu_exp_funnel_lock(rcuname, level, grplo, grphi, gpevent) \
do { } while (0)
#define trace_rcu_nocb_wake(rcuname, cpu, reason) do { } while (0)
-#define trace_rcu_preempt_task(rcuname, pid, gpnum) do { } while (0)
-#define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0)
-#define trace_rcu_quiescent_state_report(rcuname, gpnum, mask, qsmask, level, \
+#define trace_rcu_preempt_task(rcuname, pid, gp_seq) do { } while (0)
+#define trace_rcu_unlock_preempted_task(rcuname, gp_seq, pid) do { } while (0)
+#define trace_rcu_quiescent_state_report(rcuname, gp_seq, mask, qsmask, level, \
grplo, grphi, gp_tasks) do { } \
while (0)
-#define trace_rcu_fqs(rcuname, gpnum, cpu, qsevent) do { } while (0)
+#define trace_rcu_fqs(rcuname, gp_seq, cpu, qsevent) do { } while (0)
#define trace_rcu_dyntick(polarity, oldnesting, newnesting, dyntick) do { } while (0)
#define trace_rcu_callback(rcuname, rhp, qlen_lazy, qlen) do { } while (0)
#define trace_rcu_kfree_callback(rcuname, rhp, offset, qlen_lazy, qlen) \
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3..b955b98 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,6 +949,7 @@
#define KVM_CAP_GET_MSR_FEATURES 153
#define KVM_CAP_HYPERV_EVENTFD 154
#define KVM_CAP_HYPERV_TLBFLUSH 155
+#define KVM_CAP_S390_HPAGE_1M 156
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
index fcf9366..6b56a22 100644
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -49,6 +49,13 @@
};
#endif
+#ifndef __kernel_itimerspec
+struct __kernel_itimerspec {
+ struct __kernel_timespec it_interval; /* timer period */
+ struct __kernel_timespec it_value; /* timer expiration */
+};
+#endif
+
/*
* legacy timeval structure, only embedded in structures that
* traditionally used 'timeval' to pass time intervals (not absolute
diff --git a/init/Kconfig b/init/Kconfig
index 041f3a0..8b1ab81 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -125,10 +125,13 @@
config HAVE_KERNEL_LZ4
bool
+config HAVE_KERNEL_UNCOMPRESSED
+ bool
+
choice
prompt "Kernel compression mode"
default KERNEL_GZIP
- depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4
+ depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4 || HAVE_KERNEL_UNCOMPRESSED
help
The linux kernel is a kind of self-extracting executable.
Several compression algorithms are available, which differ
@@ -207,6 +210,16 @@
is about 8% bigger than LZO. But the decompression speed is
faster than LZO.
+config KERNEL_UNCOMPRESSED
+ bool "None"
+ depends on HAVE_KERNEL_UNCOMPRESSED
+ help
+ Produce uncompressed kernel image. This option is usually not what
+ you want. It is useful for debugging the kernel in slow simulation
+ environments, where decompressing and moving the kernel is awfully
+ slow. This option allows early boot code to skip the decompressor
+ and jump right at uncompressed kernel image.
+
endchoice
config DEFAULT_HOSTNAME
diff --git a/init/main.c b/init/main.c
index 5e13c54..38c68b5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -79,7 +79,7 @@
#include <linux/pti.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
-#include <linux/sched_clock.h>
+#include <linux/sched/clock.h>
#include <linux/sched/task.h>
#include <linux/sched/task_stack.h>
#include <linux/context_tracking.h>
@@ -642,7 +642,6 @@
softirq_init();
timekeeping_init();
time_init();
- sched_clock_postinit();
printk_safe_init();
perf_event_init();
profile_init();
@@ -697,6 +696,7 @@
acpi_early_init();
if (late_time_init)
late_time_init();
+ sched_clock_init();
calibrate_delay();
pid_idr_init();
anon_vma_init();
@@ -1065,6 +1065,13 @@
jump_label_invalidate_initmem();
free_initmem();
mark_readonly();
+
+ /*
+ * Kernel mappings are now finalized - update the userspace page-table
+ * to finalize PTI.
+ */
+ pti_finalize();
+
system_state = SYSTEM_RUNNING;
numa_default_policy();
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a31a1ba..b41c6cf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -575,7 +575,7 @@
{
int refold;
- refold = __atomic_add_unless(&map->refcnt, 1, 0);
+ refold = atomic_fetch_add_unless(&map->refcnt, 1, 0);
if (refold >= BPF_MAX_REFCNT) {
__bpf_map_put(map, false);
@@ -1144,7 +1144,7 @@
{
int refold;
- refold = __atomic_add_unless(&prog->aux->refcnt, 1, 0);
+ refold = atomic_fetch_add_unless(&prog->aux->refcnt, 1, 0);
if (refold >= BPF_MAX_REFCNT) {
__bpf_prog_put(prog, false);
diff --git a/kernel/compat.c b/kernel/compat.c
index 702aa84..8e40efc 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -324,35 +324,6 @@
return ret;
}
-/* Todo: Delete these extern declarations when get/put_compat_itimerspec64()
- * are moved to kernel/time/time.c .
- */
-extern int __compat_get_timespec64(struct timespec64 *ts64,
- const struct compat_timespec __user *cts);
-extern int __compat_put_timespec64(const struct timespec64 *ts64,
- struct compat_timespec __user *cts);
-
-int get_compat_itimerspec64(struct itimerspec64 *its,
- const struct compat_itimerspec __user *uits)
-{
-
- if (__compat_get_timespec64(&its->it_interval, &uits->it_interval) ||
- __compat_get_timespec64(&its->it_value, &uits->it_value))
- return -EFAULT;
- return 0;
-}
-EXPORT_SYMBOL_GPL(get_compat_itimerspec64);
-
-int put_compat_itimerspec64(const struct itimerspec64 *its,
- struct compat_itimerspec __user *uits)
-{
- if (__compat_put_timespec64(&its->it_interval, &uits->it_interval) ||
- __compat_put_timespec64(&its->it_value, &uits->it_value))
- return -EFAULT;
- return 0;
-}
-EXPORT_SYMBOL_GPL(put_compat_itimerspec64);
-
/*
* We currently only need the following fields from the sigevent
* structure: sigev_value, sigev_signo, sig_notify and (sometimes
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2f8f338..dd8634d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1274,7 +1274,7 @@
* otherwise a RCU stall occurs.
*/
[CPUHP_TIMERS_PREPARE] = {
- .name = "timers:dead",
+ .name = "timers:prepare",
.startup.single = timers_prepare_cpu,
.teardown.single = timers_dead_cpu,
},
@@ -1344,6 +1344,11 @@
.startup.single = perf_event_init_cpu,
.teardown.single = perf_event_exit_cpu,
},
+ [CPUHP_AP_WATCHDOG_ONLINE] = {
+ .name = "lockup_detector:online",
+ .startup.single = lockup_detector_online_cpu,
+ .teardown.single = lockup_detector_offline_cpu,
+ },
[CPUHP_AP_WORKQUEUE_ONLINE] = {
.name = "workqueue:online",
.startup.single = workqueue_online_cpu,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index eec2d5f..f6ea33a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1656,7 +1656,7 @@
typeof(*event), group_node))
/*
- * Add a event from the lists for its context.
+ * Add an event from the lists for its context.
* Must be called with ctx->mutex and ctx->lock held.
*/
static void
@@ -1844,7 +1844,7 @@
}
/*
- * Remove a event from the lists for its context.
+ * Remove an event from the lists for its context.
* Must be called with ctx->mutex and ctx->lock held.
*/
static void
@@ -2148,7 +2148,7 @@
}
/*
- * Disable a event.
+ * Disable an event.
*
* If event->ctx is a cloned context, callers must make sure that
* every task struct that event->ctx->task could possibly point to
@@ -2677,7 +2677,7 @@
}
/*
- * Enable a event.
+ * Enable an event.
*
* If event->ctx is a cloned context, callers must make sure that
* every task struct that event->ctx->task could possibly point to
@@ -2755,7 +2755,7 @@
* events will refuse to restart because of rb::aux_mmap_count==0,
* see comments in perf_aux_output_begin().
*
- * Since this is happening on a event-local CPU, no trace is lost
+ * Since this is happening on an event-local CPU, no trace is lost
* while restarting.
*/
if (sd->restart)
@@ -4827,7 +4827,7 @@
int ret;
/*
- * Return end-of-file for a read on a event that is in
+ * Return end-of-file for a read on an event that is in
* error state (i.e. because it was pinned but it couldn't be
* scheduled on to the CPU at some point).
*/
@@ -5273,11 +5273,11 @@
}
EXPORT_SYMBOL_GPL(perf_event_update_userpage);
-static int perf_mmap_fault(struct vm_fault *vmf)
+static vm_fault_t perf_mmap_fault(struct vm_fault *vmf)
{
struct perf_event *event = vmf->vma->vm_file->private_data;
struct ring_buffer *rb;
- int ret = VM_FAULT_SIGBUS;
+ vm_fault_t ret = VM_FAULT_SIGBUS;
if (vmf->flags & FAULT_FLAG_MKWRITE) {
if (vmf->pgoff == 0)
@@ -9904,7 +9904,7 @@
}
/*
- * Allocate and initialize a event structure
+ * Allocate and initialize an event structure
*/
static struct perf_event *
perf_event_alloc(struct perf_event_attr *attr, int cpu,
@@ -11235,7 +11235,7 @@
}
/*
- * Inherit a event from parent task to child task.
+ * Inherit an event from parent task to child task.
*
* Returns:
* - valid pointer on success
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 6e28d28..b3814fc 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -345,13 +345,13 @@
mutex_unlock(&nr_bp_mutex);
}
-static int __modify_bp_slot(struct perf_event *bp, u64 old_type)
+static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
{
int err;
__release_bp_slot(bp, old_type);
- err = __reserve_bp_slot(bp, bp->attr.bp_type);
+ err = __reserve_bp_slot(bp, new_type);
if (err) {
/*
* Reserve the old_type slot back in case
@@ -367,12 +367,12 @@
return err;
}
-static int modify_bp_slot(struct perf_event *bp, u64 old_type)
+static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
{
int ret;
mutex_lock(&nr_bp_mutex);
- ret = __modify_bp_slot(bp, old_type);
+ ret = __modify_bp_slot(bp, old_type, new_type);
mutex_unlock(&nr_bp_mutex);
return ret;
}
@@ -400,16 +400,18 @@
return 0;
}
-static int validate_hw_breakpoint(struct perf_event *bp)
+static int hw_breakpoint_parse(struct perf_event *bp,
+ const struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
{
- int ret;
+ int err;
- ret = arch_validate_hwbkpt_settings(bp);
- if (ret)
- return ret;
+ err = hw_breakpoint_arch_parse(bp, attr, hw);
+ if (err)
+ return err;
- if (arch_check_bp_in_kernelspace(bp)) {
- if (bp->attr.exclude_kernel)
+ if (arch_check_bp_in_kernelspace(hw)) {
+ if (attr->exclude_kernel)
return -EINVAL;
/*
* Don't let unprivileged users set a breakpoint in the trap
@@ -424,19 +426,22 @@
int register_perf_hw_breakpoint(struct perf_event *bp)
{
- int ret;
+ struct arch_hw_breakpoint hw;
+ int err;
- ret = reserve_bp_slot(bp);
- if (ret)
- return ret;
+ err = reserve_bp_slot(bp);
+ if (err)
+ return err;
- ret = validate_hw_breakpoint(bp);
-
- /* if arch_validate_hwbkpt_settings() fails then release bp slot */
- if (ret)
+ err = hw_breakpoint_parse(bp, &bp->attr, &hw);
+ if (err) {
release_bp_slot(bp);
+ return err;
+ }
- return ret;
+ bp->hw.info = hw;
+
+ return 0;
}
/**
@@ -456,35 +461,44 @@
}
EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
+static void hw_breakpoint_copy_attr(struct perf_event_attr *to,
+ struct perf_event_attr *from)
+{
+ to->bp_addr = from->bp_addr;
+ to->bp_type = from->bp_type;
+ to->bp_len = from->bp_len;
+ to->disabled = from->disabled;
+}
+
int
modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *attr,
bool check)
{
- u64 old_addr = bp->attr.bp_addr;
- u64 old_len = bp->attr.bp_len;
- int old_type = bp->attr.bp_type;
- bool modify = attr->bp_type != old_type;
- int err = 0;
+ struct arch_hw_breakpoint hw;
+ int err;
- bp->attr.bp_addr = attr->bp_addr;
- bp->attr.bp_type = attr->bp_type;
- bp->attr.bp_len = attr->bp_len;
-
- if (check && memcmp(&bp->attr, attr, sizeof(*attr)))
- return -EINVAL;
-
- err = validate_hw_breakpoint(bp);
- if (!err && modify)
- err = modify_bp_slot(bp, old_type);
-
- if (err) {
- bp->attr.bp_addr = old_addr;
- bp->attr.bp_type = old_type;
- bp->attr.bp_len = old_len;
+ err = hw_breakpoint_parse(bp, attr, &hw);
+ if (err)
return err;
+
+ if (check) {
+ struct perf_event_attr old_attr;
+
+ old_attr = bp->attr;
+ hw_breakpoint_copy_attr(&old_attr, attr);
+ if (memcmp(&old_attr, attr, sizeof(*attr)))
+ return -EINVAL;
}
- bp->attr.disabled = attr->disabled;
+ if (bp->attr.bp_type != attr->bp_type) {
+ err = modify_bp_slot(bp, bp->attr.bp_type, attr->bp_type);
+ if (err)
+ return err;
+ }
+
+ hw_breakpoint_copy_attr(&bp->attr, attr);
+ bp->hw.info = hw;
+
return 0;
}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a..aed1ba5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -918,7 +918,7 @@
EXPORT_SYMBOL_GPL(uprobe_register);
/*
- * uprobe_apply - unregister a already registered probe.
+ * uprobe_apply - unregister an already registered probe.
* @inode: the file in which the probe has to be removed.
* @offset: offset from the start of the file.
* @uc: consumer which wants to add more or remove some breakpoints
@@ -947,7 +947,7 @@
}
/*
- * uprobe_unregister - unregister a already registered probe.
+ * uprobe_unregister - unregister an already registered probe.
* @inode: the file in which the probe has to be removed.
* @offset: offset from the start of the file.
* @uc: identify which probe if multiple probes are colocated.
@@ -1403,7 +1403,7 @@
/*
* Called with no locks held.
- * Called in context of a exiting or a exec-ing thread.
+ * Called in context of an exiting or an exec-ing thread.
*/
void uprobe_free_utask(struct task_struct *t)
{
diff --git a/kernel/fail_function.c b/kernel/fail_function.c
index 5349c91..bc80a4e 100644
--- a/kernel/fail_function.c
+++ b/kernel/fail_function.c
@@ -184,9 +184,6 @@
if (should_fail(&fei_fault_attr, 1)) {
regs_set_return_value(regs, attr->retval);
override_function_with_return(regs);
- /* Kprobe specific fixup */
- reset_current_kprobe();
- preempt_enable_no_resched();
return 1;
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 1b27bab..9d8d0e0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2276,6 +2276,8 @@
void __init proc_caches_init(void)
{
+ unsigned int mm_size;
+
sighand_cachep = kmem_cache_create("sighand_cache",
sizeof(struct sighand_struct), 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -2292,15 +2294,16 @@
sizeof(struct fs_struct), 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
NULL);
+
/*
- * FIXME! The "sizeof(struct mm_struct)" currently includes the
- * whole struct cpumask for the OFFSTACK case. We could change
- * this to *only* allocate as much of it as required by the
- * maximum number of CPU's we can ever have. The cpumask_allocation
- * is at the end of the structure, exactly for that reason.
+ * The mm_cpumask is located at the end of mm_struct, and is
+ * dynamically sized based on the maximum CPU number this system
+ * can have, taking hotplug into account (nr_cpu_ids).
*/
+ mm_size = sizeof(struct mm_struct) + cpumask_size();
+
mm_cachep = kmem_cache_create_usercopy("mm_struct",
- sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
+ mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
offsetof(struct mm_struct, saved_auxv),
sizeof_field(struct mm_struct, saved_auxv),
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index c6766f3..5f3e2ba 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -134,7 +134,6 @@
endmenu
config GENERIC_IRQ_MULTI_HANDLER
- depends on !MULTI_IRQ_HANDLER
bool
help
Allow to specify the low level IRQ handler at run time.
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index afc7f90..578d0e5 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -443,6 +443,7 @@
* We free the descriptor, masks and stat fields via RCU. That
* allows demultiplex interrupts to do rcu based management of
* the child interrupts.
+ * This also allows us to use rcu in kstat_irqs_usr().
*/
call_rcu(&desc->rcu, delayed_free_desc);
}
@@ -928,17 +929,17 @@
* kstat_irqs_usr - Get the statistics for an interrupt
* @irq: The interrupt number
*
- * Returns the sum of interrupt counts on all cpus since boot for
- * @irq. Contrary to kstat_irqs() this can be called from any
- * preemptible context. It's protected against concurrent removal of
- * an interrupt descriptor when sparse irqs are enabled.
+ * Returns the sum of interrupt counts on all cpus since boot for @irq.
+ * Contrary to kstat_irqs() this can be called from any context.
+ * It uses rcu since a concurrent removal of an interrupt descriptor is
+ * observing an rcu grace period before delayed_free_desc()/irq_kobj_release().
*/
unsigned int kstat_irqs_usr(unsigned int irq)
{
unsigned int sum;
- irq_lock_sparse();
+ rcu_read_lock();
sum = kstat_irqs(irq);
- irq_unlock_sparse();
+ rcu_read_unlock();
return sum;
}
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 9a8b7ba..fb86146 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -790,9 +790,19 @@
static int irq_wait_for_interrupt(struct irqaction *action)
{
- set_current_state(TASK_INTERRUPTIBLE);
+ for (;;) {
+ set_current_state(TASK_INTERRUPTIBLE);
- while (!kthread_should_stop()) {
+ if (kthread_should_stop()) {
+ /* may need to run one last time */
+ if (test_and_clear_bit(IRQTF_RUNTHREAD,
+ &action->thread_flags)) {
+ __set_current_state(TASK_RUNNING);
+ return 0;
+ }
+ __set_current_state(TASK_RUNNING);
+ return -1;
+ }
if (test_and_clear_bit(IRQTF_RUNTHREAD,
&action->thread_flags)) {
@@ -800,10 +810,7 @@
return 0;
}
schedule();
- set_current_state(TASK_INTERRUPTIBLE);
}
- __set_current_state(TASK_RUNNING);
- return -1;
}
/*
@@ -1024,11 +1031,8 @@
/*
* This is the regular exit path. __free_irq() is stopping the
* thread via kthread_stop() after calling
- * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
- * oneshot mask bit can be set. We cannot verify that as we
- * cannot touch the oneshot mask at this point anymore as
- * __setup_irq() might have given out currents thread_mask
- * again.
+ * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
+ * oneshot mask bit can be set.
*/
task_work_cancel(current, irq_thread_dtor);
return 0;
@@ -1251,8 +1255,10 @@
/*
* Protects against a concurrent __free_irq() call which might wait
- * for synchronize_irq() to complete without holding the optional
- * chip bus lock and desc->lock.
+ * for synchronize_hardirq() to complete without holding the optional
+ * chip bus lock and desc->lock. Also protects against handing out
+ * a recycled oneshot thread_mask bit while it's still in use by
+ * its previous owner.
*/
mutex_lock(&desc->request_mutex);
@@ -1571,9 +1577,6 @@
WARN(in_interrupt(), "Trying to free IRQ %d from IRQ context!\n", irq);
- if (!desc)
- return NULL;
-
mutex_lock(&desc->request_mutex);
chip_bus_lock(desc);
raw_spin_lock_irqsave(&desc->lock, flags);
@@ -1620,11 +1623,11 @@
/*
* Drop bus_lock here so the changes which were done in the chip
* callbacks above are synced out to the irq chips which hang
- * behind a slow bus (I2C, SPI) before calling synchronize_irq().
+ * behind a slow bus (I2C, SPI) before calling synchronize_hardirq().
*
* Aside of that the bus_lock can also be taken from the threaded
* handler in irq_finalize_oneshot() which results in a deadlock
- * because synchronize_irq() would wait forever for the thread to
+ * because kthread_stop() would wait forever for the thread to
* complete, which is blocked on the bus lock.
*
* The still held desc->request_mutex() protects against a
@@ -1636,7 +1639,7 @@
unregister_handler_proc(irq, action);
/* Make sure it's not being used on another CPU: */
- synchronize_irq(irq);
+ synchronize_hardirq(irq);
#ifdef CONFIG_DEBUG_SHIRQ
/*
@@ -1645,7 +1648,7 @@
* is so by doing an extra call to the handler ....
*
* ( We do this after actually deregistering it, to make sure that a
- * 'real' IRQ doesn't run in * parallel with our fake. )
+ * 'real' IRQ doesn't run in parallel with our fake. )
*/
if (action->flags & IRQF_SHARED) {
local_irq_save(flags);
@@ -1654,6 +1657,12 @@
}
#endif
+ /*
+ * The action has already been removed above, but the thread writes
+ * its oneshot mask bit when it completes. Though request_mutex is
+ * held across this which prevents __setup_irq() from handing out
+ * the same bit to a newly requested action.
+ */
if (action->thread) {
kthread_stop(action->thread);
put_task_struct(action->thread);
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 37eda10..da9addb 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -475,22 +475,24 @@
seq_putc(p, '\n');
}
- irq_lock_sparse();
+ rcu_read_lock();
desc = irq_to_desc(i);
if (!desc)
goto outsparse;
- raw_spin_lock_irqsave(&desc->lock, flags);
- for_each_online_cpu(j)
- any_count |= kstat_irqs_cpu(i, j);
- action = desc->action;
- if ((!action || irq_desc_is_chained(desc)) && !any_count)
- goto out;
+ if (desc->kstat_irqs)
+ for_each_online_cpu(j)
+ any_count |= *per_cpu_ptr(desc->kstat_irqs, j);
+
+ if ((!desc->action || irq_desc_is_chained(desc)) && !any_count)
+ goto outsparse;
seq_printf(p, "%*d: ", prec, i);
for_each_online_cpu(j)
- seq_printf(p, "%10u ", kstat_irqs_cpu(i, j));
+ seq_printf(p, "%10u ", desc->kstat_irqs ?
+ *per_cpu_ptr(desc->kstat_irqs, j) : 0);
+ raw_spin_lock_irqsave(&desc->lock, flags);
if (desc->irq_data.chip) {
if (desc->irq_data.chip->irq_print_chip)
desc->irq_data.chip->irq_print_chip(&desc->irq_data, p);
@@ -511,6 +513,7 @@
if (desc->name)
seq_printf(p, "-%-8s", desc->name);
+ action = desc->action;
if (action) {
seq_printf(p, " %s", action->name);
while ((action = action->next) != NULL)
@@ -518,10 +521,9 @@
}
seq_putc(p, '\n');
-out:
raw_spin_unlock_irqrestore(&desc->lock, flags);
outsparse:
- irq_unlock_sparse();
+ rcu_read_unlock();
return 0;
}
#endif
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ea61902..ab257be 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -627,8 +627,8 @@
(kprobe_disabled(p) || kprobes_all_disarmed))
return;
- /* Both of break_handler and post_handler are not supported. */
- if (p->break_handler || p->post_handler)
+ /* kprobes with post_handler can not be optimized */
+ if (p->post_handler)
return;
op = container_of(p, struct optimized_kprobe, kp);
@@ -710,9 +710,7 @@
* there is still a relative jump) and disabled.
*/
op = container_of(ap, struct optimized_kprobe, kp);
- if (unlikely(list_empty(&op->list)))
- printk(KERN_WARNING "Warning: found a stray unused "
- "aggrprobe@%p\n", ap->addr);
+ WARN_ON_ONCE(list_empty(&op->list));
/* Enable the probe again */
ap->flags &= ~KPROBE_FLAG_DISABLED;
/* Optimize it again (remove from op->list) */
@@ -985,7 +983,8 @@
ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
(unsigned long)p->addr, 0, 0);
if (ret) {
- pr_debug("Failed to arm kprobe-ftrace at %p (%d)\n", p->addr, ret);
+ pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n",
+ p->addr, ret);
return ret;
}
@@ -1025,7 +1024,8 @@
ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
(unsigned long)p->addr, 1, 0);
- WARN(ret < 0, "Failed to disarm kprobe-ftrace at %p (%d)\n", p->addr, ret);
+ WARN_ONCE(ret < 0, "Failed to disarm kprobe-ftrace at %pS (%d)\n",
+ p->addr, ret);
return ret;
}
#else /* !CONFIG_KPROBES_ON_FTRACE */
@@ -1116,20 +1116,6 @@
}
NOKPROBE_SYMBOL(aggr_fault_handler);
-static int aggr_break_handler(struct kprobe *p, struct pt_regs *regs)
-{
- struct kprobe *cur = __this_cpu_read(kprobe_instance);
- int ret = 0;
-
- if (cur && cur->break_handler) {
- if (cur->break_handler(cur, regs))
- ret = 1;
- }
- reset_kprobe_instance();
- return ret;
-}
-NOKPROBE_SYMBOL(aggr_break_handler);
-
/* Walks the list and increments nmissed count for multiprobe case */
void kprobes_inc_nmissed_count(struct kprobe *p)
{
@@ -1270,24 +1256,15 @@
}
NOKPROBE_SYMBOL(cleanup_rp_inst);
-/*
-* Add the new probe to ap->list. Fail if this is the
-* second jprobe at the address - two jprobes can't coexist
-*/
+/* Add the new probe to ap->list */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
{
BUG_ON(kprobe_gone(ap) || kprobe_gone(p));
- if (p->break_handler || p->post_handler)
+ if (p->post_handler)
unoptimize_kprobe(ap, true); /* Fall back to normal kprobe */
- if (p->break_handler) {
- if (ap->break_handler)
- return -EEXIST;
- list_add_tail_rcu(&p->list, &ap->list);
- ap->break_handler = aggr_break_handler;
- } else
- list_add_rcu(&p->list, &ap->list);
+ list_add_rcu(&p->list, &ap->list);
if (p->post_handler && !ap->post_handler)
ap->post_handler = aggr_post_handler;
@@ -1310,8 +1287,6 @@
/* We don't care the kprobe which has gone. */
if (p->post_handler && !kprobe_gone(p))
ap->post_handler = aggr_post_handler;
- if (p->break_handler && !kprobe_gone(p))
- ap->break_handler = aggr_break_handler;
INIT_LIST_HEAD(&ap->list);
INIT_HLIST_NODE(&ap->hlist);
@@ -1706,8 +1681,6 @@
goto disarmed;
else {
/* If disabling probe has special handlers, update aggrprobe */
- if (p->break_handler && !kprobe_gone(p))
- ap->break_handler = NULL;
if (p->post_handler && !kprobe_gone(p)) {
list_for_each_entry_rcu(list_p, &ap->list, list) {
if ((list_p != p) && (list_p->post_handler))
@@ -1812,77 +1785,6 @@
return (unsigned long)entry;
}
-#if 0
-int register_jprobes(struct jprobe **jps, int num)
-{
- int ret = 0, i;
-
- if (num <= 0)
- return -EINVAL;
-
- for (i = 0; i < num; i++) {
- ret = register_jprobe(jps[i]);
-
- if (ret < 0) {
- if (i > 0)
- unregister_jprobes(jps, i);
- break;
- }
- }
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(register_jprobes);
-
-int register_jprobe(struct jprobe *jp)
-{
- unsigned long addr, offset;
- struct kprobe *kp = &jp->kp;
-
- /*
- * Verify probepoint as well as the jprobe handler are
- * valid function entry points.
- */
- addr = arch_deref_entry_point(jp->entry);
-
- if (kallsyms_lookup_size_offset(addr, NULL, &offset) && offset == 0 &&
- kprobe_on_func_entry(kp->addr, kp->symbol_name, kp->offset)) {
- kp->pre_handler = setjmp_pre_handler;
- kp->break_handler = longjmp_break_handler;
- return register_kprobe(kp);
- }
-
- return -EINVAL;
-}
-EXPORT_SYMBOL_GPL(register_jprobe);
-
-void unregister_jprobe(struct jprobe *jp)
-{
- unregister_jprobes(&jp, 1);
-}
-EXPORT_SYMBOL_GPL(unregister_jprobe);
-
-void unregister_jprobes(struct jprobe **jps, int num)
-{
- int i;
-
- if (num <= 0)
- return;
- mutex_lock(&kprobe_mutex);
- for (i = 0; i < num; i++)
- if (__unregister_kprobe_top(&jps[i]->kp) < 0)
- jps[i]->kp.addr = NULL;
- mutex_unlock(&kprobe_mutex);
-
- synchronize_sched();
- for (i = 0; i < num; i++) {
- if (jps[i]->kp.addr)
- __unregister_kprobe_bottom(&jps[i]->kp);
- }
-}
-EXPORT_SYMBOL_GPL(unregister_jprobes);
-#endif
-
#ifdef CONFIG_KRETPROBES
/*
* This kprobe pre_handler is registered with every kretprobe. When probe
@@ -1982,7 +1884,6 @@
rp->kp.pre_handler = pre_handler_kretprobe;
rp->kp.post_handler = NULL;
rp->kp.fault_handler = NULL;
- rp->kp.break_handler = NULL;
/* Pre-allocate memory for max kretprobe instances */
if (rp->maxactive <= 0) {
@@ -2105,7 +2006,6 @@
list_for_each_entry_rcu(kp, &p->list, list)
kp->flags |= KPROBE_FLAG_GONE;
p->post_handler = NULL;
- p->break_handler = NULL;
kill_optimized_kprobe(p);
}
/*
@@ -2169,11 +2069,12 @@
}
EXPORT_SYMBOL_GPL(enable_kprobe);
+/* Caller must NOT call this in usual path. This is only for critical case */
void dump_kprobe(struct kprobe *kp)
{
- printk(KERN_WARNING "Dumping kprobe:\n");
- printk(KERN_WARNING "Name: %s\nAddress: %p\nOffset: %x\n",
- kp->symbol_name, kp->addr, kp->offset);
+ pr_err("Dumping kprobe:\n");
+ pr_err("Name: %s\nOffset: %x\nAddress: %pS\n",
+ kp->symbol_name, kp->offset, kp->addr);
}
NOKPROBE_SYMBOL(dump_kprobe);
@@ -2196,11 +2097,8 @@
entry = arch_deref_entry_point((void *)*iter);
if (!kernel_text_address(entry) ||
- !kallsyms_lookup_size_offset(entry, &size, &offset)) {
- pr_err("Failed to find blacklist at %p\n",
- (void *)entry);
+ !kallsyms_lookup_size_offset(entry, &size, &offset))
continue;
- }
ent = kmalloc(sizeof(*ent), GFP_KERNEL);
if (!ent)
@@ -2326,21 +2224,23 @@
const char *sym, int offset, char *modname, struct kprobe *pp)
{
char *kprobe_type;
+ void *addr = p->addr;
if (p->pre_handler == pre_handler_kretprobe)
kprobe_type = "r";
- else if (p->pre_handler == setjmp_pre_handler)
- kprobe_type = "j";
else
kprobe_type = "k";
+ if (!kallsyms_show_value())
+ addr = NULL;
+
if (sym)
- seq_printf(pi, "%p %s %s+0x%x %s ",
- p->addr, kprobe_type, sym, offset,
+ seq_printf(pi, "%px %s %s+0x%x %s ",
+ addr, kprobe_type, sym, offset,
(modname ? modname : " "));
- else
- seq_printf(pi, "%p %s %p ",
- p->addr, kprobe_type, p->addr);
+ else /* try to use %pS */
+ seq_printf(pi, "%px %s %pS ",
+ addr, kprobe_type, p->addr);
if (!pp)
pp = p;
@@ -2428,8 +2328,16 @@
struct kprobe_blacklist_entry *ent =
list_entry(v, struct kprobe_blacklist_entry, list);
- seq_printf(m, "0x%px-0x%px\t%ps\n", (void *)ent->start_addr,
- (void *)ent->end_addr, (void *)ent->start_addr);
+ /*
+ * If /proc/kallsyms is not showing kernel address, we won't
+ * show them here either.
+ */
+ if (!kallsyms_show_value())
+ seq_printf(m, "0x%px-0x%px\t%ps\n", NULL, NULL,
+ (void *)ent->start_addr);
+ else
+ seq_printf(m, "0x%px-0x%px\t%ps\n", (void *)ent->start_addr,
+ (void *)ent->end_addr, (void *)ent->start_addr);
return 0;
}
@@ -2611,7 +2519,7 @@
if (!dir)
return -ENOMEM;
- file = debugfs_create_file("list", 0444, dir, NULL,
+ file = debugfs_create_file("list", 0400, dir, NULL,
&debugfs_kprobes_operations);
if (!file)
goto error;
@@ -2621,7 +2529,7 @@
if (!file)
goto error;
- file = debugfs_create_file("blacklist", 0444, dir, NULL,
+ file = debugfs_create_file("blacklist", 0400, dir, NULL,
&debugfs_kprobe_blacklist_ops);
if (!file)
goto error;
@@ -2637,6 +2545,3 @@
#endif /* CONFIG_DEBUG_FS */
module_init(init_kprobes);
-
-/* defined in arch/.../kernel/kprobes.c */
-EXPORT_SYMBOL_GPL(jprobe_return);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 486dedb..087d18d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -190,7 +190,7 @@
if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
break;
- complete_all(&self->parked);
+ complete(&self->parked);
schedule();
}
__set_current_state(TASK_RUNNING);
@@ -471,7 +471,6 @@
if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
__kthread_bind(k, kthread->cpu, TASK_PARKED);
- reinit_completion(&kthread->parked);
clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
/*
* __kthread_parkme() will either see !SHOULD_PARK or get the wakeup.
@@ -499,6 +498,9 @@
if (WARN_ON(k->flags & PF_EXITING))
return -ENOSYS;
+ if (WARN_ON_ONCE(test_bit(KTHREAD_SHOULD_PARK, &kthread->flags)))
+ return -EBUSY;
+
set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
if (k != current) {
wake_up_process(k);
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 8402b33..57bef4f 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -21,6 +21,9 @@
* Davidlohr Bueso <dave@stgolabs.net>
* Based on kernel/rcu/torture.c.
*/
+
+#define pr_fmt(fmt) fmt
+
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kthread.h>
@@ -57,7 +60,7 @@
torture_param(int, stat_interval, 60,
"Number of seconds between stats printk()s");
torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
-torture_param(bool, verbose, true,
+torture_param(int, verbose, 1,
"Enable verbose debugging printk()s");
static char *torture_type = "spin_lock";
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 8733156..70178f6 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -92,7 +92,7 @@
/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
/* Make the current CPU wait so it can enter the idle loop too. */
- swait_event(s2idle_wait_head,
+ swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);
cpuidle_pause();
@@ -160,7 +160,7 @@
raw_spin_lock_irqsave(&s2idle_lock, flags);
if (s2idle_state > S2IDLE_STATE_NONE) {
s2idle_state = S2IDLE_STATE_WAKE;
- swake_up(&s2idle_wait_head);
+ swake_up_one(&s2idle_wait_head);
}
raw_spin_unlock_irqrestore(&s2idle_lock, flags);
}
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 40cea67..4d04683 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -91,7 +91,17 @@
WRITE_ONCE(*sp, rcu_seq_endval(sp));
}
-/* Take a snapshot of the update side's sequence number. */
+/*
+ * rcu_seq_snap - Take a snapshot of the update side's sequence number.
+ *
+ * This function returns the earliest value of the grace-period sequence number
+ * that will indicate that a full grace period has elapsed since the current
+ * time. Once the grace-period sequence number has reached this value, it will
+ * be safe to invoke all callbacks that have been registered prior to the
+ * current time. This value is the current grace-period number plus two to the
+ * power of the number of low-order bits reserved for state, then rounded up to
+ * the next value in which the state bits are all zero.
+ */
static inline unsigned long rcu_seq_snap(unsigned long *sp)
{
unsigned long s;
@@ -108,6 +118,15 @@
}
/*
+ * Given a snapshot from rcu_seq_snap(), determine whether or not the
+ * corresponding update-side operation has started.
+ */
+static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
+{
+ return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
+}
+
+/*
* Given a snapshot from rcu_seq_snap(), determine whether or not a
* full update-side operation has occurred.
*/
@@ -117,6 +136,45 @@
}
/*
+ * Has a grace period completed since the time the old gp_seq was collected?
+ */
+static inline bool rcu_seq_completed_gp(unsigned long old, unsigned long new)
+{
+ return ULONG_CMP_LT(old, new & ~RCU_SEQ_STATE_MASK);
+}
+
+/*
+ * Has a grace period started since the time the old gp_seq was collected?
+ */
+static inline bool rcu_seq_new_gp(unsigned long old, unsigned long new)
+{
+ return ULONG_CMP_LT((old + RCU_SEQ_STATE_MASK) & ~RCU_SEQ_STATE_MASK,
+ new);
+}
+
+/*
+ * Roughly how many full grace periods have elapsed between the collection
+ * of the two specified grace periods?
+ */
+static inline unsigned long rcu_seq_diff(unsigned long new, unsigned long old)
+{
+ unsigned long rnd_diff;
+
+ if (old == new)
+ return 0;
+ /*
+ * Compute the number of grace periods (still shifted up), plus
+ * one if either of new and old is not an exact grace period.
+ */
+ rnd_diff = (new & ~RCU_SEQ_STATE_MASK) -
+ ((old + RCU_SEQ_STATE_MASK) & ~RCU_SEQ_STATE_MASK) +
+ ((new & RCU_SEQ_STATE_MASK) || (old & RCU_SEQ_STATE_MASK));
+ if (ULONG_CMP_GE(RCU_SEQ_STATE_MASK, rnd_diff))
+ return 1; /* Definitely no grace period has elapsed. */
+ return ((rnd_diff - RCU_SEQ_STATE_MASK - 1) >> RCU_SEQ_CTR_SHIFT) + 2;
+}
+
+/*
* debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally
* by call_rcu() and rcu callback execution, and are therefore not part of the
* RCU API. Leaving in rcupdate.h because they are used by all RCU flavors.
@@ -276,6 +334,9 @@
/* Is this rcu_node a leaf? */
#define rcu_is_leaf_node(rnp) ((rnp)->level == rcu_num_lvls - 1)
+/* Is this rcu_node the last leaf? */
+#define rcu_is_last_leaf_node(rsp, rnp) ((rnp) == &(rsp)->node[rcu_num_nodes - 1])
+
/*
* Do a full breadth-first scan of the rcu_node structures for the
* specified rcu_state structure.
@@ -405,8 +466,7 @@
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
- unsigned long *gpnum, unsigned long *completed);
-void rcutorture_record_test_transition(void);
+ unsigned long *gp_seq);
void rcutorture_record_progress(unsigned long vernum);
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
@@ -415,15 +475,11 @@
unsigned long c);
#else
static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
- int *flags,
- unsigned long *gpnum,
- unsigned long *completed)
+ int *flags, unsigned long *gp_seq)
{
*flags = 0;
- *gpnum = 0;
- *completed = 0;
+ *gp_seq = 0;
}
-static inline void rcutorture_record_test_transition(void) { }
static inline void rcutorture_record_progress(unsigned long vernum) { }
#ifdef CONFIG_RCU_TRACE
void do_trace_rcu_torture_read(const char *rcutorturename,
@@ -441,31 +497,26 @@
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
- unsigned long *gpnum,
- unsigned long *completed)
+ unsigned long *gp_seq)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
- *completed = sp->srcu_idx;
- *gpnum = *completed;
+ *gp_seq = sp->srcu_idx;
}
#elif defined(CONFIG_TREE_SRCU)
void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
- unsigned long *gpnum, unsigned long *completed);
+ unsigned long *gp_seq);
#endif
#ifdef CONFIG_TINY_RCU
-static inline unsigned long rcu_batches_started(void) { return 0; }
-static inline unsigned long rcu_batches_started_bh(void) { return 0; }
-static inline unsigned long rcu_batches_started_sched(void) { return 0; }
-static inline unsigned long rcu_batches_completed(void) { return 0; }
-static inline unsigned long rcu_batches_completed_bh(void) { return 0; }
-static inline unsigned long rcu_batches_completed_sched(void) { return 0; }
+static inline unsigned long rcu_get_gp_seq(void) { return 0; }
+static inline unsigned long rcu_bh_get_gp_seq(void) { return 0; }
+static inline unsigned long rcu_sched_get_gp_seq(void) { return 0; }
static inline unsigned long rcu_exp_batches_completed(void) { return 0; }
static inline unsigned long rcu_exp_batches_completed_sched(void) { return 0; }
static inline unsigned long
@@ -474,19 +525,16 @@
static inline void rcu_bh_force_quiescent_state(void) { }
static inline void rcu_sched_force_quiescent_state(void) { }
static inline void show_rcu_gp_kthreads(void) { }
+static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
#else /* #ifdef CONFIG_TINY_RCU */
-extern unsigned long rcutorture_testseq;
-extern unsigned long rcutorture_vernum;
-unsigned long rcu_batches_started(void);
-unsigned long rcu_batches_started_bh(void);
-unsigned long rcu_batches_started_sched(void);
-unsigned long rcu_batches_completed(void);
-unsigned long rcu_batches_completed_bh(void);
-unsigned long rcu_batches_completed_sched(void);
+unsigned long rcu_get_gp_seq(void);
+unsigned long rcu_bh_get_gp_seq(void);
+unsigned long rcu_sched_get_gp_seq(void);
unsigned long rcu_exp_batches_completed(void);
unsigned long rcu_exp_batches_completed_sched(void);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
void show_rcu_gp_kthreads(void);
+int rcu_get_gp_kthreads_prio(void);
void rcu_force_quiescent_state(void);
void rcu_bh_force_quiescent_state(void);
void rcu_sched_force_quiescent_state(void);
diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
index e232846..3424452 100644
--- a/kernel/rcu/rcuperf.c
+++ b/kernel/rcu/rcuperf.c
@@ -19,6 +19,9 @@
*
* Authors: Paul E. McKenney <paulmck@us.ibm.com>
*/
+
+#define pr_fmt(fmt) fmt
+
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/init.h>
@@ -88,7 +91,7 @@
torture_param(int, nwriters, -1, "Number of RCU updater threads");
torture_param(bool, shutdown, !IS_ENABLED(MODULE),
"Shutdown at end of performance tests.");
-torture_param(bool, verbose, true, "Enable verbose debugging printk()s");
+torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
static char *perf_type = "rcu";
@@ -135,8 +138,8 @@
void (*cleanup)(void);
int (*readlock)(void);
void (*readunlock)(int idx);
- unsigned long (*started)(void);
- unsigned long (*completed)(void);
+ unsigned long (*get_gp_seq)(void);
+ unsigned long (*gp_diff)(unsigned long new, unsigned long old);
unsigned long (*exp_completed)(void);
void (*async)(struct rcu_head *head, rcu_callback_t func);
void (*gp_barrier)(void);
@@ -176,8 +179,8 @@
.init = rcu_sync_perf_init,
.readlock = rcu_perf_read_lock,
.readunlock = rcu_perf_read_unlock,
- .started = rcu_batches_started,
- .completed = rcu_batches_completed,
+ .get_gp_seq = rcu_get_gp_seq,
+ .gp_diff = rcu_seq_diff,
.exp_completed = rcu_exp_batches_completed,
.async = call_rcu,
.gp_barrier = rcu_barrier,
@@ -206,8 +209,8 @@
.init = rcu_sync_perf_init,
.readlock = rcu_bh_perf_read_lock,
.readunlock = rcu_bh_perf_read_unlock,
- .started = rcu_batches_started_bh,
- .completed = rcu_batches_completed_bh,
+ .get_gp_seq = rcu_bh_get_gp_seq,
+ .gp_diff = rcu_seq_diff,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_bh,
.gp_barrier = rcu_barrier_bh,
@@ -263,8 +266,8 @@
.init = rcu_sync_perf_init,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
- .started = NULL,
- .completed = srcu_perf_completed,
+ .get_gp_seq = srcu_perf_completed,
+ .gp_diff = rcu_seq_diff,
.exp_completed = srcu_perf_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
@@ -292,8 +295,8 @@
.cleanup = srcu_sync_perf_cleanup,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
- .started = NULL,
- .completed = srcu_perf_completed,
+ .get_gp_seq = srcu_perf_completed,
+ .gp_diff = rcu_seq_diff,
.exp_completed = srcu_perf_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
@@ -322,8 +325,8 @@
.init = rcu_sync_perf_init,
.readlock = sched_perf_read_lock,
.readunlock = sched_perf_read_unlock,
- .started = rcu_batches_started_sched,
- .completed = rcu_batches_completed_sched,
+ .get_gp_seq = rcu_sched_get_gp_seq,
+ .gp_diff = rcu_seq_diff,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_sched,
.gp_barrier = rcu_barrier_sched,
@@ -350,8 +353,8 @@
.init = rcu_sync_perf_init,
.readlock = tasks_perf_read_lock,
.readunlock = tasks_perf_read_unlock,
- .started = rcu_no_completed,
- .completed = rcu_no_completed,
+ .get_gp_seq = rcu_no_completed,
+ .gp_diff = rcu_seq_diff,
.async = call_rcu_tasks,
.gp_barrier = rcu_barrier_tasks,
.sync = synchronize_rcu_tasks,
@@ -359,9 +362,11 @@
.name = "tasks"
};
-static bool __maybe_unused torturing_tasks(void)
+static unsigned long rcuperf_seq_diff(unsigned long new, unsigned long old)
{
- return cur_ops == &tasks_ops;
+ if (!cur_ops->gp_diff)
+ return new - old;
+ return cur_ops->gp_diff(new, old);
}
/*
@@ -444,8 +449,7 @@
b_rcu_perf_writer_started =
cur_ops->exp_completed() / 2;
} else {
- b_rcu_perf_writer_started =
- cur_ops->completed();
+ b_rcu_perf_writer_started = cur_ops->get_gp_seq();
}
}
@@ -502,7 +506,7 @@
cur_ops->exp_completed() / 2;
} else {
b_rcu_perf_writer_finished =
- cur_ops->completed();
+ cur_ops->get_gp_seq();
}
if (shutdown) {
smp_mb(); /* Assign before wake. */
@@ -527,7 +531,7 @@
return 0;
}
-static inline void
+static void
rcu_perf_print_module_parms(struct rcu_perf_ops *cur_ops, const char *tag)
{
pr_alert("%s" PERF_FLAG
@@ -582,8 +586,8 @@
t_rcu_perf_writer_finished -
t_rcu_perf_writer_started,
ngps,
- b_rcu_perf_writer_finished -
- b_rcu_perf_writer_started);
+ rcuperf_seq_diff(b_rcu_perf_writer_finished,
+ b_rcu_perf_writer_started));
for (i = 0; i < nrealwriters; i++) {
if (!writer_durations)
break;
@@ -671,12 +675,11 @@
break;
}
if (i == ARRAY_SIZE(perf_ops)) {
- pr_alert("rcu-perf: invalid perf type: \"%s\"\n",
- perf_type);
+ pr_alert("rcu-perf: invalid perf type: \"%s\"\n", perf_type);
pr_alert("rcu-perf types:");
for (i = 0; i < ARRAY_SIZE(perf_ops); i++)
- pr_alert(" %s", perf_ops[i]->name);
- pr_alert("\n");
+ pr_cont(" %s", perf_ops[i]->name);
+ pr_cont("\n");
firsterr = -EINVAL;
goto unwind;
}
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 42fcb7f..c596c6f 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -22,6 +22,9 @@
*
* See also: Documentation/RCU/torture.txt
*/
+
+#define pr_fmt(fmt) fmt
+
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/init.h>
@@ -52,6 +55,7 @@
#include <linux/torture.h>
#include <linux/vmalloc.h>
#include <linux/sched/debug.h>
+#include <linux/sched/sysctl.h>
#include "rcu.h"
@@ -59,6 +63,19 @@
MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and Josh Triplett <josh@joshtriplett.org>");
+/* Bits for ->extendables field, extendables param, and related definitions. */
+#define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */
+#define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1)
+#define RCUTORTURE_RDR_BH 0x1 /* Extend readers by disabling bh. */
+#define RCUTORTURE_RDR_IRQ 0x2 /* ... disabling interrupts. */
+#define RCUTORTURE_RDR_PREEMPT 0x4 /* ... disabling preemption. */
+#define RCUTORTURE_RDR_RCU 0x8 /* ... entering another RCU reader. */
+#define RCUTORTURE_RDR_NBITS 4 /* Number of bits defined above. */
+#define RCUTORTURE_MAX_EXTEND (RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ | \
+ RCUTORTURE_RDR_PREEMPT)
+#define RCUTORTURE_RDR_MAX_LOOPS 0x7 /* Maximum reader extensions. */
+ /* Must be power of two minus one. */
+
torture_param(int, cbflood_inter_holdoff, HZ,
"Holdoff between floods (jiffies)");
torture_param(int, cbflood_intra_holdoff, 1,
@@ -66,6 +83,8 @@
torture_param(int, cbflood_n_burst, 3, "# bursts in flood, zero to disable");
torture_param(int, cbflood_n_per_burst, 20000,
"# callbacks per burst in flood");
+torture_param(int, extendables, RCUTORTURE_MAX_EXTEND,
+ "Extend readers by disabling bh (1), irqs (2), or preempt (4)");
torture_param(int, fqs_duration, 0,
"Duration of fqs bursts (us), 0 to disable");
torture_param(int, fqs_holdoff, 0, "Holdoff time within fqs bursts (us)");
@@ -84,7 +103,7 @@
"Enable debug-object double call_rcu() testing");
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
torture_param(int, onoff_interval, 0,
- "Time between CPU hotplugs (s), 0=disable");
+ "Time between CPU hotplugs (jiffies), 0=disable");
torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles");
torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable.");
torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable.");
@@ -101,7 +120,7 @@
"Interval between boost tests, seconds.");
torture_param(bool, test_no_idle_hz, true,
"Test support for tickless idle CPUs");
-torture_param(bool, verbose, true,
+torture_param(int, verbose, 1,
"Enable verbose debugging printk()s");
static char *torture_type = "rcu";
@@ -148,9 +167,9 @@
static long n_rcu_torture_boost_rterror;
static long n_rcu_torture_boost_failure;
static long n_rcu_torture_boosts;
-static long n_rcu_torture_timers;
+static atomic_long_t n_rcu_torture_timers;
static long n_barrier_attempts;
-static long n_barrier_successes;
+static long n_barrier_successes; /* did rcu_barrier test succeed? */
static atomic_long_t n_cbfloods;
static struct list_head rcu_torture_removed;
@@ -261,8 +280,8 @@
int (*readlock)(void);
void (*read_delay)(struct torture_random_state *rrsp);
void (*readunlock)(int idx);
- unsigned long (*started)(void);
- unsigned long (*completed)(void);
+ unsigned long (*get_gp_seq)(void);
+ unsigned long (*gp_diff)(unsigned long new, unsigned long old);
void (*deferred_free)(struct rcu_torture *p);
void (*sync)(void);
void (*exp_sync)(void);
@@ -274,6 +293,8 @@
void (*stats)(void);
int irq_capable;
int can_boost;
+ int extendables;
+ int ext_irq_conflict;
const char *name;
};
@@ -302,10 +323,10 @@
* force_quiescent_state. */
if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
- started = cur_ops->completed();
+ started = cur_ops->get_gp_seq();
ts = rcu_trace_clock_local();
mdelay(longdelay_ms);
- completed = cur_ops->completed();
+ completed = cur_ops->get_gp_seq();
do_trace_rcu_torture_read(cur_ops->name, NULL, ts,
started, completed);
}
@@ -397,8 +418,8 @@
.readlock = rcu_torture_read_lock,
.read_delay = rcu_read_delay,
.readunlock = rcu_torture_read_unlock,
- .started = rcu_batches_started,
- .completed = rcu_batches_completed,
+ .get_gp_seq = rcu_get_gp_seq,
+ .gp_diff = rcu_seq_diff,
.deferred_free = rcu_torture_deferred_free,
.sync = synchronize_rcu,
.exp_sync = synchronize_rcu_expedited,
@@ -439,8 +460,8 @@
.readlock = rcu_bh_torture_read_lock,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = rcu_bh_torture_read_unlock,
- .started = rcu_batches_started_bh,
- .completed = rcu_batches_completed_bh,
+ .get_gp_seq = rcu_bh_get_gp_seq,
+ .gp_diff = rcu_seq_diff,
.deferred_free = rcu_bh_torture_deferred_free,
.sync = synchronize_rcu_bh,
.exp_sync = synchronize_rcu_bh_expedited,
@@ -449,6 +470,8 @@
.fqs = rcu_bh_force_quiescent_state,
.stats = NULL,
.irq_capable = 1,
+ .extendables = (RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ),
+ .ext_irq_conflict = RCUTORTURE_RDR_RCU,
.name = "rcu_bh"
};
@@ -483,8 +506,7 @@
.readlock = rcu_torture_read_lock,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = rcu_torture_read_unlock,
- .started = rcu_no_completed,
- .completed = rcu_no_completed,
+ .get_gp_seq = rcu_no_completed,
.deferred_free = rcu_busted_torture_deferred_free,
.sync = synchronize_rcu_busted,
.exp_sync = synchronize_rcu_busted,
@@ -572,8 +594,7 @@
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
- .started = NULL,
- .completed = srcu_torture_completed,
+ .get_gp_seq = srcu_torture_completed,
.deferred_free = srcu_torture_deferred_free,
.sync = srcu_torture_synchronize,
.exp_sync = srcu_torture_synchronize_expedited,
@@ -610,8 +631,7 @@
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
- .started = NULL,
- .completed = srcu_torture_completed,
+ .get_gp_seq = srcu_torture_completed,
.deferred_free = srcu_torture_deferred_free,
.sync = srcu_torture_synchronize,
.exp_sync = srcu_torture_synchronize_expedited,
@@ -622,6 +642,26 @@
.name = "srcud"
};
+/* As above, but broken due to inappropriate reader extension. */
+static struct rcu_torture_ops busted_srcud_ops = {
+ .ttype = SRCU_FLAVOR,
+ .init = srcu_torture_init,
+ .cleanup = srcu_torture_cleanup,
+ .readlock = srcu_torture_read_lock,
+ .read_delay = rcu_read_delay,
+ .readunlock = srcu_torture_read_unlock,
+ .get_gp_seq = srcu_torture_completed,
+ .deferred_free = srcu_torture_deferred_free,
+ .sync = srcu_torture_synchronize,
+ .exp_sync = srcu_torture_synchronize_expedited,
+ .call = srcu_torture_call,
+ .cb_barrier = srcu_torture_barrier,
+ .stats = srcu_torture_stats,
+ .irq_capable = 1,
+ .extendables = RCUTORTURE_MAX_EXTEND,
+ .name = "busted_srcud"
+};
+
/*
* Definitions for sched torture testing.
*/
@@ -648,8 +688,8 @@
.readlock = sched_torture_read_lock,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = sched_torture_read_unlock,
- .started = rcu_batches_started_sched,
- .completed = rcu_batches_completed_sched,
+ .get_gp_seq = rcu_sched_get_gp_seq,
+ .gp_diff = rcu_seq_diff,
.deferred_free = rcu_sched_torture_deferred_free,
.sync = synchronize_sched,
.exp_sync = synchronize_sched_expedited,
@@ -660,6 +700,7 @@
.fqs = rcu_sched_force_quiescent_state,
.stats = NULL,
.irq_capable = 1,
+ .extendables = RCUTORTURE_MAX_EXTEND,
.name = "sched"
};
@@ -687,8 +728,7 @@
.readlock = tasks_torture_read_lock,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = tasks_torture_read_unlock,
- .started = rcu_no_completed,
- .completed = rcu_no_completed,
+ .get_gp_seq = rcu_no_completed,
.deferred_free = rcu_tasks_torture_deferred_free,
.sync = synchronize_rcu_tasks,
.exp_sync = synchronize_rcu_tasks,
@@ -700,6 +740,13 @@
.name = "tasks"
};
+static unsigned long rcutorture_seq_diff(unsigned long new, unsigned long old)
+{
+ if (!cur_ops->gp_diff)
+ return new - old;
+ return cur_ops->gp_diff(new, old);
+}
+
static bool __maybe_unused torturing_tasks(void)
{
return cur_ops == &tasks_ops;
@@ -726,6 +773,44 @@
smp_store_release(&rbip->inflight, 0);
}
+static int old_rt_runtime = -1;
+
+static void rcu_torture_disable_rt_throttle(void)
+{
+ /*
+ * Disable RT throttling so that rcutorture's boost threads don't get
+ * throttled. Only possible if rcutorture is built-in otherwise the
+ * user should manually do this by setting the sched_rt_period_us and
+ * sched_rt_runtime sysctls.
+ */
+ if (!IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) || old_rt_runtime != -1)
+ return;
+
+ old_rt_runtime = sysctl_sched_rt_runtime;
+ sysctl_sched_rt_runtime = -1;
+}
+
+static void rcu_torture_enable_rt_throttle(void)
+{
+ if (!IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) || old_rt_runtime == -1)
+ return;
+
+ sysctl_sched_rt_runtime = old_rt_runtime;
+ old_rt_runtime = -1;
+}
+
+static bool rcu_torture_boost_failed(unsigned long start, unsigned long end)
+{
+ if (end - start > test_boost_duration * HZ - HZ / 2) {
+ VERBOSE_TOROUT_STRING("rcu_torture_boost boosting failed");
+ n_rcu_torture_boost_failure++;
+
+ return true; /* failed */
+ }
+
+ return false; /* passed */
+}
+
static int rcu_torture_boost(void *arg)
{
unsigned long call_rcu_time;
@@ -746,6 +831,21 @@
init_rcu_head_on_stack(&rbi.rcu);
/* Each pass through the following loop does one boost-test cycle. */
do {
+ /* Track if the test failed already in this test interval? */
+ bool failed = false;
+
+ /* Increment n_rcu_torture_boosts once per boost-test */
+ while (!kthread_should_stop()) {
+ if (mutex_trylock(&boost_mutex)) {
+ n_rcu_torture_boosts++;
+ mutex_unlock(&boost_mutex);
+ break;
+ }
+ schedule_timeout_uninterruptible(1);
+ }
+ if (kthread_should_stop())
+ goto checkwait;
+
/* Wait for the next test interval. */
oldstarttime = boost_starttime;
while (ULONG_CMP_LT(jiffies, oldstarttime)) {
@@ -764,11 +864,10 @@
/* RCU core before ->inflight = 1. */
smp_store_release(&rbi.inflight, 1);
call_rcu(&rbi.rcu, rcu_torture_boost_cb);
- if (jiffies - call_rcu_time >
- test_boost_duration * HZ - HZ / 2) {
- VERBOSE_TOROUT_STRING("rcu_torture_boost boosting failed");
- n_rcu_torture_boost_failure++;
- }
+ /* Check if the boost test failed */
+ failed = failed ||
+ rcu_torture_boost_failed(call_rcu_time,
+ jiffies);
call_rcu_time = jiffies;
}
stutter_wait("rcu_torture_boost");
@@ -777,6 +876,14 @@
}
/*
+ * If boost never happened, then inflight will always be 1, in
+ * this case the boost check would never happen in the above
+ * loop so do another one here.
+ */
+ if (!failed && smp_load_acquire(&rbi.inflight))
+ rcu_torture_boost_failed(call_rcu_time, jiffies);
+
+ /*
* Set the start time of the next test interval.
* Yes, this is vulnerable to long delays, but such
* delays simply cause a false negative for the next
@@ -788,7 +895,6 @@
if (mutex_trylock(&boost_mutex)) {
boost_starttime = jiffies +
test_boost_interval * HZ;
- n_rcu_torture_boosts++;
mutex_unlock(&boost_mutex);
break;
}
@@ -1010,7 +1116,7 @@
break;
}
}
- rcutorture_record_progress(++rcu_torture_current_version);
+ rcu_torture_current_version++;
/* Cycle through nesting levels of rcu_expedite_gp() calls. */
if (can_expedite &&
!(torture_random(&rand) & 0xff & (!!expediting - 1))) {
@@ -1084,27 +1190,133 @@
}
/*
- * RCU torture reader from timer handler. Dereferences rcu_torture_current,
- * incrementing the corresponding element of the pipeline array. The
- * counter in the element should never be greater than 1, otherwise, the
- * RCU implementation is broken.
+ * Do one extension of an RCU read-side critical section using the
+ * current reader state in readstate (set to zero for initial entry
+ * to extended critical section), set the new state as specified by
+ * newstate (set to zero for final exit from extended critical section),
+ * and random-number-generator state in trsp. If this is neither the
+ * beginning or end of the critical section and if there was actually a
+ * change, do a ->read_delay().
*/
-static void rcu_torture_timer(struct timer_list *unused)
+static void rcutorture_one_extend(int *readstate, int newstate,
+ struct torture_random_state *trsp)
{
- int idx;
+ int idxnew = -1;
+ int idxold = *readstate;
+ int statesnew = ~*readstate & newstate;
+ int statesold = *readstate & ~newstate;
+
+ WARN_ON_ONCE(idxold < 0);
+ WARN_ON_ONCE((idxold >> RCUTORTURE_RDR_SHIFT) > 1);
+
+ /* First, put new protection in place to avoid critical-section gap. */
+ if (statesnew & RCUTORTURE_RDR_BH)
+ local_bh_disable();
+ if (statesnew & RCUTORTURE_RDR_IRQ)
+ local_irq_disable();
+ if (statesnew & RCUTORTURE_RDR_PREEMPT)
+ preempt_disable();
+ if (statesnew & RCUTORTURE_RDR_RCU)
+ idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
+
+ /* Next, remove old protection, irq first due to bh conflict. */
+ if (statesold & RCUTORTURE_RDR_IRQ)
+ local_irq_enable();
+ if (statesold & RCUTORTURE_RDR_BH)
+ local_bh_enable();
+ if (statesold & RCUTORTURE_RDR_PREEMPT)
+ preempt_enable();
+ if (statesold & RCUTORTURE_RDR_RCU)
+ cur_ops->readunlock(idxold >> RCUTORTURE_RDR_SHIFT);
+
+ /* Delay if neither beginning nor end and there was a change. */
+ if ((statesnew || statesold) && *readstate && newstate)
+ cur_ops->read_delay(trsp);
+
+ /* Update the reader state. */
+ if (idxnew == -1)
+ idxnew = idxold & ~RCUTORTURE_RDR_MASK;
+ WARN_ON_ONCE(idxnew < 0);
+ WARN_ON_ONCE((idxnew >> RCUTORTURE_RDR_SHIFT) > 1);
+ *readstate = idxnew | newstate;
+ WARN_ON_ONCE((*readstate >> RCUTORTURE_RDR_SHIFT) < 0);
+ WARN_ON_ONCE((*readstate >> RCUTORTURE_RDR_SHIFT) > 1);
+}
+
+/* Return the biggest extendables mask given current RCU and boot parameters. */
+static int rcutorture_extend_mask_max(void)
+{
+ int mask;
+
+ WARN_ON_ONCE(extendables & ~RCUTORTURE_MAX_EXTEND);
+ mask = extendables & RCUTORTURE_MAX_EXTEND & cur_ops->extendables;
+ mask = mask | RCUTORTURE_RDR_RCU;
+ return mask;
+}
+
+/* Return a random protection state mask, but with at least one bit set. */
+static int
+rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
+{
+ int mask = rcutorture_extend_mask_max();
+ unsigned long randmask1 = torture_random(trsp) >> 8;
+ unsigned long randmask2 = randmask1 >> 1;
+
+ WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
+ /* Half the time lots of bits, half the time only one bit. */
+ if (randmask1 & 0x1)
+ mask = mask & randmask2;
+ else
+ mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
+ if ((mask & RCUTORTURE_RDR_IRQ) &&
+ !(mask & RCUTORTURE_RDR_BH) &&
+ (oldmask & RCUTORTURE_RDR_BH))
+ mask |= RCUTORTURE_RDR_BH; /* Can't enable bh w/irq disabled. */
+ if ((mask & RCUTORTURE_RDR_IRQ) &&
+ !(mask & cur_ops->ext_irq_conflict) &&
+ (oldmask & cur_ops->ext_irq_conflict))
+ mask |= cur_ops->ext_irq_conflict; /* Or if readers object. */
+ return mask ?: RCUTORTURE_RDR_RCU;
+}
+
+/*
+ * Do a randomly selected number of extensions of an existing RCU read-side
+ * critical section.
+ */
+static void rcutorture_loop_extend(int *readstate,
+ struct torture_random_state *trsp)
+{
+ int i;
+ int mask = rcutorture_extend_mask_max();
+
+ WARN_ON_ONCE(!*readstate); /* -Existing- RCU read-side critsect! */
+ if (!((mask - 1) & mask))
+ return; /* Current RCU flavor not extendable. */
+ i = (torture_random(trsp) >> 3) & RCUTORTURE_RDR_MAX_LOOPS;
+ while (i--) {
+ mask = rcutorture_extend_mask(*readstate, trsp);
+ rcutorture_one_extend(readstate, mask, trsp);
+ }
+}
+
+/*
+ * Do one read-side critical section, returning false if there was
+ * no data to read. Can be invoked both from process context and
+ * from a timer handler.
+ */
+static bool rcu_torture_one_read(struct torture_random_state *trsp)
+{
unsigned long started;
unsigned long completed;
- static DEFINE_TORTURE_RANDOM(rand);
- static DEFINE_SPINLOCK(rand_lock);
+ int newstate;
struct rcu_torture *p;
int pipe_count;
+ int readstate = 0;
unsigned long long ts;
- idx = cur_ops->readlock();
- if (cur_ops->started)
- started = cur_ops->started();
- else
- started = cur_ops->completed();
+ newstate = rcutorture_extend_mask(readstate, trsp);
+ rcutorture_one_extend(&readstate, newstate, trsp);
+ started = cur_ops->get_gp_seq();
ts = rcu_trace_clock_local();
p = rcu_dereference_check(rcu_torture_current,
rcu_read_lock_bh_held() ||
@@ -1112,39 +1324,50 @@
srcu_read_lock_held(srcu_ctlp) ||
torturing_tasks());
if (p == NULL) {
- /* Leave because rcu_torture_writer is not yet underway */
- cur_ops->readunlock(idx);
- return;
+ /* Wait for rcu_torture_writer to get underway */
+ rcutorture_one_extend(&readstate, 0, trsp);
+ return false;
}
if (p->rtort_mbtest == 0)
atomic_inc(&n_rcu_torture_mberror);
- spin_lock(&rand_lock);
- cur_ops->read_delay(&rand);
- n_rcu_torture_timers++;
- spin_unlock(&rand_lock);
+ rcutorture_loop_extend(&readstate, trsp);
preempt_disable();
pipe_count = p->rtort_pipe_count;
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
/* Should not happen, but... */
pipe_count = RCU_TORTURE_PIPE_LEN;
}
- completed = cur_ops->completed();
+ completed = cur_ops->get_gp_seq();
if (pipe_count > 1) {
- do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu, ts,
- started, completed);
+ do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu,
+ ts, started, completed);
rcu_ftrace_dump(DUMP_ALL);
}
__this_cpu_inc(rcu_torture_count[pipe_count]);
- completed = completed - started;
- if (cur_ops->started)
- completed++;
+ completed = rcutorture_seq_diff(completed, started);
if (completed > RCU_TORTURE_PIPE_LEN) {
/* Should not happen, but... */
completed = RCU_TORTURE_PIPE_LEN;
}
__this_cpu_inc(rcu_torture_batch[completed]);
preempt_enable();
- cur_ops->readunlock(idx);
+ rcutorture_one_extend(&readstate, 0, trsp);
+ WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK);
+ return true;
+}
+
+static DEFINE_TORTURE_RANDOM_PERCPU(rcu_torture_timer_rand);
+
+/*
+ * RCU torture reader from timer handler. Dereferences rcu_torture_current,
+ * incrementing the corresponding element of the pipeline array. The
+ * counter in the element should never be greater than 1, otherwise, the
+ * RCU implementation is broken.
+ */
+static void rcu_torture_timer(struct timer_list *unused)
+{
+ atomic_long_inc(&n_rcu_torture_timers);
+ (void)rcu_torture_one_read(this_cpu_ptr(&rcu_torture_timer_rand));
/* Test call_rcu() invocation from interrupt handler. */
if (cur_ops->call) {
@@ -1164,14 +1387,8 @@
static int
rcu_torture_reader(void *arg)
{
- unsigned long started;
- unsigned long completed;
- int idx;
DEFINE_TORTURE_RANDOM(rand);
- struct rcu_torture *p;
- int pipe_count;
struct timer_list t;
- unsigned long long ts;
VERBOSE_TOROUT_STRING("rcu_torture_reader task started");
set_user_nice(current, MAX_NICE);
@@ -1183,49 +1400,8 @@
if (!timer_pending(&t))
mod_timer(&t, jiffies + 1);
}
- idx = cur_ops->readlock();
- if (cur_ops->started)
- started = cur_ops->started();
- else
- started = cur_ops->completed();
- ts = rcu_trace_clock_local();
- p = rcu_dereference_check(rcu_torture_current,
- rcu_read_lock_bh_held() ||
- rcu_read_lock_sched_held() ||
- srcu_read_lock_held(srcu_ctlp) ||
- torturing_tasks());
- if (p == NULL) {
- /* Wait for rcu_torture_writer to get underway */
- cur_ops->readunlock(idx);
+ if (!rcu_torture_one_read(&rand))
schedule_timeout_interruptible(HZ);
- continue;
- }
- if (p->rtort_mbtest == 0)
- atomic_inc(&n_rcu_torture_mberror);
- cur_ops->read_delay(&rand);
- preempt_disable();
- pipe_count = p->rtort_pipe_count;
- if (pipe_count > RCU_TORTURE_PIPE_LEN) {
- /* Should not happen, but... */
- pipe_count = RCU_TORTURE_PIPE_LEN;
- }
- completed = cur_ops->completed();
- if (pipe_count > 1) {
- do_trace_rcu_torture_read(cur_ops->name, &p->rtort_rcu,
- ts, started, completed);
- rcu_ftrace_dump(DUMP_ALL);
- }
- __this_cpu_inc(rcu_torture_count[pipe_count]);
- completed = completed - started;
- if (cur_ops->started)
- completed++;
- if (completed > RCU_TORTURE_PIPE_LEN) {
- /* Should not happen, but... */
- completed = RCU_TORTURE_PIPE_LEN;
- }
- __this_cpu_inc(rcu_torture_batch[completed]);
- preempt_enable();
- cur_ops->readunlock(idx);
stutter_wait("rcu_torture_reader");
} while (!torture_must_stop());
if (irqreader && cur_ops->irq_capable) {
@@ -1282,7 +1458,7 @@
pr_cont("rtbf: %ld rtb: %ld nt: %ld ",
n_rcu_torture_boost_failure,
n_rcu_torture_boosts,
- n_rcu_torture_timers);
+ atomic_long_read(&n_rcu_torture_timers));
torture_onoff_stats();
pr_cont("barrier: %ld/%ld:%ld ",
n_barrier_successes,
@@ -1324,18 +1500,16 @@
if (rtcv_snap == rcu_torture_current_version &&
rcu_torture_current != NULL) {
int __maybe_unused flags = 0;
- unsigned long __maybe_unused gpnum = 0;
- unsigned long __maybe_unused completed = 0;
+ unsigned long __maybe_unused gp_seq = 0;
rcutorture_get_gp_data(cur_ops->ttype,
- &flags, &gpnum, &completed);
+ &flags, &gp_seq);
srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
- &flags, &gpnum, &completed);
+ &flags, &gp_seq);
wtp = READ_ONCE(writer_task);
- pr_alert("??? Writer stall state %s(%d) g%lu c%lu f%#x ->state %#lx cpu %d\n",
+ pr_alert("??? Writer stall state %s(%d) g%lu f%#x ->state %#lx cpu %d\n",
rcu_torture_writer_state_getname(),
- rcu_torture_writer_state,
- gpnum, completed, flags,
+ rcu_torture_writer_state, gp_seq, flags,
wtp == NULL ? ~0UL : wtp->state,
wtp == NULL ? -1 : (int)task_cpu(wtp));
if (!splatted && wtp) {
@@ -1365,7 +1539,7 @@
return 0;
}
-static inline void
+static void
rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
{
pr_alert("%s" TORTURE_FLAG
@@ -1397,6 +1571,7 @@
mutex_lock(&boost_mutex);
t = boost_tasks[cpu];
boost_tasks[cpu] = NULL;
+ rcu_torture_enable_rt_throttle();
mutex_unlock(&boost_mutex);
/* This must be outside of the mutex, otherwise deadlock! */
@@ -1413,6 +1588,7 @@
/* Don't allow time recalculation while creating a new task. */
mutex_lock(&boost_mutex);
+ rcu_torture_disable_rt_throttle();
VERBOSE_TOROUT_STRING("Creating rcu_torture_boost task");
boost_tasks[cpu] = kthread_create_on_node(rcu_torture_boost, NULL,
cpu_to_node(cpu),
@@ -1446,7 +1622,7 @@
VERBOSE_TOROUT_STRING("rcu_torture_stall end holdoff");
}
if (!kthread_should_stop()) {
- stop_at = get_seconds() + stall_cpu;
+ stop_at = ktime_get_seconds() + stall_cpu;
/* RCU CPU stall is expected behavior in following code. */
rcu_read_lock();
if (stall_cpu_irqsoff)
@@ -1455,7 +1631,8 @@
preempt_disable();
pr_alert("rcu_torture_stall start on CPU %d.\n",
smp_processor_id());
- while (ULONG_CMP_LT(get_seconds(), stop_at))
+ while (ULONG_CMP_LT((unsigned long)ktime_get_seconds(),
+ stop_at))
continue; /* Induce RCU CPU stall warning. */
if (stall_cpu_irqsoff)
local_irq_enable();
@@ -1546,8 +1723,9 @@
atomic_read(&barrier_cbs_invoked),
n_barrier_cbs);
WARN_ON_ONCE(1);
+ } else {
+ n_barrier_successes++;
}
- n_barrier_successes++;
schedule_timeout_interruptible(HZ / 10);
} while (!torture_must_stop());
torture_kthread_stopping("rcu_torture_barrier");
@@ -1610,17 +1788,39 @@
}
}
+static bool rcu_torture_can_boost(void)
+{
+ static int boost_warn_once;
+ int prio;
+
+ if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
+ return false;
+
+ prio = rcu_get_gp_kthreads_prio();
+ if (!prio)
+ return false;
+
+ if (prio < 2) {
+ if (boost_warn_once == 1)
+ return false;
+
+ pr_alert("%s: WARN: RCU kthread priority too low to test boosting. Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
+ boost_warn_once = 1;
+ return false;
+ }
+
+ return true;
+}
+
static enum cpuhp_state rcutor_hp;
static void
rcu_torture_cleanup(void)
{
int flags = 0;
- unsigned long gpnum = 0;
- unsigned long completed = 0;
+ unsigned long gp_seq = 0;
int i;
- rcutorture_record_test_transition();
if (torture_cleanup_begin()) {
if (cur_ops->cb_barrier != NULL)
cur_ops->cb_barrier();
@@ -1648,17 +1848,15 @@
fakewriter_tasks = NULL;
}
- rcutorture_get_gp_data(cur_ops->ttype, &flags, &gpnum, &completed);
- srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
- &flags, &gpnum, &completed);
- pr_alert("%s: End-test grace-period state: g%lu c%lu f%#x\n",
- cur_ops->name, gpnum, completed, flags);
+ rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
+ srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
+ pr_alert("%s: End-test grace-period state: g%lu f%#x\n",
+ cur_ops->name, gp_seq, flags);
torture_stop_kthread(rcu_torture_stats, stats_task);
torture_stop_kthread(rcu_torture_fqs, fqs_task);
for (i = 0; i < ncbflooders; i++)
torture_stop_kthread(rcu_torture_cbflood, cbflood_task[i]);
- if ((test_boost == 1 && cur_ops->can_boost) ||
- test_boost == 2)
+ if (rcu_torture_can_boost())
cpuhp_remove_state(rcutor_hp);
/*
@@ -1746,7 +1944,7 @@
int firsterr = 0;
static struct rcu_torture_ops *torture_ops[] = {
&rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
- &sched_ops, &tasks_ops,
+ &busted_srcud_ops, &sched_ops, &tasks_ops,
};
if (!torture_init_begin(torture_type, verbose))
@@ -1763,8 +1961,8 @@
torture_type);
pr_alert("rcu-torture types:");
for (i = 0; i < ARRAY_SIZE(torture_ops); i++)
- pr_alert(" %s", torture_ops[i]->name);
- pr_alert("\n");
+ pr_cont(" %s", torture_ops[i]->name);
+ pr_cont("\n");
firsterr = -EINVAL;
goto unwind;
}
@@ -1882,8 +2080,7 @@
test_boost_interval = 1;
if (test_boost_duration < 2)
test_boost_duration = 2;
- if ((test_boost == 1 && cur_ops->can_boost) ||
- test_boost == 2) {
+ if (rcu_torture_can_boost()) {
boost_starttime = jiffies + test_boost_interval * HZ;
@@ -1897,7 +2094,7 @@
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
if (firsterr)
goto unwind;
- firsterr = torture_onoff_init(onoff_holdoff * HZ, onoff_interval * HZ);
+ firsterr = torture_onoff_init(onoff_holdoff * HZ, onoff_interval);
if (firsterr)
goto unwind;
firsterr = rcu_torture_stall_init();
@@ -1926,7 +2123,6 @@
goto unwind;
}
}
- rcutorture_record_test_transition();
torture_init_end();
return 0;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 622792a..04fc2ed 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -110,7 +110,7 @@
WRITE_ONCE(sp->srcu_lock_nesting[idx], newval);
if (!newval && READ_ONCE(sp->srcu_gp_waiting))
- swake_up(&sp->srcu_wq);
+ swake_up_one(&sp->srcu_wq);
}
EXPORT_SYMBOL_GPL(__srcu_read_unlock);
@@ -140,7 +140,7 @@
idx = sp->srcu_idx;
WRITE_ONCE(sp->srcu_idx, !sp->srcu_idx);
WRITE_ONCE(sp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
- swait_event(sp->srcu_wq, !READ_ONCE(sp->srcu_lock_nesting[idx]));
+ swait_event_exclusive(sp->srcu_wq, !READ_ONCE(sp->srcu_lock_nesting[idx]));
WRITE_ONCE(sp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
/* Invoke the callbacks we removed above. */
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index b4123d7..6c9866a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -26,6 +26,8 @@
*
*/
+#define pr_fmt(fmt) "rcu: " fmt
+
#include <linux/export.h>
#include <linux/mutex.h>
#include <linux/percpu.h>
@@ -390,7 +392,8 @@
}
if (WARN_ON(rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
WARN_ON(srcu_readers_active(sp))) {
- pr_info("%s: Active srcu_struct %p state: %d\n", __func__, sp, rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)));
+ pr_info("%s: Active srcu_struct %p state: %d\n",
+ __func__, sp, rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)));
return; /* Caller forgot to stop doing call_srcu()? */
}
free_percpu(sp->sda);
@@ -641,6 +644,9 @@
* period s. Losers must either ensure that their desired grace-period
* number is recorded on at least their leaf srcu_node structure, or they
* must take steps to invoke their own callbacks.
+ *
+ * Note that this function also does the work of srcu_funnel_exp_start(),
+ * in some cases by directly invoking it.
*/
static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp,
unsigned long s, bool do_norm)
@@ -823,17 +829,17 @@
* more than one CPU, this means that when "func()" is invoked, each CPU
* is guaranteed to have executed a full memory barrier since the end of
* its last corresponding SRCU read-side critical section whose beginning
- * preceded the call to call_rcu(). It also means that each CPU executing
+ * preceded the call to call_srcu(). It also means that each CPU executing
* an SRCU read-side critical section that continues beyond the start of
- * "func()" must have executed a memory barrier after the call_rcu()
+ * "func()" must have executed a memory barrier after the call_srcu()
* but before the beginning of that SRCU read-side critical section.
* Note that these guarantees include CPUs that are offline, idle, or
* executing in user mode, as well as CPUs that are executing in the kernel.
*
- * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
+ * Furthermore, if CPU A invoked call_srcu() and CPU B invoked the
* resulting SRCU callback function "func()", then both CPU A and CPU
* B are guaranteed to execute a full memory barrier during the time
- * interval between the call to call_rcu() and the invocation of "func()".
+ * interval between the call to call_srcu() and the invocation of "func()".
* This guarantee applies even if CPU A and CPU B are the same CPU (but
* again only if the system has more than one CPU).
*
@@ -1246,13 +1252,12 @@
void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
- unsigned long *gpnum, unsigned long *completed)
+ unsigned long *gp_seq)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
- *completed = rcu_seq_ctr(sp->srcu_gp_seq);
- *gpnum = rcu_seq_ctr(sp->srcu_gp_seq_needed);
+ *gp_seq = rcu_seq_current(&sp->srcu_gp_seq);
}
EXPORT_SYMBOL_GPL(srcutorture_get_gp_data);
@@ -1263,16 +1268,17 @@
unsigned long s0 = 0, s1 = 0;
idx = sp->srcu_idx & 0x1;
- pr_alert("%s%s Tree SRCU per-CPU(idx=%d):", tt, tf, idx);
+ pr_alert("%s%s Tree SRCU g%ld per-CPU(idx=%d):",
+ tt, tf, rcu_seq_current(&sp->srcu_gp_seq), idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
- struct srcu_data *counts;
+ struct srcu_data *sdp;
- counts = per_cpu_ptr(sp->sda, cpu);
- u0 = counts->srcu_unlock_count[!idx];
- u1 = counts->srcu_unlock_count[idx];
+ sdp = per_cpu_ptr(sp->sda, cpu);
+ u0 = sdp->srcu_unlock_count[!idx];
+ u1 = sdp->srcu_unlock_count[idx];
/*
* Make sure that a lock is always counted if the corresponding
@@ -1280,12 +1286,13 @@
*/
smp_rmb();
- l0 = counts->srcu_lock_count[!idx];
- l1 = counts->srcu_lock_count[idx];
+ l0 = sdp->srcu_lock_count[!idx];
+ l1 = sdp->srcu_lock_count[idx];
c0 = l0 - u0;
c1 = l1 - u1;
- pr_cont(" %d(%ld,%ld)", cpu, c0, c1);
+ pr_cont(" %d(%ld,%ld %1p)",
+ cpu, c0, c1, rcu_segcblist_head(&sdp->srcu_cblist));
s0 += c0;
s1 += c1;
}
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index a64eee0..befc932 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -122,10 +122,8 @@
{
if (user)
rcu_sched_qs();
- else if (!in_softirq())
+ if (user || !in_softirq())
rcu_bh_qs();
- if (user)
- rcu_note_voluntary_context_switch(current);
}
/*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index aa7cade..0b760c1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -27,6 +27,9 @@
* For detailed explanation of Read-Copy Update mechanism see -
* Documentation/RCU
*/
+
+#define pr_fmt(fmt) "rcu: " fmt
+
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/init.h>
@@ -95,13 +98,13 @@
.rda = &sname##_data, \
.call = cr, \
.gp_state = RCU_GP_IDLE, \
- .gpnum = 0UL - 300UL, \
- .completed = 0UL - 300UL, \
+ .gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, \
.barrier_mutex = __MUTEX_INITIALIZER(sname##_state.barrier_mutex), \
.name = RCU_STATE_NAME(sname), \
.abbr = sabbr, \
.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
+ .ofl_lock = __SPIN_LOCK_UNLOCKED(sname##_state.ofl_lock), \
}
RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
@@ -155,6 +158,9 @@
*/
static int rcu_scheduler_fully_active __read_mostly;
+static void
+rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
+ struct rcu_node *rnp, unsigned long gps, unsigned long flags);
static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
@@ -177,6 +183,13 @@
static int gp_cleanup_delay;
module_param(gp_cleanup_delay, int, 0444);
+/* Retreive RCU kthreads priority for rcutorture */
+int rcu_get_gp_kthreads_prio(void)
+{
+ return kthread_prio;
+}
+EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
+
/*
* Number of grace periods between delays, normalized by the duration of
* the delay. The longer the delay, the more the grace periods between
@@ -189,18 +202,6 @@
#define PER_RCU_NODE_PERIOD 3 /* Number of grace periods between delays. */
/*
- * Track the rcutorture test sequence number and the update version
- * number within a given test. The rcutorture_testseq is incremented
- * on every rcutorture module load and unload, so has an odd value
- * when a test is running. The rcutorture_vernum is set to zero
- * when rcutorture starts and is incremented on each rcutorture update.
- * These variables enable correlating rcutorture output with the
- * RCU tracing information.
- */
-unsigned long rcutorture_testseq;
-unsigned long rcutorture_vernum;
-
-/*
* Compute the mask of online CPUs for the specified rcu_node structure.
* This will not be stable unless the rcu_node structure's ->lock is
* held, but the bit corresponding to the current CPU will be stable
@@ -218,7 +219,7 @@
*/
static int rcu_gp_in_progress(struct rcu_state *rsp)
{
- return READ_ONCE(rsp->completed) != READ_ONCE(rsp->gpnum);
+ return rcu_seq_state(rcu_seq_current(&rsp->gp_seq));
}
/*
@@ -233,7 +234,7 @@
if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.s))
return;
trace_rcu_grace_period(TPS("rcu_sched"),
- __this_cpu_read(rcu_sched_data.gpnum),
+ __this_cpu_read(rcu_sched_data.gp_seq),
TPS("cpuqs"));
__this_cpu_write(rcu_sched_data.cpu_no_qs.b.norm, false);
if (!__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
@@ -248,7 +249,7 @@
RCU_LOCKDEP_WARN(preemptible(), "rcu_bh_qs() invoked with preemption enabled!!!");
if (__this_cpu_read(rcu_bh_data.cpu_no_qs.s)) {
trace_rcu_grace_period(TPS("rcu_bh"),
- __this_cpu_read(rcu_bh_data.gpnum),
+ __this_cpu_read(rcu_bh_data.gp_seq),
TPS("cpuqs"));
__this_cpu_write(rcu_bh_data.cpu_no_qs.b.norm, false);
}
@@ -380,20 +381,6 @@
}
/*
- * Do a double-increment of the ->dynticks counter to emulate a
- * momentary idle-CPU quiescent state.
- */
-static void rcu_dynticks_momentary_idle(void)
-{
- struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
- int special = atomic_add_return(2 * RCU_DYNTICK_CTRL_CTR,
- &rdtp->dynticks);
-
- /* It is illegal to call this from idle state. */
- WARN_ON_ONCE(!(special & RCU_DYNTICK_CTRL_CTR));
-}
-
-/*
* Set the special (bottom) bit of the specified CPU so that it
* will take special action (such as flushing its TLB) on the
* next exit from an extended quiescent state. Returns true if
@@ -424,12 +411,17 @@
*
* We inform the RCU core by emulating a zero-duration dyntick-idle period.
*
- * The caller must have disabled interrupts.
+ * The caller must have disabled interrupts and must not be idle.
*/
static void rcu_momentary_dyntick_idle(void)
{
+ struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+ int special;
+
raw_cpu_write(rcu_dynticks.rcu_need_heavy_qs, false);
- rcu_dynticks_momentary_idle();
+ special = atomic_add_return(2 * RCU_DYNTICK_CTRL_CTR, &rdtp->dynticks);
+ /* It is illegal to call this from idle state. */
+ WARN_ON_ONCE(!(special & RCU_DYNTICK_CTRL_CTR));
}
/*
@@ -451,7 +443,7 @@
rcu_momentary_dyntick_idle();
this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
if (!preempt)
- rcu_note_voluntary_context_switch_lite(current);
+ rcu_tasks_qs(current);
out:
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
@@ -513,8 +505,38 @@
static ulong jiffies_till_next_fqs = ULONG_MAX;
static bool rcu_kick_kthreads;
-module_param(jiffies_till_first_fqs, ulong, 0644);
-module_param(jiffies_till_next_fqs, ulong, 0644);
+static int param_set_first_fqs_jiffies(const char *val, const struct kernel_param *kp)
+{
+ ulong j;
+ int ret = kstrtoul(val, 0, &j);
+
+ if (!ret)
+ WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : j);
+ return ret;
+}
+
+static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param *kp)
+{
+ ulong j;
+ int ret = kstrtoul(val, 0, &j);
+
+ if (!ret)
+ WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : (j ?: 1));
+ return ret;
+}
+
+static struct kernel_param_ops first_fqs_jiffies_ops = {
+ .set = param_set_first_fqs_jiffies,
+ .get = param_get_ulong,
+};
+
+static struct kernel_param_ops next_fqs_jiffies_ops = {
+ .set = param_set_next_fqs_jiffies,
+ .get = param_get_ulong,
+};
+
+module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops, &jiffies_till_first_fqs, 0644);
+module_param_cb(jiffies_till_next_fqs, &next_fqs_jiffies_ops, &jiffies_till_next_fqs, 0644);
module_param(rcu_kick_kthreads, bool, 0644);
/*
@@ -529,58 +551,31 @@
static int rcu_pending(void);
/*
- * Return the number of RCU batches started thus far for debug & stats.
+ * Return the number of RCU GPs completed thus far for debug & stats.
*/
-unsigned long rcu_batches_started(void)
+unsigned long rcu_get_gp_seq(void)
{
- return rcu_state_p->gpnum;
+ return READ_ONCE(rcu_state_p->gp_seq);
}
-EXPORT_SYMBOL_GPL(rcu_batches_started);
+EXPORT_SYMBOL_GPL(rcu_get_gp_seq);
/*
- * Return the number of RCU-sched batches started thus far for debug & stats.
+ * Return the number of RCU-sched GPs completed thus far for debug & stats.
*/
-unsigned long rcu_batches_started_sched(void)
+unsigned long rcu_sched_get_gp_seq(void)
{
- return rcu_sched_state.gpnum;
+ return READ_ONCE(rcu_sched_state.gp_seq);
}
-EXPORT_SYMBOL_GPL(rcu_batches_started_sched);
+EXPORT_SYMBOL_GPL(rcu_sched_get_gp_seq);
/*
- * Return the number of RCU BH batches started thus far for debug & stats.
+ * Return the number of RCU-bh GPs completed thus far for debug & stats.
*/
-unsigned long rcu_batches_started_bh(void)
+unsigned long rcu_bh_get_gp_seq(void)
{
- return rcu_bh_state.gpnum;
+ return READ_ONCE(rcu_bh_state.gp_seq);
}
-EXPORT_SYMBOL_GPL(rcu_batches_started_bh);
-
-/*
- * Return the number of RCU batches completed thus far for debug & stats.
- */
-unsigned long rcu_batches_completed(void)
-{
- return rcu_state_p->completed;
-}
-EXPORT_SYMBOL_GPL(rcu_batches_completed);
-
-/*
- * Return the number of RCU-sched batches completed thus far for debug & stats.
- */
-unsigned long rcu_batches_completed_sched(void)
-{
- return rcu_sched_state.completed;
-}
-EXPORT_SYMBOL_GPL(rcu_batches_completed_sched);
-
-/*
- * Return the number of RCU BH batches completed thus far for debug & stats.
- */
-unsigned long rcu_batches_completed_bh(void)
-{
- return rcu_bh_state.completed;
-}
-EXPORT_SYMBOL_GPL(rcu_batches_completed_bh);
+EXPORT_SYMBOL_GPL(rcu_bh_get_gp_seq);
/*
* Return the number of RCU expedited batches completed thus far for
@@ -636,35 +631,42 @@
*/
void show_rcu_gp_kthreads(void)
{
+ int cpu;
+ struct rcu_data *rdp;
+ struct rcu_node *rnp;
struct rcu_state *rsp;
for_each_rcu_flavor(rsp) {
pr_info("%s: wait state: %d ->state: %#lx\n",
rsp->name, rsp->gp_state, rsp->gp_kthread->state);
+ rcu_for_each_node_breadth_first(rsp, rnp) {
+ if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
+ continue;
+ pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
+ rnp->grplo, rnp->grphi, rnp->gp_seq,
+ rnp->gp_seq_needed);
+ if (!rcu_is_leaf_node(rnp))
+ continue;
+ for_each_leaf_node_possible_cpu(rnp, cpu) {
+ rdp = per_cpu_ptr(rsp->rda, cpu);
+ if (rdp->gpwrap ||
+ ULONG_CMP_GE(rsp->gp_seq,
+ rdp->gp_seq_needed))
+ continue;
+ pr_info("\tcpu %d ->gp_seq_needed %lu\n",
+ cpu, rdp->gp_seq_needed);
+ }
+ }
/* sched_show_task(rsp->gp_kthread); */
}
}
EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
/*
- * Record the number of times rcutorture tests have been initiated and
- * terminated. This information allows the debugfs tracing stats to be
- * correlated to the rcutorture messages, even when the rcutorture module
- * is being repeatedly loaded and unloaded. In other words, we cannot
- * store this state in rcutorture itself.
- */
-void rcutorture_record_test_transition(void)
-{
- rcutorture_testseq++;
- rcutorture_vernum = 0;
-}
-EXPORT_SYMBOL_GPL(rcutorture_record_test_transition);
-
-/*
* Send along grace-period-related data for rcutorture diagnostics.
*/
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
- unsigned long *gpnum, unsigned long *completed)
+ unsigned long *gp_seq)
{
struct rcu_state *rsp = NULL;
@@ -684,23 +686,11 @@
if (rsp == NULL)
return;
*flags = READ_ONCE(rsp->gp_flags);
- *gpnum = READ_ONCE(rsp->gpnum);
- *completed = READ_ONCE(rsp->completed);
+ *gp_seq = rcu_seq_current(&rsp->gp_seq);
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
/*
- * Record the number of writer passes through the current rcutorture test.
- * This is also used to correlate debugfs tracing stats with the rcutorture
- * messages.
- */
-void rcutorture_record_progress(unsigned long vernum)
-{
- rcutorture_vernum++;
-}
-EXPORT_SYMBOL_GPL(rcutorture_record_progress);
-
-/*
* Return the root node of the specified rcu_state structure.
*/
static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
@@ -1059,41 +1049,41 @@
#if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
/*
- * Is the current CPU online? Disable preemption to avoid false positives
- * that could otherwise happen due to the current CPU number being sampled,
- * this task being preempted, its old CPU being taken offline, resuming
- * on some other CPU, then determining that its old CPU is now offline.
- * It is OK to use RCU on an offline processor during initial boot, hence
- * the check for rcu_scheduler_fully_active. Note also that it is OK
- * for a CPU coming online to use RCU for one jiffy prior to marking itself
- * online in the cpu_online_mask. Similarly, it is OK for a CPU going
- * offline to continue to use RCU for one jiffy after marking itself
- * offline in the cpu_online_mask. This leniency is necessary given the
- * non-atomic nature of the online and offline processing, for example,
- * the fact that a CPU enters the scheduler after completing the teardown
- * of the CPU.
+ * Is the current CPU online as far as RCU is concerned?
*
- * This is also why RCU internally marks CPUs online during in the
- * preparation phase and offline after the CPU has been taken down.
+ * Disable preemption to avoid false positives that could otherwise
+ * happen due to the current CPU number being sampled, this task being
+ * preempted, its old CPU being taken offline, resuming on some other CPU,
+ * then determining that its old CPU is now offline. Because there are
+ * multiple flavors of RCU, and because this function can be called in the
+ * midst of updating the flavors while a given CPU coming online or going
+ * offline, it is necessary to check all flavors. If any of the flavors
+ * believe that given CPU is online, it is considered to be online.
*
- * Disable checking if in an NMI handler because we cannot safely report
- * errors from NMI handlers anyway.
+ * Disable checking if in an NMI handler because we cannot safely
+ * report errors from NMI handlers anyway. In addition, it is OK to use
+ * RCU on an offline processor during initial boot, hence the check for
+ * rcu_scheduler_fully_active.
*/
bool rcu_lockdep_current_cpu_online(void)
{
struct rcu_data *rdp;
struct rcu_node *rnp;
- bool ret;
+ struct rcu_state *rsp;
- if (in_nmi())
+ if (in_nmi() || !rcu_scheduler_fully_active)
return true;
preempt_disable();
- rdp = this_cpu_ptr(&rcu_sched_data);
- rnp = rdp->mynode;
- ret = (rdp->grpmask & rcu_rnp_online_cpus(rnp)) ||
- !rcu_scheduler_fully_active;
+ for_each_rcu_flavor(rsp) {
+ rdp = this_cpu_ptr(rsp->rda);
+ rnp = rdp->mynode;
+ if (rdp->grpmask & rcu_rnp_online_cpus(rnp)) {
+ preempt_enable();
+ return true;
+ }
+ }
preempt_enable();
- return ret;
+ return false;
}
EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
@@ -1115,17 +1105,18 @@
/*
* We are reporting a quiescent state on behalf of some other CPU, so
* it is our responsibility to check for and handle potential overflow
- * of the rcu_node ->gpnum counter with respect to the rcu_data counters.
+ * of the rcu_node ->gp_seq counter with respect to the rcu_data counters.
* After all, the CPU might be in deep idle state, and thus executing no
* code whatsoever.
*/
static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
{
raw_lockdep_assert_held_rcu_node(rnp);
- if (ULONG_CMP_LT(READ_ONCE(rdp->gpnum) + ULONG_MAX / 4, rnp->gpnum))
+ if (ULONG_CMP_LT(rcu_seq_current(&rdp->gp_seq) + ULONG_MAX / 4,
+ rnp->gp_seq))
WRITE_ONCE(rdp->gpwrap, true);
- if (ULONG_CMP_LT(rdp->rcu_iw_gpnum + ULONG_MAX / 4, rnp->gpnum))
- rdp->rcu_iw_gpnum = rnp->gpnum + ULONG_MAX / 4;
+ if (ULONG_CMP_LT(rdp->rcu_iw_gp_seq + ULONG_MAX / 4, rnp->gp_seq))
+ rdp->rcu_iw_gp_seq = rnp->gp_seq + ULONG_MAX / 4;
}
/*
@@ -1137,7 +1128,7 @@
{
rdp->dynticks_snap = rcu_dynticks_snap(rdp->dynticks);
if (rcu_dynticks_in_eqs(rdp->dynticks_snap)) {
- trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("dti"));
+ trace_rcu_fqs(rdp->rsp->name, rdp->gp_seq, rdp->cpu, TPS("dti"));
rcu_gpnum_ovf(rdp->mynode, rdp);
return 1;
}
@@ -1159,7 +1150,7 @@
rnp = rdp->mynode;
raw_spin_lock_rcu_node(rnp);
if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
- rdp->rcu_iw_gpnum = rnp->gpnum;
+ rdp->rcu_iw_gp_seq = rnp->gp_seq;
rdp->rcu_iw_pending = false;
}
raw_spin_unlock_rcu_node(rnp);
@@ -1187,7 +1178,7 @@
* of the current RCU grace period.
*/
if (rcu_dynticks_in_eqs_since(rdp->dynticks, rdp->dynticks_snap)) {
- trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("dti"));
+ trace_rcu_fqs(rdp->rsp->name, rdp->gp_seq, rdp->cpu, TPS("dti"));
rdp->dynticks_fqs++;
rcu_gpnum_ovf(rnp, rdp);
return 1;
@@ -1203,8 +1194,8 @@
ruqp = per_cpu_ptr(&rcu_dynticks.rcu_urgent_qs, rdp->cpu);
if (time_after(jiffies, rdp->rsp->gp_start + jtsq) &&
READ_ONCE(rdp->rcu_qs_ctr_snap) != per_cpu(rcu_dynticks.rcu_qs_ctr, rdp->cpu) &&
- READ_ONCE(rdp->gpnum) == rnp->gpnum && !rdp->gpwrap) {
- trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("rqc"));
+ rcu_seq_current(&rdp->gp_seq) == rnp->gp_seq && !rdp->gpwrap) {
+ trace_rcu_fqs(rdp->rsp->name, rdp->gp_seq, rdp->cpu, TPS("rqc"));
rcu_gpnum_ovf(rnp, rdp);
return 1;
} else if (time_after(jiffies, rdp->rsp->gp_start + jtsq)) {
@@ -1212,12 +1203,25 @@
smp_store_release(ruqp, true);
}
- /* Check for the CPU being offline. */
- if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp))) {
- trace_rcu_fqs(rdp->rsp->name, rdp->gpnum, rdp->cpu, TPS("ofl"));
- rdp->offline_fqs++;
- rcu_gpnum_ovf(rnp, rdp);
- return 1;
+ /* If waiting too long on an offline CPU, complain. */
+ if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp)) &&
+ time_after(jiffies, rdp->rsp->gp_start + HZ)) {
+ bool onl;
+ struct rcu_node *rnp1;
+
+ WARN_ON(1); /* Offline CPUs are supposed to report QS! */
+ pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
+ __func__, rnp->grplo, rnp->grphi, rnp->level,
+ (long)rnp->gp_seq, (long)rnp->completedqs);
+ for (rnp1 = rnp; rnp1; rnp1 = rnp1->parent)
+ pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx ->rcu_gp_init_mask %#lx\n",
+ __func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext, rnp1->rcu_gp_init_mask);
+ onl = !!(rdp->grpmask & rcu_rnp_online_cpus(rnp));
+ pr_info("%s %d: %c online: %ld(%d) offline: %ld(%d)\n",
+ __func__, rdp->cpu, ".o"[onl],
+ (long)rdp->rcu_onl_gp_seq, rdp->rcu_onl_gp_flags,
+ (long)rdp->rcu_ofl_gp_seq, rdp->rcu_ofl_gp_flags);
+ return 1; /* Break things loose after complaining. */
}
/*
@@ -1256,11 +1260,11 @@
if (jiffies - rdp->rsp->gp_start > rcu_jiffies_till_stall_check() / 2) {
resched_cpu(rdp->cpu);
if (IS_ENABLED(CONFIG_IRQ_WORK) &&
- !rdp->rcu_iw_pending && rdp->rcu_iw_gpnum != rnp->gpnum &&
+ !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
(rnp->ffmask & rdp->grpmask)) {
init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
rdp->rcu_iw_pending = true;
- rdp->rcu_iw_gpnum = rnp->gpnum;
+ rdp->rcu_iw_gp_seq = rnp->gp_seq;
irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
}
}
@@ -1274,9 +1278,9 @@
unsigned long j1;
rsp->gp_start = j;
- smp_wmb(); /* Record start time before stall time. */
j1 = rcu_jiffies_till_stall_check();
- WRITE_ONCE(rsp->jiffies_stall, j + j1);
+ /* Record ->gp_start before ->jiffies_stall. */
+ smp_store_release(&rsp->jiffies_stall, j + j1); /* ^^^ */
rsp->jiffies_resched = j + j1 / 2;
rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
}
@@ -1302,9 +1306,9 @@
j = jiffies;
gpa = READ_ONCE(rsp->gp_activity);
if (j - gpa > 2 * HZ) {
- pr_err("%s kthread starved for %ld jiffies! g%lu c%lu f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
+ pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
rsp->name, j - gpa,
- rsp->gpnum, rsp->completed,
+ (long)rcu_seq_current(&rsp->gp_seq),
rsp->gp_flags,
gp_state_getname(rsp->gp_state), rsp->gp_state,
rsp->gp_kthread ? rsp->gp_kthread->state : ~0,
@@ -1359,16 +1363,15 @@
}
}
-static inline void panic_on_rcu_stall(void)
+static void panic_on_rcu_stall(void)
{
if (sysctl_panic_on_rcu_stall)
panic("RCU Stall\n");
}
-static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
+static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gp_seq)
{
int cpu;
- long delta;
unsigned long flags;
unsigned long gpa;
unsigned long j;
@@ -1381,25 +1384,12 @@
if (rcu_cpu_stall_suppress)
return;
- /* Only let one CPU complain about others per time interval. */
-
- raw_spin_lock_irqsave_rcu_node(rnp, flags);
- delta = jiffies - READ_ONCE(rsp->jiffies_stall);
- if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp)) {
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- return;
- }
- WRITE_ONCE(rsp->jiffies_stall,
- jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-
/*
* OK, time to rat on our buddy...
* See Documentation/RCU/stallwarn.txt for info on how to debug
* RCU CPU stall warnings.
*/
- pr_err("INFO: %s detected stalls on CPUs/tasks:",
- rsp->name);
+ pr_err("INFO: %s detected stalls on CPUs/tasks:", rsp->name);
print_cpu_stall_info_begin();
rcu_for_each_leaf_node(rsp, rnp) {
raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -1418,17 +1408,16 @@
for_each_possible_cpu(cpu)
totqlen += rcu_segcblist_n_cbs(&per_cpu_ptr(rsp->rda,
cpu)->cblist);
- pr_cont("(detected by %d, t=%ld jiffies, g=%ld, c=%ld, q=%lu)\n",
+ pr_cont("(detected by %d, t=%ld jiffies, g=%ld, q=%lu)\n",
smp_processor_id(), (long)(jiffies - rsp->gp_start),
- (long)rsp->gpnum, (long)rsp->completed, totqlen);
+ (long)rcu_seq_current(&rsp->gp_seq), totqlen);
if (ndetected) {
rcu_dump_cpu_stacks(rsp);
/* Complain about tasks blocking the grace period. */
rcu_print_detail_task_stall(rsp);
} else {
- if (READ_ONCE(rsp->gpnum) != gpnum ||
- READ_ONCE(rsp->completed) == gpnum) {
+ if (rcu_seq_current(&rsp->gp_seq) != gp_seq) {
pr_err("INFO: Stall ended before state dump start\n");
} else {
j = jiffies;
@@ -1441,6 +1430,10 @@
sched_show_task(current);
}
}
+ /* Rewrite if needed in case of slow consoles. */
+ if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
+ WRITE_ONCE(rsp->jiffies_stall,
+ jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
rcu_check_gp_kthread_starvation(rsp);
@@ -1476,15 +1469,16 @@
for_each_possible_cpu(cpu)
totqlen += rcu_segcblist_n_cbs(&per_cpu_ptr(rsp->rda,
cpu)->cblist);
- pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
+ pr_cont(" (t=%lu jiffies g=%ld q=%lu)\n",
jiffies - rsp->gp_start,
- (long)rsp->gpnum, (long)rsp->completed, totqlen);
+ (long)rcu_seq_current(&rsp->gp_seq), totqlen);
rcu_check_gp_kthread_starvation(rsp);
rcu_dump_cpu_stacks(rsp);
raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ /* Rewrite if needed in case of slow consoles. */
if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
WRITE_ONCE(rsp->jiffies_stall,
jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
@@ -1504,10 +1498,11 @@
static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
{
- unsigned long completed;
- unsigned long gpnum;
+ unsigned long gs1;
+ unsigned long gs2;
unsigned long gps;
unsigned long j;
+ unsigned long jn;
unsigned long js;
struct rcu_node *rnp;
@@ -1520,43 +1515,46 @@
/*
* Lots of memory barriers to reject false positives.
*
- * The idea is to pick up rsp->gpnum, then rsp->jiffies_stall,
- * then rsp->gp_start, and finally rsp->completed. These values
- * are updated in the opposite order with memory barriers (or
- * equivalent) during grace-period initialization and cleanup.
- * Now, a false positive can occur if we get an new value of
- * rsp->gp_start and a old value of rsp->jiffies_stall. But given
- * the memory barriers, the only way that this can happen is if one
- * grace period ends and another starts between these two fetches.
- * Detect this by comparing rsp->completed with the previous fetch
- * from rsp->gpnum.
+ * The idea is to pick up rsp->gp_seq, then rsp->jiffies_stall,
+ * then rsp->gp_start, and finally another copy of rsp->gp_seq.
+ * These values are updated in the opposite order with memory
+ * barriers (or equivalent) during grace-period initialization
+ * and cleanup. Now, a false positive can occur if we get an new
+ * value of rsp->gp_start and a old value of rsp->jiffies_stall.
+ * But given the memory barriers, the only way that this can happen
+ * is if one grace period ends and another starts between these
+ * two fetches. This is detected by comparing the second fetch
+ * of rsp->gp_seq with the previous fetch from rsp->gp_seq.
*
* Given this check, comparisons of jiffies, rsp->jiffies_stall,
* and rsp->gp_start suffice to forestall false positives.
*/
- gpnum = READ_ONCE(rsp->gpnum);
- smp_rmb(); /* Pick up ->gpnum first... */
+ gs1 = READ_ONCE(rsp->gp_seq);
+ smp_rmb(); /* Pick up ->gp_seq first... */
js = READ_ONCE(rsp->jiffies_stall);
smp_rmb(); /* ...then ->jiffies_stall before the rest... */
gps = READ_ONCE(rsp->gp_start);
- smp_rmb(); /* ...and finally ->gp_start before ->completed. */
- completed = READ_ONCE(rsp->completed);
- if (ULONG_CMP_GE(completed, gpnum) ||
+ smp_rmb(); /* ...and finally ->gp_start before ->gp_seq again. */
+ gs2 = READ_ONCE(rsp->gp_seq);
+ if (gs1 != gs2 ||
ULONG_CMP_LT(j, js) ||
ULONG_CMP_GE(gps, js))
return; /* No stall or GP completed since entering function. */
rnp = rdp->mynode;
+ jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
if (rcu_gp_in_progress(rsp) &&
- (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
+ (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
+ cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
/* We haven't checked in, so go dump stack. */
print_cpu_stall(rsp);
} else if (rcu_gp_in_progress(rsp) &&
- ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
+ ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
+ cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
/* They had a few time units to dump stack, so complain. */
- print_other_cpu_stall(rsp, gpnum);
+ print_other_cpu_stall(rsp, gs2);
}
}
@@ -1577,123 +1575,99 @@
WRITE_ONCE(rsp->jiffies_stall, jiffies + ULONG_MAX / 2);
}
-/*
- * Determine the value that ->completed will have at the end of the
- * next subsequent grace period. This is used to tag callbacks so that
- * a CPU can invoke callbacks in a timely fashion even if that CPU has
- * been dyntick-idle for an extended period with callbacks under the
- * influence of RCU_FAST_NO_HZ.
- *
- * The caller must hold rnp->lock with interrupts disabled.
- */
-static unsigned long rcu_cbs_completed(struct rcu_state *rsp,
- struct rcu_node *rnp)
-{
- raw_lockdep_assert_held_rcu_node(rnp);
-
- /*
- * If RCU is idle, we just wait for the next grace period.
- * But we can only be sure that RCU is idle if we are looking
- * at the root rcu_node structure -- otherwise, a new grace
- * period might have started, but just not yet gotten around
- * to initializing the current non-root rcu_node structure.
- */
- if (rcu_get_root(rsp) == rnp && rnp->gpnum == rnp->completed)
- return rnp->completed + 1;
-
- /*
- * If the current rcu_node structure believes that RCU is
- * idle, and if the rcu_state structure does not yet reflect
- * the start of a new grace period, then the next grace period
- * will suffice. The memory barrier is needed to accurately
- * sample the rsp->gpnum, and pairs with the second lock
- * acquisition in rcu_gp_init(), which is augmented with
- * smp_mb__after_unlock_lock() for this purpose.
- */
- if (rnp->gpnum == rnp->completed) {
- smp_mb(); /* See above block comment. */
- if (READ_ONCE(rsp->gpnum) == rnp->completed)
- return rnp->completed + 1;
- }
-
- /*
- * Otherwise, wait for a possible partial grace period and
- * then the subsequent full grace period.
- */
- return rnp->completed + 2;
-}
-
/* Trace-event wrapper function for trace_rcu_future_grace_period. */
static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
- unsigned long c, const char *s)
+ unsigned long gp_seq_req, const char *s)
{
- trace_rcu_future_grace_period(rdp->rsp->name, rnp->gpnum,
- rnp->completed, c, rnp->level,
- rnp->grplo, rnp->grphi, s);
+ trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, gp_seq_req,
+ rnp->level, rnp->grplo, rnp->grphi, s);
}
/*
+ * rcu_start_this_gp - Request the start of a particular grace period
+ * @rnp_start: The leaf node of the CPU from which to start.
+ * @rdp: The rcu_data corresponding to the CPU from which to start.
+ * @gp_seq_req: The gp_seq of the grace period to start.
+ *
* Start the specified grace period, as needed to handle newly arrived
* callbacks. The required future grace periods are recorded in each
- * rcu_node structure's ->need_future_gp[] field. Returns true if there
+ * rcu_node structure's ->gp_seq_needed field. Returns true if there
* is reason to awaken the grace-period kthread.
*
* The caller must hold the specified rcu_node structure's ->lock, which
* is why the caller is responsible for waking the grace-period kthread.
+ *
+ * Returns true if the GP thread needs to be awakened else false.
*/
-static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
- unsigned long c)
+static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
+ unsigned long gp_seq_req)
{
bool ret = false;
struct rcu_state *rsp = rdp->rsp;
- struct rcu_node *rnp_root;
+ struct rcu_node *rnp;
/*
* Use funnel locking to either acquire the root rcu_node
* structure's lock or bail out if the need for this grace period
- * has already been recorded -- or has already started. If there
- * is already a grace period in progress in a non-leaf node, no
- * recording is needed because the end of the grace period will
- * scan the leaf rcu_node structures. Note that rnp->lock must
- * not be released.
+ * has already been recorded -- or if that grace period has in
+ * fact already started. If there is already a grace period in
+ * progress in a non-leaf node, no recording is needed because the
+ * end of the grace period will scan the leaf rcu_node structures.
+ * Note that rnp_start->lock must not be released.
*/
- raw_lockdep_assert_held_rcu_node(rnp);
- trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
- for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
- if (rnp_root != rnp)
- raw_spin_lock_rcu_node(rnp_root);
- WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
- need_future_gp_mask(), c));
- if (need_future_gp_element(rnp_root, c) ||
- ULONG_CMP_GE(rnp_root->gpnum, c) ||
- (rnp != rnp_root &&
- rnp_root->gpnum != rnp_root->completed)) {
- trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
+ raw_lockdep_assert_held_rcu_node(rnp_start);
+ trace_rcu_this_gp(rnp_start, rdp, gp_seq_req, TPS("Startleaf"));
+ for (rnp = rnp_start; 1; rnp = rnp->parent) {
+ if (rnp != rnp_start)
+ raw_spin_lock_rcu_node(rnp);
+ if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req) ||
+ rcu_seq_started(&rnp->gp_seq, gp_seq_req) ||
+ (rnp != rnp_start &&
+ rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))) {
+ trace_rcu_this_gp(rnp, rdp, gp_seq_req,
+ TPS("Prestarted"));
goto unlock_out;
}
- need_future_gp_element(rnp_root, c) = true;
- if (rnp_root != rnp && rnp_root->parent != NULL)
- raw_spin_unlock_rcu_node(rnp_root);
- if (!rnp_root->parent)
+ rnp->gp_seq_needed = gp_seq_req;
+ if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
+ /*
+ * We just marked the leaf or internal node, and a
+ * grace period is in progress, which means that
+ * rcu_gp_cleanup() will see the marking. Bail to
+ * reduce contention.
+ */
+ trace_rcu_this_gp(rnp_start, rdp, gp_seq_req,
+ TPS("Startedleaf"));
+ goto unlock_out;
+ }
+ if (rnp != rnp_start && rnp->parent != NULL)
+ raw_spin_unlock_rcu_node(rnp);
+ if (!rnp->parent)
break; /* At root, and perhaps also leaf. */
}
/* If GP already in progress, just leave, otherwise start one. */
- if (rnp_root->gpnum != rnp_root->completed) {
- trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedleafroot"));
+ if (rcu_gp_in_progress(rsp)) {
+ trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedleafroot"));
goto unlock_out;
}
- trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedroot"));
+ trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
+ rsp->gp_req_activity = jiffies;
if (!rsp->gp_kthread) {
- trace_rcu_this_gp(rnp_root, rdp, c, TPS("NoGPkthread"));
+ trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
goto unlock_out;
}
- trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum), TPS("newreq"));
+ trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
ret = true; /* Caller must wake GP kthread. */
unlock_out:
- if (rnp != rnp_root)
- raw_spin_unlock_rcu_node(rnp_root);
+ /* Push furthest requested GP to leaf node and rcu_data structure. */
+ if (ULONG_CMP_LT(gp_seq_req, rnp->gp_seq_needed)) {
+ rnp_start->gp_seq_needed = rnp->gp_seq_needed;
+ rdp->gp_seq_needed = rnp->gp_seq_needed;
+ }
+ if (rnp != rnp_start)
+ raw_spin_unlock_rcu_node(rnp);
return ret;
}
@@ -1703,13 +1677,13 @@
*/
static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
- unsigned long c = rnp->completed;
bool needmore;
struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
- need_future_gp_element(rnp, c) = false;
- needmore = need_any_future_gp(rnp);
- trace_rcu_this_gp(rnp, rdp, c,
+ needmore = ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed);
+ if (!needmore)
+ rnp->gp_seq_needed = rnp->gp_seq; /* Avoid counter wrap. */
+ trace_rcu_this_gp(rnp, rdp, rnp->gp_seq,
needmore ? TPS("CleanupMore") : TPS("Cleanup"));
return needmore;
}
@@ -1727,25 +1701,25 @@
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;
- swake_up(&rsp->gp_wq);
+ swake_up_one(&rsp->gp_wq);
}
/*
- * If there is room, assign a ->completed number to any callbacks on
- * this CPU that have not already been assigned. Also accelerate any
- * callbacks that were previously assigned a ->completed number that has
- * since proven to be too conservative, which can happen if callbacks get
- * assigned a ->completed number while RCU is idle, but with reference to
- * a non-root rcu_node structure. This function is idempotent, so it does
- * not hurt to call it repeatedly. Returns an flag saying that we should
- * awaken the RCU grace-period kthread.
+ * If there is room, assign a ->gp_seq number to any callbacks on this
+ * CPU that have not already been assigned. Also accelerate any callbacks
+ * that were previously assigned a ->gp_seq number that has since proven
+ * to be too conservative, which can happen if callbacks get assigned a
+ * ->gp_seq number while RCU is idle, but with reference to a non-root
+ * rcu_node structure. This function is idempotent, so it does not hurt
+ * to call it repeatedly. Returns an flag saying that we should awaken
+ * the RCU grace-period kthread.
*
* The caller must hold rnp->lock with interrupts disabled.
*/
static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
struct rcu_data *rdp)
{
- unsigned long c;
+ unsigned long gp_seq_req;
bool ret = false;
raw_lockdep_assert_held_rcu_node(rnp);
@@ -1764,22 +1738,50 @@
* accelerating callback invocation to an earlier grace-period
* number.
*/
- c = rcu_cbs_completed(rsp, rnp);
- if (rcu_segcblist_accelerate(&rdp->cblist, c))
- ret = rcu_start_this_gp(rnp, rdp, c);
+ gp_seq_req = rcu_seq_snap(&rsp->gp_seq);
+ if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req))
+ ret = rcu_start_this_gp(rnp, rdp, gp_seq_req);
/* Trace depending on how much we were able to accelerate. */
if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
- trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("AccWaitCB"));
+ trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("AccWaitCB"));
else
- trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("AccReadyCB"));
+ trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("AccReadyCB"));
return ret;
}
/*
+ * Similar to rcu_accelerate_cbs(), but does not require that the leaf
+ * rcu_node structure's ->lock be held. It consults the cached value
+ * of ->gp_seq_needed in the rcu_data structure, and if that indicates
+ * that a new grace-period request be made, invokes rcu_accelerate_cbs()
+ * while holding the leaf rcu_node structure's ->lock.
+ */
+static void rcu_accelerate_cbs_unlocked(struct rcu_state *rsp,
+ struct rcu_node *rnp,
+ struct rcu_data *rdp)
+{
+ unsigned long c;
+ bool needwake;
+
+ lockdep_assert_irqs_disabled();
+ c = rcu_seq_snap(&rsp->gp_seq);
+ if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
+ /* Old request still live, so mark recent callbacks. */
+ (void)rcu_segcblist_accelerate(&rdp->cblist, c);
+ return;
+ }
+ raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
+ needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
+ raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
+ if (needwake)
+ rcu_gp_kthread_wake(rsp);
+}
+
+/*
* Move any callbacks whose grace period has completed to the
* RCU_DONE_TAIL sublist, then compact the remaining sublists and
- * assign ->completed numbers to any callbacks in the RCU_NEXT_TAIL
+ * assign ->gp_seq numbers to any callbacks in the RCU_NEXT_TAIL
* sublist. This function is idempotent, so it does not hurt to
* invoke it repeatedly. As long as it is not invoked -too- often...
* Returns true if the RCU grace-period kthread needs to be awakened.
@@ -1796,10 +1798,10 @@
return false;
/*
- * Find all callbacks whose ->completed numbers indicate that they
+ * Find all callbacks whose ->gp_seq numbers indicate that they
* are ready to invoke, and put them into the RCU_DONE_TAIL sublist.
*/
- rcu_segcblist_advance(&rdp->cblist, rnp->completed);
+ rcu_segcblist_advance(&rdp->cblist, rnp->gp_seq);
/* Classify any remaining callbacks. */
return rcu_accelerate_cbs(rsp, rnp, rdp);
@@ -1819,39 +1821,38 @@
raw_lockdep_assert_held_rcu_node(rnp);
+ if (rdp->gp_seq == rnp->gp_seq)
+ return false; /* Nothing to do. */
+
/* Handle the ends of any preceding grace periods first. */
- if (rdp->completed == rnp->completed &&
- !unlikely(READ_ONCE(rdp->gpwrap))) {
-
- /* No grace period end, so just accelerate recent callbacks. */
- ret = rcu_accelerate_cbs(rsp, rnp, rdp);
-
+ if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
+ unlikely(READ_ONCE(rdp->gpwrap))) {
+ ret = rcu_advance_cbs(rsp, rnp, rdp); /* Advance callbacks. */
+ trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuend"));
} else {
-
- /* Advance callbacks. */
- ret = rcu_advance_cbs(rsp, rnp, rdp);
-
- /* Remember that we saw this grace-period completion. */
- rdp->completed = rnp->completed;
- trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuend"));
+ ret = rcu_accelerate_cbs(rsp, rnp, rdp); /* Recent callbacks. */
}
- if (rdp->gpnum != rnp->gpnum || unlikely(READ_ONCE(rdp->gpwrap))) {
+ /* Now handle the beginnings of any new-to-this-CPU grace periods. */
+ if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) ||
+ unlikely(READ_ONCE(rdp->gpwrap))) {
/*
* If the current grace period is waiting for this CPU,
* set up to detect a quiescent state, otherwise don't
* go looking for one.
*/
- rdp->gpnum = rnp->gpnum;
- trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
+ trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart"));
need_gp = !!(rnp->qsmask & rdp->grpmask);
rdp->cpu_no_qs.b.norm = need_gp;
rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr);
rdp->core_needs_qs = need_gp;
zero_cpu_stall_ticks(rdp);
- WRITE_ONCE(rdp->gpwrap, false);
- rcu_gpnum_ovf(rnp, rdp);
}
+ rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
+ if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
+ rdp->gp_seq_needed = rnp->gp_seq_needed;
+ WRITE_ONCE(rdp->gpwrap, false);
+ rcu_gpnum_ovf(rnp, rdp);
return ret;
}
@@ -1863,8 +1864,7 @@
local_irq_save(flags);
rnp = rdp->mynode;
- if ((rdp->gpnum == READ_ONCE(rnp->gpnum) &&
- rdp->completed == READ_ONCE(rnp->completed) &&
+ if ((rdp->gp_seq == rcu_seq_current(&rnp->gp_seq) &&
!unlikely(READ_ONCE(rdp->gpwrap))) || /* w/out lock. */
!raw_spin_trylock_rcu_node(rnp)) { /* irqs already off, so later. */
local_irq_restore(flags);
@@ -1879,7 +1879,8 @@
static void rcu_gp_slow(struct rcu_state *rsp, int delay)
{
if (delay > 0 &&
- !(rsp->gpnum % (rcu_num_nodes * PER_RCU_NODE_PERIOD * delay)))
+ !(rcu_seq_ctr(rsp->gp_seq) %
+ (rcu_num_nodes * PER_RCU_NODE_PERIOD * delay)))
schedule_timeout_uninterruptible(delay);
}
@@ -1888,7 +1889,9 @@
*/
static bool rcu_gp_init(struct rcu_state *rsp)
{
+ unsigned long flags;
unsigned long oldmask;
+ unsigned long mask;
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
@@ -1912,9 +1915,9 @@
/* Advance to a new grace period and initialize state. */
record_gp_stall_check_time(rsp);
- /* Record GP times before starting GP, hence smp_store_release(). */
- smp_store_release(&rsp->gpnum, rsp->gpnum + 1);
- trace_rcu_grace_period(rsp->name, rsp->gpnum, TPS("start"));
+ /* Record GP times before starting GP, hence rcu_seq_start(). */
+ rcu_seq_start(&rsp->gp_seq);
+ trace_rcu_grace_period(rsp->name, rsp->gp_seq, TPS("start"));
raw_spin_unlock_irq_rcu_node(rnp);
/*
@@ -1923,13 +1926,15 @@
* for subsequent online CPUs, and that quiescent-state forcing
* will handle subsequent offline CPUs.
*/
+ rsp->gp_state = RCU_GP_ONOFF;
rcu_for_each_leaf_node(rsp, rnp) {
- rcu_gp_slow(rsp, gp_preinit_delay);
+ spin_lock(&rsp->ofl_lock);
raw_spin_lock_irq_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
!rnp->wait_blkd_tasks) {
/* Nothing to do on this leaf rcu_node structure. */
raw_spin_unlock_irq_rcu_node(rnp);
+ spin_unlock(&rsp->ofl_lock);
continue;
}
@@ -1939,12 +1944,14 @@
/* If zero-ness of ->qsmaskinit changed, propagate up tree. */
if (!oldmask != !rnp->qsmaskinit) {
- if (!oldmask) /* First online CPU for this rcu_node. */
- rcu_init_new_rnp(rnp);
- else if (rcu_preempt_has_tasks(rnp)) /* blocked tasks */
- rnp->wait_blkd_tasks = true;
- else /* Last offline CPU and can propagate. */
+ if (!oldmask) { /* First online CPU for rcu_node. */
+ if (!rnp->wait_blkd_tasks) /* Ever offline? */
+ rcu_init_new_rnp(rnp);
+ } else if (rcu_preempt_has_tasks(rnp)) {
+ rnp->wait_blkd_tasks = true; /* blocked tasks */
+ } else { /* Last offline CPU and can propagate. */
rcu_cleanup_dead_rnp(rnp);
+ }
}
/*
@@ -1953,18 +1960,19 @@
* still offline, propagate up the rcu_node tree and
* clear ->wait_blkd_tasks. Otherwise, if one of this
* rcu_node structure's CPUs has since come back online,
- * simply clear ->wait_blkd_tasks (but rcu_cleanup_dead_rnp()
- * checks for this, so just call it unconditionally).
+ * simply clear ->wait_blkd_tasks.
*/
if (rnp->wait_blkd_tasks &&
- (!rcu_preempt_has_tasks(rnp) ||
- rnp->qsmaskinit)) {
+ (!rcu_preempt_has_tasks(rnp) || rnp->qsmaskinit)) {
rnp->wait_blkd_tasks = false;
- rcu_cleanup_dead_rnp(rnp);
+ if (!rnp->qsmaskinit)
+ rcu_cleanup_dead_rnp(rnp);
}
raw_spin_unlock_irq_rcu_node(rnp);
+ spin_unlock(&rsp->ofl_lock);
}
+ rcu_gp_slow(rsp, gp_preinit_delay); /* Races with CPU hotplug. */
/*
* Set the quiescent-state-needed bits in all the rcu_node
@@ -1978,22 +1986,27 @@
* The grace period cannot complete until the initialization
* process finishes, because this kthread handles both.
*/
+ rsp->gp_state = RCU_GP_INIT;
rcu_for_each_node_breadth_first(rsp, rnp) {
rcu_gp_slow(rsp, gp_init_delay);
- raw_spin_lock_irq_rcu_node(rnp);
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
rdp = this_cpu_ptr(rsp->rda);
- rcu_preempt_check_blocked_tasks(rnp);
+ rcu_preempt_check_blocked_tasks(rsp, rnp);
rnp->qsmask = rnp->qsmaskinit;
- WRITE_ONCE(rnp->gpnum, rsp->gpnum);
- if (WARN_ON_ONCE(rnp->completed != rsp->completed))
- WRITE_ONCE(rnp->completed, rsp->completed);
+ WRITE_ONCE(rnp->gp_seq, rsp->gp_seq);
if (rnp == rdp->mynode)
(void)__note_gp_changes(rsp, rnp, rdp);
rcu_preempt_boost_start_gp(rnp);
- trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
+ trace_rcu_grace_period_init(rsp->name, rnp->gp_seq,
rnp->level, rnp->grplo,
rnp->grphi, rnp->qsmask);
- raw_spin_unlock_irq_rcu_node(rnp);
+ /* Quiescent states for tasks on any now-offline CPUs. */
+ mask = rnp->qsmask & ~rnp->qsmaskinitnext;
+ rnp->rcu_gp_init_mask = mask;
+ if ((mask || rnp->wait_blkd_tasks) && rcu_is_leaf_node(rnp))
+ rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
+ else
+ raw_spin_unlock_irq_rcu_node(rnp);
cond_resched_tasks_rcu_qs();
WRITE_ONCE(rsp->gp_activity, jiffies);
}
@@ -2002,7 +2015,7 @@
}
/*
- * Helper function for swait_event_idle() wakeup at force-quiescent-state
+ * Helper function for swait_event_idle_exclusive() wakeup at force-quiescent-state
* time.
*/
static bool rcu_gp_fqs_check_wake(struct rcu_state *rsp, int *gfp)
@@ -2053,6 +2066,7 @@
{
unsigned long gp_duration;
bool needgp = false;
+ unsigned long new_gp_seq;
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
struct swait_queue_head *sq;
@@ -2074,19 +2088,22 @@
raw_spin_unlock_irq_rcu_node(rnp);
/*
- * Propagate new ->completed value to rcu_node structures so
- * that other CPUs don't have to wait until the start of the next
- * grace period to process their callbacks. This also avoids
- * some nasty RCU grace-period initialization races by forcing
- * the end of the current grace period to be completely recorded in
- * all of the rcu_node structures before the beginning of the next
- * grace period is recorded in any of the rcu_node structures.
+ * Propagate new ->gp_seq value to rcu_node structures so that
+ * other CPUs don't have to wait until the start of the next grace
+ * period to process their callbacks. This also avoids some nasty
+ * RCU grace-period initialization races by forcing the end of
+ * the current grace period to be completely recorded in all of
+ * the rcu_node structures before the beginning of the next grace
+ * period is recorded in any of the rcu_node structures.
*/
+ new_gp_seq = rsp->gp_seq;
+ rcu_seq_end(&new_gp_seq);
rcu_for_each_node_breadth_first(rsp, rnp) {
raw_spin_lock_irq_rcu_node(rnp);
- WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
+ if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
+ dump_blkd_tasks(rsp, rnp, 10);
WARN_ON_ONCE(rnp->qsmask);
- WRITE_ONCE(rnp->completed, rsp->gpnum);
+ WRITE_ONCE(rnp->gp_seq, new_gp_seq);
rdp = this_cpu_ptr(rsp->rda);
if (rnp == rdp->mynode)
needgp = __note_gp_changes(rsp, rnp, rdp) || needgp;
@@ -2100,26 +2117,28 @@
rcu_gp_slow(rsp, gp_cleanup_delay);
}
rnp = rcu_get_root(rsp);
- raw_spin_lock_irq_rcu_node(rnp); /* Order GP before ->completed update. */
+ raw_spin_lock_irq_rcu_node(rnp); /* GP before rsp->gp_seq update. */
/* Declare grace period done. */
- WRITE_ONCE(rsp->completed, rsp->gpnum);
- trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
+ rcu_seq_end(&rsp->gp_seq);
+ trace_rcu_grace_period(rsp->name, rsp->gp_seq, TPS("end"));
rsp->gp_state = RCU_GP_IDLE;
/* Check for GP requests since above loop. */
rdp = this_cpu_ptr(rsp->rda);
- if (need_any_future_gp(rnp)) {
- trace_rcu_this_gp(rnp, rdp, rsp->completed - 1,
+ if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
+ trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
TPS("CleanupMore"));
needgp = true;
}
/* Advance CBs to reduce false positives below. */
if (!rcu_accelerate_cbs(rsp, rnp, rdp) && needgp) {
WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);
- trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gpnum),
+ rsp->gp_req_activity = jiffies;
+ trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq),
TPS("newreq"));
+ } else {
+ WRITE_ONCE(rsp->gp_flags, rsp->gp_flags & RCU_GP_FLAG_INIT);
}
- WRITE_ONCE(rsp->gp_flags, rsp->gp_flags & RCU_GP_FLAG_INIT);
raw_spin_unlock_irq_rcu_node(rnp);
}
@@ -2141,10 +2160,10 @@
/* Handle grace-period start. */
for (;;) {
trace_rcu_grace_period(rsp->name,
- READ_ONCE(rsp->gpnum),
+ READ_ONCE(rsp->gp_seq),
TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
- swait_event_idle(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+ swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
RCU_GP_FLAG_INIT);
rsp->gp_state = RCU_GP_DONE_GPS;
/* Locking provides needed memory barrier. */
@@ -2154,17 +2173,13 @@
WRITE_ONCE(rsp->gp_activity, jiffies);
WARN_ON(signal_pending(current));
trace_rcu_grace_period(rsp->name,
- READ_ONCE(rsp->gpnum),
+ READ_ONCE(rsp->gp_seq),
TPS("reqwaitsig"));
}
/* Handle quiescent-state forcing. */
first_gp_fqs = true;
j = jiffies_till_first_fqs;
- if (j > HZ) {
- j = HZ;
- jiffies_till_first_fqs = HZ;
- }
ret = 0;
for (;;) {
if (!ret) {
@@ -2173,10 +2188,10 @@
jiffies + 3 * j);
}
trace_rcu_grace_period(rsp->name,
- READ_ONCE(rsp->gpnum),
+ READ_ONCE(rsp->gp_seq),
TPS("fqswait"));
rsp->gp_state = RCU_GP_WAIT_FQS;
- ret = swait_event_idle_timeout(rsp->gp_wq,
+ ret = swait_event_idle_timeout_exclusive(rsp->gp_wq,
rcu_gp_fqs_check_wake(rsp, &gf), j);
rsp->gp_state = RCU_GP_DOING_FQS;
/* Locking provides needed memory barriers. */
@@ -2188,31 +2203,24 @@
if (ULONG_CMP_GE(jiffies, rsp->jiffies_force_qs) ||
(gf & RCU_GP_FLAG_FQS)) {
trace_rcu_grace_period(rsp->name,
- READ_ONCE(rsp->gpnum),
+ READ_ONCE(rsp->gp_seq),
TPS("fqsstart"));
rcu_gp_fqs(rsp, first_gp_fqs);
first_gp_fqs = false;
trace_rcu_grace_period(rsp->name,
- READ_ONCE(rsp->gpnum),
+ READ_ONCE(rsp->gp_seq),
TPS("fqsend"));
cond_resched_tasks_rcu_qs();
WRITE_ONCE(rsp->gp_activity, jiffies);
ret = 0; /* Force full wait till next FQS. */
j = jiffies_till_next_fqs;
- if (j > HZ) {
- j = HZ;
- jiffies_till_next_fqs = HZ;
- } else if (j < 1) {
- j = 1;
- jiffies_till_next_fqs = 1;
- }
} else {
/* Deal with stray signal. */
cond_resched_tasks_rcu_qs();
WRITE_ONCE(rsp->gp_activity, jiffies);
WARN_ON(signal_pending(current));
trace_rcu_grace_period(rsp->name,
- READ_ONCE(rsp->gpnum),
+ READ_ONCE(rsp->gp_seq),
TPS("fqswaitsig"));
ret = 1; /* Keep old FQS timing. */
j = jiffies;
@@ -2256,8 +2264,12 @@
* must be represented by the same rcu_node structure (which need not be a
* leaf rcu_node structure, though it often will be). The gps parameter
* is the grace-period snapshot, which means that the quiescent states
- * are valid only if rnp->gpnum is equal to gps. That structure's lock
+ * are valid only if rnp->gp_seq is equal to gps. That structure's lock
* must be held upon entry, and it is released before return.
+ *
+ * As a special case, if mask is zero, the bit-already-cleared check is
+ * disabled. This allows propagating quiescent state due to resumed tasks
+ * during grace-period initialization.
*/
static void
rcu_report_qs_rnp(unsigned long mask, struct rcu_state *rsp,
@@ -2271,7 +2283,7 @@
/* Walk up the rcu_node hierarchy. */
for (;;) {
- if (!(rnp->qsmask & mask) || rnp->gpnum != gps) {
+ if ((!(rnp->qsmask & mask) && mask) || rnp->gp_seq != gps) {
/*
* Our bit has already been cleared, or the
@@ -2284,7 +2296,7 @@
WARN_ON_ONCE(!rcu_is_leaf_node(rnp) &&
rcu_preempt_blocked_readers_cgp(rnp));
rnp->qsmask &= ~mask;
- trace_rcu_quiescent_state_report(rsp->name, rnp->gpnum,
+ trace_rcu_quiescent_state_report(rsp->name, rnp->gp_seq,
mask, rnp->qsmask, rnp->level,
rnp->grplo, rnp->grphi,
!!rnp->gp_tasks);
@@ -2294,6 +2306,7 @@
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
+ rnp->completedqs = rnp->gp_seq;
mask = rnp->grpmask;
if (rnp->parent == NULL) {
@@ -2323,8 +2336,9 @@
* irqs disabled, and this lock is released upon return, but irqs remain
* disabled.
*/
-static void rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
- struct rcu_node *rnp, unsigned long flags)
+static void __maybe_unused
+rcu_report_unblock_qs_rnp(struct rcu_state *rsp,
+ struct rcu_node *rnp, unsigned long flags)
__releases(rnp->lock)
{
unsigned long gps;
@@ -2332,12 +2346,15 @@
struct rcu_node *rnp_p;
raw_lockdep_assert_held_rcu_node(rnp);
- if (rcu_state_p == &rcu_sched_state || rsp != rcu_state_p ||
- rnp->qsmask != 0 || rcu_preempt_blocked_readers_cgp(rnp)) {
+ if (WARN_ON_ONCE(rcu_state_p == &rcu_sched_state) ||
+ WARN_ON_ONCE(rsp != rcu_state_p) ||
+ WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) ||
+ rnp->qsmask != 0) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return; /* Still need more quiescent states! */
}
+ rnp->completedqs = rnp->gp_seq;
rnp_p = rnp->parent;
if (rnp_p == NULL) {
/*
@@ -2348,8 +2365,8 @@
return;
}
- /* Report up the rest of the hierarchy, tracking current ->gpnum. */
- gps = rnp->gpnum;
+ /* Report up the rest of the hierarchy, tracking current ->gp_seq. */
+ gps = rnp->gp_seq;
mask = rnp->grpmask;
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
raw_spin_lock_rcu_node(rnp_p); /* irqs already disabled. */
@@ -2370,8 +2387,8 @@
rnp = rdp->mynode;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
- if (rdp->cpu_no_qs.b.norm || rdp->gpnum != rnp->gpnum ||
- rnp->completed == rnp->gpnum || rdp->gpwrap) {
+ if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
+ rdp->gpwrap) {
/*
* The grace period in which this quiescent state was
@@ -2396,7 +2413,7 @@
*/
needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
- rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
+ rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
/* ^^^ Released rnp->lock */
if (needwake)
rcu_gp_kthread_wake(rsp);
@@ -2441,17 +2458,16 @@
*/
static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
{
- RCU_TRACE(unsigned long mask;)
+ RCU_TRACE(bool blkd;)
RCU_TRACE(struct rcu_data *rdp = this_cpu_ptr(rsp->rda);)
RCU_TRACE(struct rcu_node *rnp = rdp->mynode;)
if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
return;
- RCU_TRACE(mask = rdp->grpmask;)
- trace_rcu_grace_period(rsp->name,
- rnp->gpnum + 1 - !!(rnp->qsmask & mask),
- TPS("cpuofl"));
+ RCU_TRACE(blkd = !!(rnp->qsmask & rdp->grpmask);)
+ trace_rcu_grace_period(rsp->name, rnp->gp_seq,
+ blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
}
/*
@@ -2463,7 +2479,7 @@
* This function therefore goes up the tree of rcu_node structures,
* clearing the corresponding bits in the ->qsmaskinit fields. Note that
* the leaf rcu_node structure's ->qsmaskinit field has already been
- * updated
+ * updated.
*
* This function does check that the specified rcu_node structure has
* all CPUs offline and no blocked tasks, so it is OK to invoke it
@@ -2476,9 +2492,10 @@
long mask;
struct rcu_node *rnp = rnp_leaf;
- raw_lockdep_assert_held_rcu_node(rnp);
+ raw_lockdep_assert_held_rcu_node(rnp_leaf);
if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) ||
- rnp->qsmaskinit || rcu_preempt_has_tasks(rnp))
+ WARN_ON_ONCE(rnp_leaf->qsmaskinit) ||
+ WARN_ON_ONCE(rcu_preempt_has_tasks(rnp_leaf)))
return;
for (;;) {
mask = rnp->grpmask;
@@ -2487,7 +2504,8 @@
break;
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
rnp->qsmaskinit &= ~mask;
- rnp->qsmask &= ~mask;
+ /* Between grace periods, so better already be zero! */
+ WARN_ON_ONCE(rnp->qsmask);
if (rnp->qsmaskinit) {
raw_spin_unlock_rcu_node(rnp);
/* irqs remain disabled. */
@@ -2630,6 +2648,7 @@
rcu_sched_qs();
rcu_bh_qs();
+ rcu_note_voluntary_context_switch(current);
} else if (!in_softirq()) {
@@ -2645,8 +2664,7 @@
rcu_preempt_check_callbacks();
if (rcu_pending())
invoke_rcu_core();
- if (user)
- rcu_note_voluntary_context_switch(current);
+
trace_rcu_utilization(TPS("End scheduler-tick"));
}
@@ -2681,17 +2699,8 @@
/* rcu_initiate_boost() releases rnp->lock */
continue;
}
- if (rnp->parent &&
- (rnp->parent->qsmask & rnp->grpmask)) {
- /*
- * Race between grace-period
- * initialization and task exiting RCU
- * read-side critical section: Report.
- */
- rcu_report_unblock_qs_rnp(rsp, rnp, flags);
- /* rcu_report_unblock_qs_rnp() rlses ->lock */
- continue;
- }
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ continue;
}
for_each_leaf_node_possible_cpu(rnp, cpu) {
unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
@@ -2701,8 +2710,8 @@
}
}
if (mask != 0) {
- /* Idle/offline CPUs, report (releases rnp->lock. */
- rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
+ /* Idle/offline CPUs, report (releases rnp->lock). */
+ rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
} else {
/* Nothing to do here, so just drop the lock. */
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
@@ -2747,6 +2756,65 @@
}
/*
+ * This function checks for grace-period requests that fail to motivate
+ * RCU to come out of its idle mode.
+ */
+static void
+rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
+ struct rcu_data *rdp)
+{
+ const unsigned long gpssdelay = rcu_jiffies_till_stall_check() * HZ;
+ unsigned long flags;
+ unsigned long j;
+ struct rcu_node *rnp_root = rcu_get_root(rsp);
+ static atomic_t warned = ATOMIC_INIT(0);
+
+ if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress(rsp) ||
+ ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed))
+ return;
+ j = jiffies; /* Expensive access, and in common case don't get here. */
+ if (time_before(j, READ_ONCE(rsp->gp_req_activity) + gpssdelay) ||
+ time_before(j, READ_ONCE(rsp->gp_activity) + gpssdelay) ||
+ atomic_read(&warned))
+ return;
+
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ j = jiffies;
+ if (rcu_gp_in_progress(rsp) ||
+ ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed) ||
+ time_before(j, READ_ONCE(rsp->gp_req_activity) + gpssdelay) ||
+ time_before(j, READ_ONCE(rsp->gp_activity) + gpssdelay) ||
+ atomic_read(&warned)) {
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ return;
+ }
+ /* Hold onto the leaf lock to make others see warned==1. */
+
+ if (rnp_root != rnp)
+ raw_spin_lock_rcu_node(rnp_root); /* irqs already disabled. */
+ j = jiffies;
+ if (rcu_gp_in_progress(rsp) ||
+ ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed) ||
+ time_before(j, rsp->gp_req_activity + gpssdelay) ||
+ time_before(j, rsp->gp_activity + gpssdelay) ||
+ atomic_xchg(&warned, 1)) {
+ raw_spin_unlock_rcu_node(rnp_root); /* irqs remain disabled. */
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ return;
+ }
+ pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
+ __func__, (long)READ_ONCE(rsp->gp_seq),
+ (long)READ_ONCE(rnp_root->gp_seq_needed),
+ j - rsp->gp_req_activity, j - rsp->gp_activity,
+ rsp->gp_flags, rsp->gp_state, rsp->name,
+ rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
+ WARN_ON(1);
+ if (rnp_root != rnp)
+ raw_spin_unlock_rcu_node(rnp_root);
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+}
+
+/*
* This does the RCU core processing work for the specified rcu_state
* and rcu_data structures. This may be called only from the CPU to
* whom the rdp belongs.
@@ -2755,9 +2823,8 @@
__rcu_process_callbacks(struct rcu_state *rsp)
{
unsigned long flags;
- bool needwake;
struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
- struct rcu_node *rnp;
+ struct rcu_node *rnp = rdp->mynode;
WARN_ON_ONCE(!rdp->beenonline);
@@ -2768,18 +2835,13 @@
if (!rcu_gp_in_progress(rsp) &&
rcu_segcblist_is_enabled(&rdp->cblist)) {
local_irq_save(flags);
- if (rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) {
- local_irq_restore(flags);
- } else {
- rnp = rdp->mynode;
- raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
- needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- if (needwake)
- rcu_gp_kthread_wake(rsp);
- }
+ if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
+ rcu_accelerate_cbs_unlocked(rsp, rnp, rdp);
+ local_irq_restore(flags);
}
+ rcu_check_gp_start_stall(rsp, rnp, rdp);
+
/* If there are callbacks ready, invoke them. */
if (rcu_segcblist_ready_cbs(&rdp->cblist))
invoke_rcu_callbacks(rsp, rdp);
@@ -2833,8 +2895,6 @@
static void __call_rcu_core(struct rcu_state *rsp, struct rcu_data *rdp,
struct rcu_head *head, unsigned long flags)
{
- bool needwake;
-
/*
* If called from an extended quiescent state, invoke the RCU
* core in order to force a re-evaluation of RCU's idleness.
@@ -2861,13 +2921,7 @@
/* Start a new grace period if one not already started. */
if (!rcu_gp_in_progress(rsp)) {
- struct rcu_node *rnp = rdp->mynode;
-
- raw_spin_lock_rcu_node(rnp);
- needwake = rcu_accelerate_cbs(rsp, rnp, rdp);
- raw_spin_unlock_rcu_node(rnp);
- if (needwake)
- rcu_gp_kthread_wake(rsp);
+ rcu_accelerate_cbs_unlocked(rsp, rdp->mynode, rdp);
} else {
/* Give the grace period a kick. */
rdp->blimit = LONG_MAX;
@@ -3037,7 +3091,7 @@
* when there was in fact only one the whole time, as this just adds
* some overhead: RCU still operates correctly.
*/
-static inline int rcu_blocking_is_gp(void)
+static int rcu_blocking_is_gp(void)
{
int ret;
@@ -3136,16 +3190,10 @@
{
/*
* Any prior manipulation of RCU-protected data must happen
- * before the load from ->gpnum.
+ * before the load from ->gp_seq.
*/
smp_mb(); /* ^^^ */
-
- /*
- * Make sure this load happens before the purportedly
- * time-consuming work between get_state_synchronize_rcu()
- * and cond_synchronize_rcu().
- */
- return smp_load_acquire(&rcu_state_p->gpnum);
+ return rcu_seq_snap(&rcu_state_p->gp_seq);
}
EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
@@ -3165,15 +3213,10 @@
*/
void cond_synchronize_rcu(unsigned long oldstate)
{
- unsigned long newstate;
-
- /*
- * Ensure that this load happens before any RCU-destructive
- * actions the caller might carry out after we return.
- */
- newstate = smp_load_acquire(&rcu_state_p->completed);
- if (ULONG_CMP_GE(oldstate, newstate))
+ if (!rcu_seq_done(&rcu_state_p->gp_seq, oldstate))
synchronize_rcu();
+ else
+ smp_mb(); /* Ensure GP ends before subsequent accesses. */
}
EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
@@ -3188,16 +3231,10 @@
{
/*
* Any prior manipulation of RCU-protected data must happen
- * before the load from ->gpnum.
+ * before the load from ->gp_seq.
*/
smp_mb(); /* ^^^ */
-
- /*
- * Make sure this load happens before the purportedly
- * time-consuming work between get_state_synchronize_sched()
- * and cond_synchronize_sched().
- */
- return smp_load_acquire(&rcu_sched_state.gpnum);
+ return rcu_seq_snap(&rcu_sched_state.gp_seq);
}
EXPORT_SYMBOL_GPL(get_state_synchronize_sched);
@@ -3217,15 +3254,10 @@
*/
void cond_synchronize_sched(unsigned long oldstate)
{
- unsigned long newstate;
-
- /*
- * Ensure that this load happens before any RCU-destructive
- * actions the caller might carry out after we return.
- */
- newstate = smp_load_acquire(&rcu_sched_state.completed);
- if (ULONG_CMP_GE(oldstate, newstate))
+ if (!rcu_seq_done(&rcu_sched_state.gp_seq, oldstate))
synchronize_sched();
+ else
+ smp_mb(); /* Ensure GP ends before subsequent accesses. */
}
EXPORT_SYMBOL_GPL(cond_synchronize_sched);
@@ -3261,12 +3293,8 @@
!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
return 1;
- /* Has another RCU grace period completed? */
- if (READ_ONCE(rnp->completed) != rdp->completed) /* outside lock */
- return 1;
-
- /* Has a new RCU grace period started? */
- if (READ_ONCE(rnp->gpnum) != rdp->gpnum ||
+ /* Have RCU grace period completed or started? */
+ if (rcu_seq_current(&rnp->gp_seq) != rdp->gp_seq ||
unlikely(READ_ONCE(rdp->gpwrap))) /* outside lock */
return 1;
@@ -3298,7 +3326,7 @@
* non-NULL, store an indication of whether all callbacks are lazy.
* (If there are no callbacks, all of them are deemed to be lazy.)
*/
-static bool __maybe_unused rcu_cpu_has_callbacks(bool *all_lazy)
+static bool rcu_cpu_has_callbacks(bool *all_lazy)
{
bool al = true;
bool hc = false;
@@ -3484,17 +3512,22 @@
static void rcu_init_new_rnp(struct rcu_node *rnp_leaf)
{
long mask;
+ long oldmask;
struct rcu_node *rnp = rnp_leaf;
- raw_lockdep_assert_held_rcu_node(rnp);
+ raw_lockdep_assert_held_rcu_node(rnp_leaf);
+ WARN_ON_ONCE(rnp->wait_blkd_tasks);
for (;;) {
mask = rnp->grpmask;
rnp = rnp->parent;
if (rnp == NULL)
return;
raw_spin_lock_rcu_node(rnp); /* Interrupts already disabled. */
+ oldmask = rnp->qsmaskinit;
rnp->qsmaskinit |= mask;
raw_spin_unlock_rcu_node(rnp); /* Interrupts remain disabled. */
+ if (oldmask)
+ return;
}
}
@@ -3511,6 +3544,10 @@
rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != 1);
WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp->dynticks)));
+ rdp->rcu_ofl_gp_seq = rsp->gp_seq;
+ rdp->rcu_ofl_gp_flags = RCU_GP_CLEANED;
+ rdp->rcu_onl_gp_seq = rsp->gp_seq;
+ rdp->rcu_onl_gp_flags = RCU_GP_CLEANED;
rdp->cpu = cpu;
rdp->rsp = rsp;
rcu_boot_init_nocb_percpu_data(rdp);
@@ -3518,9 +3555,9 @@
/*
* Initialize a CPU's per-CPU RCU data. Note that only one online or
- * offline event can be happening at a given time. Note also that we
- * can accept some slop in the rsp->completed access due to the fact
- * that this CPU cannot possibly have any RCU callbacks in flight yet.
+ * offline event can be happening at a given time. Note also that we can
+ * accept some slop in the rsp->gp_seq access due to the fact that this
+ * CPU cannot possibly have any RCU callbacks in flight yet.
*/
static void
rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
@@ -3549,14 +3586,14 @@
rnp = rdp->mynode;
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
rdp->beenonline = true; /* We have now been online. */
- rdp->gpnum = rnp->completed; /* Make CPU later note any new GP. */
- rdp->completed = rnp->completed;
+ rdp->gp_seq = rnp->gp_seq;
+ rdp->gp_seq_needed = rnp->gp_seq;
rdp->cpu_no_qs.b.norm = true;
rdp->rcu_qs_ctr_snap = per_cpu(rcu_dynticks.rcu_qs_ctr, cpu);
rdp->core_needs_qs = false;
rdp->rcu_iw_pending = false;
- rdp->rcu_iw_gpnum = rnp->gpnum - 1;
- trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpuonl"));
+ rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
+ trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuonl"));
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
@@ -3705,7 +3742,15 @@
nbits = bitmap_weight(&oldmask, BITS_PER_LONG);
/* Allow lockless access for expedited grace periods. */
smp_store_release(&rsp->ncpus, rsp->ncpus + nbits); /* ^^^ */
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
+ rdp->rcu_onl_gp_seq = READ_ONCE(rsp->gp_seq);
+ rdp->rcu_onl_gp_flags = READ_ONCE(rsp->gp_flags);
+ if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
+ /* Report QS -after- changing ->qsmaskinitnext! */
+ rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
+ } else {
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ }
}
smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
}
@@ -3713,7 +3758,7 @@
#ifdef CONFIG_HOTPLUG_CPU
/*
* The CPU is exiting the idle loop into the arch_cpu_idle_dead()
- * function. We now remove it from the rcu_node tree's ->qsmaskinit
+ * function. We now remove it from the rcu_node tree's ->qsmaskinitnext
* bit masks.
*/
static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
@@ -3725,9 +3770,18 @@
/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
mask = rdp->grpmask;
+ spin_lock(&rsp->ofl_lock);
raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
+ rdp->rcu_ofl_gp_seq = READ_ONCE(rsp->gp_seq);
+ rdp->rcu_ofl_gp_flags = READ_ONCE(rsp->gp_flags);
+ if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
+ /* Report quiescent state -before- changing ->qsmaskinitnext! */
+ rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
+ }
rnp->qsmaskinitnext &= ~mask;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ spin_unlock(&rsp->ofl_lock);
}
/*
@@ -3839,12 +3893,16 @@
struct task_struct *t;
/* Force priority into range. */
- if (IS_ENABLED(CONFIG_RCU_BOOST) && kthread_prio < 1)
+ if (IS_ENABLED(CONFIG_RCU_BOOST) && kthread_prio < 2
+ && IS_BUILTIN(CONFIG_RCU_TORTURE_TEST))
+ kthread_prio = 2;
+ else if (IS_ENABLED(CONFIG_RCU_BOOST) && kthread_prio < 1)
kthread_prio = 1;
else if (kthread_prio < 0)
kthread_prio = 0;
else if (kthread_prio > 99)
kthread_prio = 99;
+
if (kthread_prio != kthread_prio_in)
pr_alert("rcu_spawn_gp_kthread(): Limited prio to %d from %d\n",
kthread_prio, kthread_prio_in);
@@ -3928,8 +3986,9 @@
raw_spin_lock_init(&rnp->fqslock);
lockdep_set_class_and_name(&rnp->fqslock,
&rcu_fqs_class[i], fqs[i]);
- rnp->gpnum = rsp->gpnum;
- rnp->completed = rsp->completed;
+ rnp->gp_seq = rsp->gp_seq;
+ rnp->gp_seq_needed = rsp->gp_seq;
+ rnp->completedqs = rsp->gp_seq;
rnp->qsmask = 0;
rnp->qsmaskinit = 0;
rnp->grplo = j * cpustride;
@@ -3997,7 +4056,7 @@
if (rcu_fanout_leaf == RCU_FANOUT_LEAF &&
nr_cpu_ids == NR_CPUS)
return;
- pr_info("RCU: Adjusting geometry for rcu_fanout_leaf=%d, nr_cpu_ids=%u\n",
+ pr_info("Adjusting geometry for rcu_fanout_leaf=%d, nr_cpu_ids=%u\n",
rcu_fanout_leaf, nr_cpu_ids);
/*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 78e051d..4e74df7 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -81,18 +81,16 @@
raw_spinlock_t __private lock; /* Root rcu_node's lock protects */
/* some rcu_state fields as well as */
/* following. */
- unsigned long gpnum; /* Current grace period for this node. */
- /* This will either be equal to or one */
- /* behind the root rcu_node's gpnum. */
- unsigned long completed; /* Last GP completed for this node. */
- /* This will either be equal to or one */
- /* behind the root rcu_node's gpnum. */
+ unsigned long gp_seq; /* Track rsp->rcu_gp_seq. */
+ unsigned long gp_seq_needed; /* Track rsp->rcu_gp_seq_needed. */
+ unsigned long completedqs; /* All QSes done for this node. */
unsigned long qsmask; /* CPUs or groups that need to switch in */
/* order for current grace period to proceed.*/
/* In leaf rcu_node, each bit corresponds to */
/* an rcu_data structure, otherwise, each */
/* bit corresponds to a child rcu_node */
/* structure. */
+ unsigned long rcu_gp_init_mask; /* Mask of offline CPUs at GP init. */
unsigned long qsmaskinit;
/* Per-GP initial value for qsmask. */
/* Initialized from ->qsmaskinitnext at the */
@@ -158,7 +156,6 @@
struct swait_queue_head nocb_gp_wq[2];
/* Place for rcu_nocb_kthread() to wait GP. */
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
- u8 need_future_gp[4]; /* Counts of upcoming GP requests. */
raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
spinlock_t exp_lock ____cacheline_internodealigned_in_smp;
@@ -168,22 +165,6 @@
bool exp_need_flush; /* Need to flush workitem? */
} ____cacheline_internodealigned_in_smp;
-/* Accessors for ->need_future_gp[] array. */
-#define need_future_gp_mask() \
- (ARRAY_SIZE(((struct rcu_node *)NULL)->need_future_gp) - 1)
-#define need_future_gp_element(rnp, c) \
- ((rnp)->need_future_gp[(c) & need_future_gp_mask()])
-#define need_any_future_gp(rnp) \
-({ \
- int __i; \
- bool __nonzero = false; \
- \
- for (__i = 0; __i < ARRAY_SIZE((rnp)->need_future_gp); __i++) \
- __nonzero = __nonzero || \
- READ_ONCE((rnp)->need_future_gp[__i]); \
- __nonzero; \
-})
-
/*
* Bitmasks in an rcu_node cover the interval [grplo, grphi] of CPU IDs, and
* are indexed relative to this interval rather than the global CPU ID space.
@@ -206,16 +187,14 @@
/* Per-CPU data for read-copy update. */
struct rcu_data {
/* 1) quiescent-state and grace-period handling : */
- unsigned long completed; /* Track rsp->completed gp number */
- /* in order to detect GP end. */
- unsigned long gpnum; /* Highest gp number that this CPU */
- /* is aware of having started. */
+ unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */
+ unsigned long gp_seq_needed; /* Track rsp->rcu_gp_seq_needed ctr. */
unsigned long rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
/* for rcu_all_qs() invocations. */
union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */
bool core_needs_qs; /* Core waits for quiesc state. */
bool beenonline; /* CPU online at least once. */
- bool gpwrap; /* Possible gpnum/completed wrap. */
+ bool gpwrap; /* Possible ->gp_seq wrap. */
struct rcu_node *mynode; /* This CPU's leaf of hierarchy */
unsigned long grpmask; /* Mask to apply to leaf qsmask. */
unsigned long ticks_this_gp; /* The number of scheduling-clock */
@@ -239,7 +218,6 @@
/* 4) reasons this CPU needed to be kicked by force_quiescent_state */
unsigned long dynticks_fqs; /* Kicked due to dynticks idle. */
- unsigned long offline_fqs; /* Kicked due to being offline. */
unsigned long cond_resched_completed;
/* Grace period that needs help */
/* from cond_resched(). */
@@ -278,12 +256,16 @@
/* Leader CPU takes GP-end wakeups. */
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
- /* 7) RCU CPU stall data. */
+ /* 7) Diagnostic data, including RCU CPU stall warnings. */
unsigned int softirq_snap; /* Snapshot of softirq activity. */
/* ->rcu_iw* fields protected by leaf rcu_node ->lock. */
struct irq_work rcu_iw; /* Check for non-irq activity. */
bool rcu_iw_pending; /* Is ->rcu_iw pending? */
- unsigned long rcu_iw_gpnum; /* ->gpnum associated with ->rcu_iw. */
+ unsigned long rcu_iw_gp_seq; /* ->gp_seq associated with ->rcu_iw. */
+ unsigned long rcu_ofl_gp_seq; /* ->gp_seq at last offline. */
+ short rcu_ofl_gp_flags; /* ->gp_flags at last offline. */
+ unsigned long rcu_onl_gp_seq; /* ->gp_seq at last online. */
+ short rcu_onl_gp_flags; /* ->gp_flags at last online. */
int cpu;
struct rcu_state *rsp;
@@ -340,8 +322,7 @@
u8 boost ____cacheline_internodealigned_in_smp;
/* Subject to priority boost. */
- unsigned long gpnum; /* Current gp number. */
- unsigned long completed; /* # of last completed gp. */
+ unsigned long gp_seq; /* Grace-period sequence #. */
struct task_struct *gp_kthread; /* Task for grace periods. */
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
@@ -373,6 +354,8 @@
/* but in jiffies. */
unsigned long gp_activity; /* Time of last GP kthread */
/* activity in jiffies. */
+ unsigned long gp_req_activity; /* Time of last GP request */
+ /* in jiffies. */
unsigned long jiffies_stall; /* Time at which to check */
/* for CPU stalls. */
unsigned long jiffies_resched; /* Time at which to resched */
@@ -384,6 +367,10 @@
const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */
struct list_head flavors; /* List of RCU flavors. */
+
+ spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+ /* Synchronize offline with */
+ /* GP pre-initialization. */
};
/* Values for rcu_state structure's gp_flags field. */
@@ -394,16 +381,20 @@
#define RCU_GP_IDLE 0 /* Initial state and no GP in progress. */
#define RCU_GP_WAIT_GPS 1 /* Wait for grace-period start. */
#define RCU_GP_DONE_GPS 2 /* Wait done for grace-period start. */
-#define RCU_GP_WAIT_FQS 3 /* Wait for force-quiescent-state time. */
-#define RCU_GP_DOING_FQS 4 /* Wait done for force-quiescent-state time. */
-#define RCU_GP_CLEANUP 5 /* Grace-period cleanup started. */
-#define RCU_GP_CLEANED 6 /* Grace-period cleanup complete. */
+#define RCU_GP_ONOFF 3 /* Grace-period initialization hotplug. */
+#define RCU_GP_INIT 4 /* Grace-period initialization. */
+#define RCU_GP_WAIT_FQS 5 /* Wait for force-quiescent-state time. */
+#define RCU_GP_DOING_FQS 6 /* Wait done for force-quiescent-state time. */
+#define RCU_GP_CLEANUP 7 /* Grace-period cleanup started. */
+#define RCU_GP_CLEANED 8 /* Grace-period cleanup complete. */
#ifndef RCU_TREE_NONCORE
static const char * const gp_state_names[] = {
"RCU_GP_IDLE",
"RCU_GP_WAIT_GPS",
"RCU_GP_DONE_GPS",
+ "RCU_GP_ONOFF",
+ "RCU_GP_INIT",
"RCU_GP_WAIT_FQS",
"RCU_GP_DOING_FQS",
"RCU_GP_CLEANUP",
@@ -449,10 +440,13 @@
static void rcu_print_detail_task_stall(struct rcu_state *rsp);
static int rcu_print_task_stall(struct rcu_node *rnp);
static int rcu_print_task_exp_stall(struct rcu_node *rnp);
-static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
+static void rcu_preempt_check_blocked_tasks(struct rcu_state *rsp,
+ struct rcu_node *rnp);
static void rcu_preempt_check_callbacks(void);
void call_rcu(struct rcu_head *head, rcu_callback_t func);
static void __init __rcu_init_preempt(void);
+static void dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp,
+ int ncheck);
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
static void invoke_rcu_callbacks_kthread(void);
@@ -489,7 +483,6 @@
#ifdef CONFIG_RCU_NOCB_CPU
static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp);
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
-static void __maybe_unused rcu_kick_nohz_cpu(int cpu);
static bool init_nocb_callback_list(struct rcu_data *rdp);
static void rcu_bind_gp_kthread(void);
static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index d40708e..0b2c2ad 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -212,7 +212,7 @@
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
if (wake) {
smp_mb(); /* EGP done before wake_up(). */
- swake_up(&rsp->expedited_wq);
+ swake_up_one(&rsp->expedited_wq);
}
break;
}
@@ -472,6 +472,7 @@
static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
smp_call_func_t func)
{
+ int cpu;
struct rcu_node *rnp;
trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("reset"));
@@ -486,13 +487,20 @@
rnp->rew.rew_func = func;
rnp->rew.rew_rsp = rsp;
if (!READ_ONCE(rcu_par_gp_wq) ||
- rcu_scheduler_active != RCU_SCHEDULER_RUNNING) {
- /* No workqueues yet. */
+ rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
+ rcu_is_last_leaf_node(rsp, rnp)) {
+ /* No workqueues yet or last leaf, do direct call. */
sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
continue;
}
INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
- queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
+ preempt_disable();
+ cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
+ /* If all offline, queue the work on an unbound CPU. */
+ if (unlikely(cpu > rnp->grphi))
+ cpu = WORK_CPU_UNBOUND;
+ queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
+ preempt_enable();
rnp->exp_need_flush = true;
}
@@ -518,7 +526,7 @@
jiffies_start = jiffies;
for (;;) {
- ret = swait_event_timeout(
+ ret = swait_event_timeout_exclusive(
rsp->expedited_wq,
sync_rcu_preempt_exp_done_unlocked(rnp_root),
jiffies_stall);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7fd1203..a97c20e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -74,8 +74,8 @@
pr_info("\tRCU event tracing is enabled.\n");
if ((IS_ENABLED(CONFIG_64BIT) && RCU_FANOUT != 64) ||
(!IS_ENABLED(CONFIG_64BIT) && RCU_FANOUT != 32))
- pr_info("\tCONFIG_RCU_FANOUT set to non-default value of %d\n",
- RCU_FANOUT);
+ pr_info("\tCONFIG_RCU_FANOUT set to non-default value of %d.\n",
+ RCU_FANOUT);
if (rcu_fanout_exact)
pr_info("\tHierarchical RCU autobalancing is disabled.\n");
if (IS_ENABLED(CONFIG_RCU_FAST_NO_HZ))
@@ -88,11 +88,13 @@
pr_info("\tBuild-time adjustment of leaf fanout to %d.\n",
RCU_FANOUT_LEAF);
if (rcu_fanout_leaf != RCU_FANOUT_LEAF)
- pr_info("\tBoot-time adjustment of leaf fanout to %d.\n", rcu_fanout_leaf);
+ pr_info("\tBoot-time adjustment of leaf fanout to %d.\n",
+ rcu_fanout_leaf);
if (nr_cpu_ids != NR_CPUS)
pr_info("\tRCU restricting CPUs from NR_CPUS=%d to nr_cpu_ids=%u.\n", NR_CPUS, nr_cpu_ids);
#ifdef CONFIG_RCU_BOOST
- pr_info("\tRCU priority boosting: priority %d delay %d ms.\n", kthread_prio, CONFIG_RCU_BOOST_DELAY);
+ pr_info("\tRCU priority boosting: priority %d delay %d ms.\n",
+ kthread_prio, CONFIG_RCU_BOOST_DELAY);
#endif
if (blimit != DEFAULT_RCU_BLIMIT)
pr_info("\tBoot-time adjustment of callback invocation limit to %ld.\n", blimit);
@@ -127,6 +129,7 @@
static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
bool wake);
+static void rcu_read_unlock_special(struct task_struct *t);
/*
* Tell them what RCU they are running.
@@ -183,6 +186,9 @@
raw_lockdep_assert_held_rcu_node(rnp);
WARN_ON_ONCE(rdp->mynode != rnp);
WARN_ON_ONCE(!rcu_is_leaf_node(rnp));
+ /* RCU better not be waiting on newly onlined CPUs! */
+ WARN_ON_ONCE(rnp->qsmaskinitnext & ~rnp->qsmaskinit & rnp->qsmask &
+ rdp->grpmask);
/*
* Decide where to queue the newly blocked task. In theory,
@@ -260,8 +266,10 @@
* ->exp_tasks pointers, respectively, to reference the newly
* blocked tasks.
*/
- if (!rnp->gp_tasks && (blkd_state & RCU_GP_BLKD))
+ if (!rnp->gp_tasks && (blkd_state & RCU_GP_BLKD)) {
rnp->gp_tasks = &t->rcu_node_entry;
+ WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq);
+ }
if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
rnp->exp_tasks = &t->rcu_node_entry;
WARN_ON_ONCE(!(blkd_state & RCU_GP_BLKD) !=
@@ -286,20 +294,24 @@
}
/*
- * Record a preemptible-RCU quiescent state for the specified CPU. Note
- * that this just means that the task currently running on the CPU is
- * not in a quiescent state. There might be any number of tasks blocked
- * while in an RCU read-side critical section.
+ * Record a preemptible-RCU quiescent state for the specified CPU.
+ * Note that this does not necessarily mean that the task currently running
+ * on the CPU is in a quiescent state: Instead, it means that the current
+ * grace period need not wait on any RCU read-side critical section that
+ * starts later on this CPU. It also means that if the current task is
+ * in an RCU read-side critical section, it has already added itself to
+ * some leaf rcu_node structure's ->blkd_tasks list. In addition to the
+ * current task, there might be any number of other tasks blocked while
+ * in an RCU read-side critical section.
*
- * As with the other rcu_*_qs() functions, callers to this function
- * must disable preemption.
+ * Callers to this function must disable preemption.
*/
static void rcu_preempt_qs(void)
{
RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_qs() invoked with preemption enabled!!!\n");
if (__this_cpu_read(rcu_data_p->cpu_no_qs.s)) {
trace_rcu_grace_period(TPS("rcu_preempt"),
- __this_cpu_read(rcu_data_p->gpnum),
+ __this_cpu_read(rcu_data_p->gp_seq),
TPS("cpuqs"));
__this_cpu_write(rcu_data_p->cpu_no_qs.b.norm, false);
barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */
@@ -348,8 +360,8 @@
trace_rcu_preempt_task(rdp->rsp->name,
t->pid,
(rnp->qsmask & rdp->grpmask)
- ? rnp->gpnum
- : rnp->gpnum + 1);
+ ? rnp->gp_seq
+ : rcu_seq_snap(&rnp->gp_seq));
rcu_preempt_ctxt_queue(rnp, rdp);
} else if (t->rcu_read_lock_nesting < 0 &&
t->rcu_read_unlock_special.s) {
@@ -456,7 +468,7 @@
* notify RCU core processing or task having blocked during the RCU
* read-side critical section.
*/
-void rcu_read_unlock_special(struct task_struct *t)
+static void rcu_read_unlock_special(struct task_struct *t)
{
bool empty_exp;
bool empty_norm;
@@ -535,13 +547,15 @@
WARN_ON_ONCE(rnp != t->rcu_blocked_node);
WARN_ON_ONCE(!rcu_is_leaf_node(rnp));
empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
+ WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
+ (!empty_norm || rnp->qsmask));
empty_exp = sync_rcu_preempt_exp_done(rnp);
smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
np = rcu_next_node_entry(t, rnp);
list_del_init(&t->rcu_node_entry);
t->rcu_blocked_node = NULL;
trace_rcu_unlock_preempted_task(TPS("rcu_preempt"),
- rnp->gpnum, t->pid);
+ rnp->gp_seq, t->pid);
if (&t->rcu_node_entry == rnp->gp_tasks)
rnp->gp_tasks = np;
if (&t->rcu_node_entry == rnp->exp_tasks)
@@ -562,7 +576,7 @@
empty_exp_now = sync_rcu_preempt_exp_done(rnp);
if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
- rnp->gpnum,
+ rnp->gp_seq,
0, rnp->qsmask,
rnp->level,
rnp->grplo,
@@ -686,24 +700,27 @@
* Check that the list of blocked tasks for the newly completed grace
* period is in fact empty. It is a serious bug to complete a grace
* period that still has RCU readers blocked! This function must be
- * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock
+ * invoked -before- updating this rnp's ->gp_seq, and the rnp's ->lock
* must be held by the caller.
*
* Also, if there are blocked tasks on the list, they automatically
* block the newly created grace period, so set up ->gp_tasks accordingly.
*/
-static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+static void
+rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
{
struct task_struct *t;
RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
- WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
- if (rcu_preempt_has_tasks(rnp)) {
+ if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
+ dump_blkd_tasks(rsp, rnp, 10);
+ if (rcu_preempt_has_tasks(rnp) &&
+ (rnp->qsmaskinit || rnp->wait_blkd_tasks)) {
rnp->gp_tasks = rnp->blkd_tasks.next;
t = container_of(rnp->gp_tasks, struct task_struct,
rcu_node_entry);
trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
- rnp->gpnum, t->pid);
+ rnp->gp_seq, t->pid);
}
WARN_ON_ONCE(rnp->qsmask);
}
@@ -717,6 +734,7 @@
*/
static void rcu_preempt_check_callbacks(void)
{
+ struct rcu_state *rsp = &rcu_preempt_state;
struct task_struct *t = current;
if (t->rcu_read_lock_nesting == 0) {
@@ -725,7 +743,9 @@
}
if (t->rcu_read_lock_nesting > 0 &&
__this_cpu_read(rcu_data_p->core_needs_qs) &&
- __this_cpu_read(rcu_data_p->cpu_no_qs.b.norm))
+ __this_cpu_read(rcu_data_p->cpu_no_qs.b.norm) &&
+ !t->rcu_read_unlock_special.b.need_qs &&
+ time_after(jiffies, rsp->gp_start + HZ))
t->rcu_read_unlock_special.b.need_qs = true;
}
@@ -841,6 +861,47 @@
__rcu_read_unlock();
}
+/*
+ * Dump the blocked-tasks state, but limit the list dump to the
+ * specified number of elements.
+ */
+static void
+dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck)
+{
+ int cpu;
+ int i;
+ struct list_head *lhp;
+ bool onl;
+ struct rcu_data *rdp;
+ struct rcu_node *rnp1;
+
+ raw_lockdep_assert_held_rcu_node(rnp);
+ pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
+ __func__, rnp->grplo, rnp->grphi, rnp->level,
+ (long)rnp->gp_seq, (long)rnp->completedqs);
+ for (rnp1 = rnp; rnp1; rnp1 = rnp1->parent)
+ pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
+ __func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
+ pr_info("%s: ->gp_tasks %p ->boost_tasks %p ->exp_tasks %p\n",
+ __func__, rnp->gp_tasks, rnp->boost_tasks, rnp->exp_tasks);
+ pr_info("%s: ->blkd_tasks", __func__);
+ i = 0;
+ list_for_each(lhp, &rnp->blkd_tasks) {
+ pr_cont(" %p", lhp);
+ if (++i >= 10)
+ break;
+ }
+ pr_cont("\n");
+ for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+ rdp = per_cpu_ptr(rsp->rda, cpu);
+ onl = !!(rdp->grpmask & rcu_rnp_online_cpus(rnp));
+ pr_info("\t%d: %c online: %ld(%d) offline: %ld(%d)\n",
+ cpu, ".o"[onl],
+ (long)rdp->rcu_onl_gp_seq, rdp->rcu_onl_gp_flags,
+ (long)rdp->rcu_ofl_gp_seq, rdp->rcu_ofl_gp_flags);
+ }
+}
+
#else /* #ifdef CONFIG_PREEMPT_RCU */
static struct rcu_state *const rcu_state_p = &rcu_sched_state;
@@ -911,7 +972,8 @@
* so there is no need to check for blocked tasks. So check only for
* bogus qsmask values.
*/
-static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+static void
+rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
{
WARN_ON_ONCE(rnp->qsmask);
}
@@ -949,6 +1011,15 @@
{
}
+/*
+ * Dump the guaranteed-empty blocked-tasks state. Trust but verify.
+ */
+static void
+dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck)
+{
+ WARN_ON_ONCE(!list_empty(&rnp->blkd_tasks));
+}
+
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
#ifdef CONFIG_RCU_BOOST
@@ -1433,7 +1504,8 @@
* completed since we last checked and there are
* callbacks not yet ready to invoke.
*/
- if ((rdp->completed != rnp->completed ||
+ if ((rcu_seq_completed_gp(rdp->gp_seq,
+ rcu_seq_current(&rnp->gp_seq)) ||
unlikely(READ_ONCE(rdp->gpwrap))) &&
rcu_segcblist_pend_cbs(&rdp->cblist))
note_gp_changes(rsp, rdp);
@@ -1720,16 +1792,16 @@
*/
touch_nmi_watchdog();
- if (rsp->gpnum == rdp->gpnum) {
+ ticks_value = rcu_seq_ctr(rsp->gp_seq - rdp->gp_seq);
+ if (ticks_value) {
+ ticks_title = "GPs behind";
+ } else {
ticks_title = "ticks this GP";
ticks_value = rdp->ticks_this_gp;
- } else {
- ticks_title = "GPs behind";
- ticks_value = rsp->gpnum - rdp->gpnum;
}
print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
- delta = rdp->mynode->gpnum - rdp->rcu_iw_gpnum;
- pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%ld softirq=%u/%u fqs=%ld %s\n",
+ delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
+ pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
cpu,
"O."[!!cpu_online(cpu)],
"o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
@@ -1817,7 +1889,7 @@
static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
{
- return &rnp->nocb_gp_wq[rnp->completed & 0x1];
+ return &rnp->nocb_gp_wq[rcu_seq_ctr(rnp->gp_seq) & 0x1];
}
static void rcu_init_one_nocb(struct rcu_node *rnp)
@@ -1854,8 +1926,8 @@
WRITE_ONCE(rdp_leader->nocb_leader_sleep, false);
del_timer(&rdp->nocb_timer);
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
- smp_mb(); /* ->nocb_leader_sleep before swake_up(). */
- swake_up(&rdp_leader->nocb_wq);
+ smp_mb(); /* ->nocb_leader_sleep before swake_up_one(). */
+ swake_up_one(&rdp_leader->nocb_wq);
} else {
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
}
@@ -2069,12 +2141,17 @@
bool needwake;
struct rcu_node *rnp = rdp->mynode;
- raw_spin_lock_irqsave_rcu_node(rnp, flags);
- c = rcu_cbs_completed(rdp->rsp, rnp);
- needwake = rcu_start_this_gp(rnp, rdp, c);
- raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
- if (needwake)
- rcu_gp_kthread_wake(rdp->rsp);
+ local_irq_save(flags);
+ c = rcu_seq_snap(&rdp->rsp->gp_seq);
+ if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
+ local_irq_restore(flags);
+ } else {
+ raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
+ needwake = rcu_start_this_gp(rnp, rdp, c);
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+ if (needwake)
+ rcu_gp_kthread_wake(rdp->rsp);
+ }
/*
* Wait for the grace period. Do so interruptibly to avoid messing
@@ -2082,9 +2159,9 @@
*/
trace_rcu_this_gp(rnp, rdp, c, TPS("StartWait"));
for (;;) {
- swait_event_interruptible(
- rnp->nocb_gp_wq[c & 0x1],
- (d = ULONG_CMP_GE(READ_ONCE(rnp->completed), c)));
+ swait_event_interruptible_exclusive(
+ rnp->nocb_gp_wq[rcu_seq_ctr(c) & 0x1],
+ (d = rcu_seq_done(&rnp->gp_seq, c)));
if (likely(d))
break;
WARN_ON(signal_pending(current));
@@ -2111,7 +2188,7 @@
/* Wait for callbacks to appear. */
if (!rcu_nocb_poll) {
trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Sleep"));
- swait_event_interruptible(my_rdp->nocb_wq,
+ swait_event_interruptible_exclusive(my_rdp->nocb_wq,
!READ_ONCE(my_rdp->nocb_leader_sleep));
raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
my_rdp->nocb_leader_sleep = true;
@@ -2176,7 +2253,7 @@
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
if (rdp != my_rdp && tail == &rdp->nocb_follower_head) {
/* List was empty, so wake up the follower. */
- swake_up(&rdp->nocb_wq);
+ swake_up_one(&rdp->nocb_wq);
}
}
@@ -2193,7 +2270,7 @@
{
for (;;) {
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("FollowerSleep"));
- swait_event_interruptible(rdp->nocb_wq,
+ swait_event_interruptible_exclusive(rdp->nocb_wq,
READ_ONCE(rdp->nocb_follower_head));
if (smp_load_acquire(&rdp->nocb_follower_head)) {
/* ^^^ Ensure CB invocation follows _head test. */
@@ -2569,23 +2646,6 @@
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
/*
- * An adaptive-ticks CPU can potentially execute in kernel mode for an
- * arbitrarily long period of time with the scheduling-clock tick turned
- * off. RCU will be paying attention to this CPU because it is in the
- * kernel, but the CPU cannot be guaranteed to be executing the RCU state
- * machine because the scheduling-clock tick has been disabled. Therefore,
- * if an adaptive-ticks CPU is failing to respond to the current grace
- * period and has not be idle from an RCU perspective, kick it.
- */
-static void __maybe_unused rcu_kick_nohz_cpu(int cpu)
-{
-#ifdef CONFIG_NO_HZ_FULL
- if (tick_nohz_full_cpu(cpu))
- smp_send_reschedule(cpu);
-#endif /* #ifdef CONFIG_NO_HZ_FULL */
-}
-
-/*
* Is this CPU a NO_HZ_FULL CPU that should ignore RCU so that the
* grace-period kthread will do force_quiescent_state() processing?
* The idea is to avoid waking up RCU core processing on such a
@@ -2610,8 +2670,6 @@
*/
static void rcu_bind_gp_kthread(void)
{
- int __maybe_unused cpu;
-
if (!tick_nohz_full_enabled())
return;
housekeeping_affine(current, HK_FLAG_RCU);
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c230a6..39cb23d 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -507,14 +507,15 @@
#ifdef CONFIG_TASKS_RCU
/*
- * Simple variant of RCU whose quiescent states are voluntary context switch,
- * user-space execution, and idle. As such, grace periods can take one good
- * long time. There are no read-side primitives similar to rcu_read_lock()
- * and rcu_read_unlock() because this implementation is intended to get
- * the system into a safe state for some of the manipulations involved in
- * tracing and the like. Finally, this implementation does not support
- * high call_rcu_tasks() rates from multiple CPUs. If this is required,
- * per-CPU callback lists will be needed.
+ * Simple variant of RCU whose quiescent states are voluntary context
+ * switch, cond_resched_rcu_qs(), user-space execution, and idle.
+ * As such, grace periods can take one good long time. There are no
+ * read-side primitives similar to rcu_read_lock() and rcu_read_unlock()
+ * because this implementation is intended to get the system into a safe
+ * state for some of the manipulations involved in tracing and the like.
+ * Finally, this implementation does not support high call_rcu_tasks()
+ * rates from multiple CPUs. If this is required, per-CPU callback lists
+ * will be needed.
*/
/* Global list of callbacks and associated lock. */
@@ -542,11 +543,11 @@
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_tasks() assumes
* that the read-side critical sections end at a voluntary context
- * switch (not a preemption!), entry into idle, or transition to usermode
- * execution. As such, there are no read-side primitives analogous to
- * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
- * to determine that all tasks have passed through a safe state, not so
- * much for data-strcuture synchronization.
+ * switch (not a preemption!), cond_resched_rcu_qs(), entry into idle,
+ * or transition to usermode execution. As such, there are no read-side
+ * primitives analogous to rcu_read_lock() and rcu_read_unlock() because
+ * this primitive is intended to determine that all tasks have passed
+ * through a safe state, not so much for data-strcuture synchronization.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
@@ -667,6 +668,7 @@
struct rcu_head *list;
struct rcu_head *next;
LIST_HEAD(rcu_tasks_holdouts);
+ int fract;
/* Run on housekeeping CPUs by default. Sysadm can move if desired. */
housekeeping_affine(current, HK_FLAG_RCU);
@@ -748,13 +750,25 @@
* holdouts. When the list is empty, we are done.
*/
lastreport = jiffies;
- while (!list_empty(&rcu_tasks_holdouts)) {
+
+ /* Start off with HZ/10 wait and slowly back off to 1 HZ wait*/
+ fract = 10;
+
+ for (;;) {
bool firstreport;
bool needreport;
int rtst;
struct task_struct *t1;
- schedule_timeout_interruptible(HZ);
+ if (list_empty(&rcu_tasks_holdouts))
+ break;
+
+ /* Slowly back off waiting for holdouts */
+ schedule_timeout_interruptible(HZ/fract);
+
+ if (fract > 1)
+ fract--;
+
rtst = READ_ONCE(rcu_task_stall_timeout);
needreport = rtst > 0 &&
time_after(jiffies, lastreport + rtst);
@@ -800,6 +814,7 @@
list = next;
cond_resched();
}
+ /* Paranoid sleep to keep this from entering a tight loop */
schedule_timeout_uninterruptible(HZ/10);
}
}
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index d9a02b3..7fe1834 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@
obj-y += idle.o fair.o rt.o deadline.o
obj-y += wait.o wait_bit.o swait.o completion.o
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
obj-$(CONFIG_SCHEDSTATS) += stats.o
obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 10c83e7..e3e3b97 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -53,6 +53,7 @@
*
*/
#include "sched.h"
+#include <linux/sched_clock.h>
/*
* Scheduler clock - returns current time in nanosec units.
@@ -66,12 +67,7 @@
}
EXPORT_SYMBOL_GPL(sched_clock);
-__read_mostly int sched_clock_running;
-
-void sched_clock_init(void)
-{
- sched_clock_running = 1;
-}
+static DEFINE_STATIC_KEY_FALSE(sched_clock_running);
#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
/*
@@ -195,17 +191,40 @@
smp_mb(); /* matches sched_clock_init_late() */
- if (sched_clock_running == 2)
+ if (static_key_count(&sched_clock_running.key) == 2)
__clear_sched_clock_stable();
}
+static void __sched_clock_gtod_offset(void)
+{
+ struct sched_clock_data *scd = this_scd();
+
+ __scd_stamp(scd);
+ __gtod_offset = (scd->tick_raw + __sched_clock_offset) - scd->tick_gtod;
+}
+
+void __init sched_clock_init(void)
+{
+ /*
+ * Set __gtod_offset such that once we mark sched_clock_running,
+ * sched_clock_tick() continues where sched_clock() left off.
+ *
+ * Even if TSC is buggered, we're still UP at this point so it
+ * can't really be out of sync.
+ */
+ local_irq_disable();
+ __sched_clock_gtod_offset();
+ local_irq_enable();
+
+ static_branch_inc(&sched_clock_running);
+}
/*
* We run this as late_initcall() such that it runs after all built-in drivers,
* notably: acpi_processor and intel_idle, which can mark the TSC as unstable.
*/
static int __init sched_clock_init_late(void)
{
- sched_clock_running = 2;
+ static_branch_inc(&sched_clock_running);
/*
* Ensure that it is impossible to not do a static_key update.
*
@@ -350,8 +369,8 @@
if (sched_clock_stable())
return sched_clock() + __sched_clock_offset;
- if (unlikely(!sched_clock_running))
- return 0ull;
+ if (!static_branch_unlikely(&sched_clock_running))
+ return sched_clock();
preempt_disable_notrace();
scd = cpu_sdc(cpu);
@@ -373,7 +392,7 @@
if (sched_clock_stable())
return;
- if (unlikely(!sched_clock_running))
+ if (!static_branch_unlikely(&sched_clock_running))
return;
lockdep_assert_irqs_disabled();
@@ -385,8 +404,6 @@
void sched_clock_tick_stable(void)
{
- u64 gtod, clock;
-
if (!sched_clock_stable())
return;
@@ -398,9 +415,7 @@
* TSC to be unstable, any computation will be computing crap.
*/
local_irq_disable();
- gtod = ktime_get_ns();
- clock = sched_clock();
- __gtod_offset = (clock + __sched_clock_offset) - gtod;
+ __sched_clock_gtod_offset();
local_irq_enable();
}
@@ -434,9 +449,17 @@
#else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
+void __init sched_clock_init(void)
+{
+ static_branch_inc(&sched_clock_running);
+ local_irq_disable();
+ generic_sched_clock_init();
+ local_irq_enable();
+}
+
u64 sched_clock_cpu(int cpu)
{
- if (unlikely(!sched_clock_running))
+ if (!static_branch_unlikely(&sched_clock_running))
return 0;
return sched_clock();
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index e426b0c..a1ad5b7 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -22,8 +22,8 @@
*
* See also complete_all(), wait_for_completion() and related routines.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*/
void complete(struct completion *x)
{
@@ -44,8 +44,8 @@
*
* This will wake up all threads waiting on this particular completion event.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*
* Since complete_all() sets the completion of @x permanently to done
* to allow multiple waiters to finish, a call to reinit_completion()
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe365c9..c45de46 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -17,6 +17,8 @@
#include "../workqueue_internal.h"
#include "../smpboot.h"
+#include "pelt.h"
+
#define CREATE_TRACE_POINTS
#include <trace/events/sched.h>
@@ -45,14 +47,6 @@
const_debug unsigned int sysctl_sched_nr_migrate = 32;
/*
- * period over which we average the RT time consumption, measured
- * in ms.
- *
- * default: 1s
- */
-const_debug unsigned int sysctl_sched_time_avg = MSEC_PER_SEC;
-
-/*
* period over which we measure -rt task CPU usage in us.
* default: 1s
*/
@@ -183,9 +177,9 @@
rq->clock_task += delta;
-#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+#ifdef HAVE_SCHED_AVG_IRQ
if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
- sched_rt_avg_update(rq, irq_delta + steal);
+ update_irq_load_avg(rq, irq_delta + steal);
#endif
}
@@ -412,8 +406,8 @@
* its already queued (either by us or someone else) and will get the
* wakeup due to that.
*
- * This cmpxchg() implies a full barrier, which pairs with the write
- * barrier implied by the wakeup in wake_up_q().
+ * This cmpxchg() executes a full barrier, which pairs with the full
+ * barrier executed by the wakeup in wake_up_q().
*/
if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
return;
@@ -441,8 +435,8 @@
task->wake_q.next = NULL;
/*
- * wake_up_process() implies a wmb() to pair with the queueing
- * in wake_q_add() so as not to miss wakeups.
+ * wake_up_process() executes a full barrier, which pairs with
+ * the queueing in wake_q_add() so as not to miss wakeups.
*/
wake_up_process(task);
put_task_struct(task);
@@ -649,23 +643,6 @@
return true;
}
#endif /* CONFIG_NO_HZ_FULL */
-
-void sched_avg_update(struct rq *rq)
-{
- s64 period = sched_avg_period();
-
- while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
- /*
- * Inline assembly required to prevent the compiler
- * optimising this loop into a divmod call.
- * See __iter_div_u64_rem() for another example of this.
- */
- asm("" : "+rm" (rq->age_stamp));
- rq->age_stamp += period;
- rq->rt_avg /= 2;
- }
-}
-
#endif /* CONFIG_SMP */
#if defined(CONFIG_RT_GROUP_SCHED) || (defined(CONFIG_FAIR_GROUP_SCHED) && \
@@ -1199,6 +1176,7 @@
__set_task_cpu(p, new_cpu);
}
+#ifdef CONFIG_NUMA_BALANCING
static void __migrate_swap_task(struct task_struct *p, int cpu)
{
if (task_on_rq_queued(p)) {
@@ -1280,16 +1258,17 @@
/*
* Cross migrate two tasks
*/
-int migrate_swap(struct task_struct *cur, struct task_struct *p)
+int migrate_swap(struct task_struct *cur, struct task_struct *p,
+ int target_cpu, int curr_cpu)
{
struct migration_swap_arg arg;
int ret = -EINVAL;
arg = (struct migration_swap_arg){
.src_task = cur,
- .src_cpu = task_cpu(cur),
+ .src_cpu = curr_cpu,
.dst_task = p,
- .dst_cpu = task_cpu(p),
+ .dst_cpu = target_cpu,
};
if (arg.src_cpu == arg.dst_cpu)
@@ -1314,6 +1293,7 @@
out:
return ret;
}
+#endif /* CONFIG_NUMA_BALANCING */
/*
* wait_task_inactive - wait for a thread to unschedule.
@@ -1879,8 +1859,7 @@
* rq(c1)->lock (if not at the same time, then in that order).
* C) LOCK of the rq(c1)->lock scheduling in task
*
- * Transitivity guarantees that B happens after A and C after B.
- * Note: we only require RCpc transitivity.
+ * Release/acquire chaining guarantees that B happens after A and C after B.
* Note: the CPU doing B need not be c0 or c1
*
* Example:
@@ -1942,16 +1921,9 @@
* UNLOCK rq(0)->lock
*
*
- * However; for wakeups there is a second guarantee we must provide, namely we
- * must observe the state that lead to our wakeup. That is, not only must our
- * task observe its own prior state, it must also observe the stores prior to
- * its wakeup.
- *
- * This means that any means of doing remote wakeups must order the CPU doing
- * the wakeup against the CPU the task is going to end up running on. This,
- * however, is already required for the regular Program-Order guarantee above,
- * since the waking CPU is the one issueing the ACQUIRE (smp_cond_load_acquire).
- *
+ * However, for wakeups there is a second guarantee we must provide, namely we
+ * must ensure that CONDITION=1 done by the caller can not be reordered with
+ * accesses to the task state; see try_to_wake_up() and set_current_state().
*/
/**
@@ -1967,6 +1939,9 @@
* Atomic against schedule() which would dequeue a task, also see
* set_current_state().
*
+ * This function executes a full memory barrier before accessing the task
+ * state; see set_current_state().
+ *
* Return: %true if @p->state changes (an actual wakeup was done),
* %false otherwise.
*/
@@ -1998,21 +1973,20 @@
* be possible to, falsely, observe p->on_rq == 0 and get stuck
* in smp_cond_load_acquire() below.
*
- * sched_ttwu_pending() try_to_wake_up()
- * [S] p->on_rq = 1; [L] P->state
- * UNLOCK rq->lock -----.
- * \
- * +--- RMB
- * schedule() /
- * LOCK rq->lock -----'
- * UNLOCK rq->lock
+ * sched_ttwu_pending() try_to_wake_up()
+ * STORE p->on_rq = 1 LOAD p->state
+ * UNLOCK rq->lock
+ *
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock smp_rmb();
+ * smp_mb__after_spinlock();
+ * UNLOCK rq->lock
*
* [task p]
- * [S] p->state = UNINTERRUPTIBLE [L] p->on_rq
+ * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
*
- * Pairs with the UNLOCK+LOCK on rq->lock from the
- * last wakeup of our task and the schedule that got our task
- * current.
+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+ * __schedule(). See the comment for smp_mb__after_spinlock().
*/
smp_rmb();
if (p->on_rq && ttwu_remote(p, wake_flags))
@@ -2026,15 +2000,17 @@
* One must be running (->on_cpu == 1) in order to remove oneself
* from the runqueue.
*
- * [S] ->on_cpu = 1; [L] ->on_rq
- * UNLOCK rq->lock
- * RMB
- * LOCK rq->lock
- * [S] ->on_rq = 0; [L] ->on_cpu
+ * __schedule() (switch to task 'p') try_to_wake_up()
+ * STORE p->on_cpu = 1 LOAD p->on_rq
+ * UNLOCK rq->lock
*
- * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
- * from the consecutive calls to schedule(); the first switching to our
- * task, the second putting it to sleep.
+ * __schedule() (put 'p' to sleep)
+ * LOCK rq->lock smp_rmb();
+ * smp_mb__after_spinlock();
+ * STORE p->on_rq = 0 LOAD p->on_cpu
+ *
+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+ * __schedule(). See the comment for smp_mb__after_spinlock().
*/
smp_rmb();
@@ -2140,8 +2116,7 @@
*
* Return: 1 if the process was woken up, 0 if it was already running.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * This function executes a full memory barrier before accessing the task state.
*/
int wake_up_process(struct task_struct *p)
{
@@ -2317,7 +2292,6 @@
int sched_fork(unsigned long clone_flags, struct task_struct *p)
{
unsigned long flags;
- int cpu = get_cpu();
__sched_fork(clone_flags, p);
/*
@@ -2353,14 +2327,12 @@
p->sched_reset_on_fork = 0;
}
- if (dl_prio(p->prio)) {
- put_cpu();
+ if (dl_prio(p->prio))
return -EAGAIN;
- } else if (rt_prio(p->prio)) {
+ else if (rt_prio(p->prio))
p->sched_class = &rt_sched_class;
- } else {
+ else
p->sched_class = &fair_sched_class;
- }
init_entity_runnable_average(&p->se);
@@ -2376,7 +2348,7 @@
* We're setting the CPU for the first time, we don't migrate,
* so use __set_task_cpu().
*/
- __set_task_cpu(p, cpu);
+ __set_task_cpu(p, smp_processor_id());
if (p->sched_class->task_fork)
p->sched_class->task_fork(p);
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
@@ -2393,8 +2365,6 @@
plist_node_init(&p->pushable_tasks, MAX_PRIO);
RB_CLEAR_NODE(&p->pushable_dl_tasks);
#endif
-
- put_cpu();
return 0;
}
@@ -5714,13 +5684,6 @@
}
}
-static void set_cpu_rq_start_time(unsigned int cpu)
-{
- struct rq *rq = cpu_rq(cpu);
-
- rq->age_stamp = sched_clock_cpu(cpu);
-}
-
/*
* used to mark begin/end of suspend/resume:
*/
@@ -5838,7 +5801,6 @@
int sched_cpu_starting(unsigned int cpu)
{
- set_cpu_rq_start_time(cpu);
sched_rq_cpu_starting(cpu);
sched_tick_start(cpu);
return 0;
@@ -5954,7 +5916,6 @@
int i, j;
unsigned long alloc_size = 0, ptr;
- sched_clock_init();
wait_bit_init();
#ifdef CONFIG_FAIR_GROUP_SCHED
@@ -6106,7 +6067,6 @@
#ifdef CONFIG_SMP
idle_thread_set_boot_cpu();
- set_cpu_rq_start_time(smp_processor_id());
#endif
init_sched_fair_class();
@@ -6785,6 +6745,16 @@
seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
+ if (schedstat_enabled() && tg != &root_task_group) {
+ u64 ws = 0;
+ int i;
+
+ for_each_possible_cpu(i)
+ ws += schedstat_val(tg->se[i]->statistics.wait_sum);
+
+ seq_printf(sf, "wait_sum %llu\n", ws);
+ }
+
return 0;
}
#endif /* CONFIG_CFS_BANDWIDTH */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index c907fde..3fffad3 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,9 +53,7 @@
unsigned int iowait_boost_max;
u64 last_update;
- /* The fields below are only needed when sharing a policy: */
- unsigned long util_cfs;
- unsigned long util_dl;
+ unsigned long bw_dl;
unsigned long max;
/* The field below is for single-CPU policies only: */
@@ -179,33 +177,90 @@
return cpufreq_driver_resolve_freq(policy, freq);
}
-static void sugov_get_util(struct sugov_cpu *sg_cpu)
+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ * cpu_util_{cfs,rt,dl,irq}()
+ * cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued
+ * in the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal utilization
+ * required to meet deadlines.
+ */
+static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
{
struct rq *rq = cpu_rq(sg_cpu->cpu);
+ unsigned long util, irq, max;
- sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
- sg_cpu->util_cfs = cpu_util_cfs(rq);
- sg_cpu->util_dl = cpu_util_dl(rq);
-}
-
-static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
-{
- struct rq *rq = cpu_rq(sg_cpu->cpu);
+ sg_cpu->max = max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
+ sg_cpu->bw_dl = cpu_bw_dl(rq);
if (rt_rq_is_runnable(&rq->rt))
- return sg_cpu->max;
+ return max;
/*
- * Utilization required by DEADLINE must always be granted while, for
- * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
- * gracefully reduce the frequency when no tasks show up for longer
+ * Early check to see if IRQ/steal time saturates the CPU, can be
+ * because of inaccuracies in how we track these -- see
+ * update_irq_load_avg().
+ */
+ irq = cpu_util_irq(rq);
+ if (unlikely(irq >= max))
+ return max;
+
+ /*
+ * Because the time spend on RT/DL tasks is visible as 'lost' time to
+ * CFS tasks and we use the same metric to track the effective
+ * utilization (PELT windows are synchronized) we can directly add them
+ * to obtain the CPU's actual utilization.
+ */
+ util = cpu_util_cfs(rq);
+ util += cpu_util_rt(rq);
+
+ /*
+ * We do not make cpu_util_dl() a permanent part of this sum because we
+ * want to use cpu_bw_dl() later on, but we need to check if the
+ * CFS+RT+DL sum is saturated (ie. no idle time) such that we select
+ * f_max when there is no idle time.
+ *
+ * NOTE: numerical errors or stop class might cause us to not quite hit
+ * saturation when we should -- something for later.
+ */
+ if ((util + cpu_util_dl(rq)) >= max)
+ return max;
+
+ /*
+ * There is still idle time; further improve the number by using the
+ * irq metric. Because IRQ/steal time is hidden from the task clock we
+ * need to scale the task numbers:
+ *
+ * 1 - irq
+ * U' = irq + ------- * U
+ * max
+ */
+ util = scale_irq_capacity(util, irq, max);
+ util += irq;
+
+ /*
+ * Bandwidth required by DEADLINE must always be granted while, for
+ * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
+ * to gracefully reduce the frequency when no tasks show up for longer
* periods of time.
*
- * Ideally we would like to set util_dl as min/guaranteed freq and
- * util_cfs + util_dl as requested freq. However, cpufreq is not yet
- * ready for such an interface. So, we only do the latter for now.
+ * Ideally we would like to set bw_dl as min/guaranteed freq and util +
+ * bw_dl as requested freq. However, cpufreq is not yet ready for such
+ * an interface. So, we only do the latter for now.
*/
- return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
+ return min(max, util + sg_cpu->bw_dl);
}
/**
@@ -360,7 +415,7 @@
*/
static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu, struct sugov_policy *sg_policy)
{
- if (cpu_util_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->util_dl)
+ if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
sg_policy->need_freq_update = true;
}
@@ -383,9 +438,8 @@
busy = sugov_cpu_is_busy(sg_cpu);
- sugov_get_util(sg_cpu);
+ util = sugov_get_util(sg_cpu);
max = sg_cpu->max;
- util = sugov_aggregate_util(sg_cpu);
sugov_iowait_apply(sg_cpu, time, &util, &max);
next_f = get_next_freq(sg_policy, util, max);
/*
@@ -424,9 +478,8 @@
struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
unsigned long j_util, j_max;
- sugov_get_util(j_sg_cpu);
+ j_util = sugov_get_util(j_sg_cpu);
j_max = j_sg_cpu->max;
- j_util = sugov_aggregate_util(j_sg_cpu);
sugov_iowait_apply(j_sg_cpu, time, &j_util, &j_max);
if (j_util * max > j_max * util) {
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b5fbdde..997ea7b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -16,6 +16,7 @@
* Fabio Checconi <fchecconi@gmail.com>
*/
#include "sched.h"
+#include "pelt.h"
struct dl_bandwidth def_dl_bandwidth;
@@ -1179,8 +1180,6 @@
curr->se.exec_start = now;
cgroup_account_cputime(curr, delta_exec);
- sched_rt_avg_update(rq, delta_exec);
-
if (dl_entity_is_special(dl_se))
return;
@@ -1761,6 +1760,9 @@
deadline_queue_push_tasks(rq);
+ if (rq->curr->sched_class != &dl_sched_class)
+ update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+
return p;
}
@@ -1768,6 +1770,7 @@
{
update_curr_dl(rq);
+ update_dl_rq_load_avg(rq_clock_task(rq), rq, 1);
if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1)
enqueue_pushable_dl_task(rq, p);
}
@@ -1784,6 +1787,7 @@
{
update_curr_dl(rq);
+ update_dl_rq_load_avg(rq_clock_task(rq), rq, 1);
/*
* Even when we have runtime, update_curr_dl() might have resulted in us
* not being the leftmost task anymore. In that case NEED_RESCHED will
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index e593b41..60caf1f 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -111,20 +111,19 @@
cmp += 3;
}
- for (i = 0; i < __SCHED_FEAT_NR; i++) {
- if (strcmp(cmp, sched_feat_names[i]) == 0) {
- if (neg) {
- sysctl_sched_features &= ~(1UL << i);
- sched_feat_disable(i);
- } else {
- sysctl_sched_features |= (1UL << i);
- sched_feat_enable(i);
- }
- break;
- }
+ i = match_string(sched_feat_names, __SCHED_FEAT_NR, cmp);
+ if (i < 0)
+ return i;
+
+ if (neg) {
+ sysctl_sched_features &= ~(1UL << i);
+ sched_feat_disable(i);
+ } else {
+ sysctl_sched_features |= (1UL << i);
+ sched_feat_enable(i);
}
- return i;
+ return 0;
}
static ssize_t
@@ -133,7 +132,7 @@
{
char buf[64];
char *cmp;
- int i;
+ int ret;
struct inode *inode;
if (cnt > 63)
@@ -148,10 +147,10 @@
/* Ensure the static_key remains in a consistent state */
inode = file_inode(filp);
inode_lock(inode);
- i = sched_feat_set(cmp);
+ ret = sched_feat_set(cmp);
inode_unlock(inode);
- if (i == __SCHED_FEAT_NR)
- return -EINVAL;
+ if (ret < 0)
+ return ret;
*ppos += cnt;
@@ -623,8 +622,6 @@
#undef PU
}
-extern __read_mostly int sched_clock_running;
-
static void print_cpu(struct seq_file *m, int cpu)
{
struct rq *rq = cpu_rq(cpu);
@@ -843,8 +840,8 @@
unsigned long tpf, unsigned long gsf, unsigned long gpf)
{
SEQ_printf(m, "numa_faults node=%d ", node);
- SEQ_printf(m, "task_private=%lu task_shared=%lu ", tsf, tpf);
- SEQ_printf(m, "group_private=%lu group_shared=%lu\n", gsf, gpf);
+ SEQ_printf(m, "task_private=%lu task_shared=%lu ", tpf, tsf);
+ SEQ_printf(m, "group_private=%lu group_shared=%lu\n", gpf, gsf);
}
#endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be..309c93f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -255,9 +255,6 @@
return cfs_rq->rq;
}
-/* An entity is a task if it doesn't "own" a runqueue */
-#define entity_is_task(se) (!se->my_q)
-
static inline struct task_struct *task_of(struct sched_entity *se)
{
SCHED_WARN_ON(!entity_is_task(se));
@@ -419,7 +416,6 @@
return container_of(cfs_rq, struct rq, cfs);
}
-#define entity_is_task(se) 1
#define for_each_sched_entity(se) \
for (; se; se = NULL)
@@ -692,7 +688,7 @@
}
#ifdef CONFIG_SMP
-
+#include "pelt.h"
#include "sched-pelt.h"
static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu);
@@ -735,11 +731,12 @@
* To solve this problem, we also cap the util_avg of successive tasks to
* only 1/2 of the left utilization budget:
*
- * util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
+ * util_avg_cap = (cpu_scale - cfs_rq->avg.util_avg) / 2^n
*
- * where n denotes the nth task.
+ * where n denotes the nth task and cpu_scale the CPU capacity.
*
- * For example, a simplest series from the beginning would be like:
+ * For example, for a CPU with 1024 of capacity, a simplest series from
+ * the beginning would be like:
*
* task util_avg: 512, 256, 128, 64, 32, 16, 8, ...
* cfs_rq util_avg: 512, 768, 896, 960, 992, 1008, 1016, ...
@@ -751,7 +748,8 @@
{
struct cfs_rq *cfs_rq = cfs_rq_of(se);
struct sched_avg *sa = &se->avg;
- long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
+ long cpu_scale = arch_scale_cpu_capacity(NULL, cpu_of(rq_of(cfs_rq)));
+ long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;
if (cap > 0) {
if (cfs_rq->avg.util_avg != 0) {
@@ -1314,7 +1312,7 @@
* of each group. Skip other nodes.
*/
if (sched_numa_topology_type == NUMA_BACKPLANE &&
- dist > maxdist)
+ dist >= maxdist)
continue;
/* Add up the faults from nearby nodes. */
@@ -1452,15 +1450,12 @@
/* Cached statistics for all CPUs within a node */
struct numa_stats {
- unsigned long nr_running;
unsigned long load;
/* Total compute capacity of CPUs on a node */
unsigned long compute_capacity;
- /* Approximate capacity in terms of runnable tasks on a node */
- unsigned long task_capacity;
- int has_free_capacity;
+ unsigned int nr_running;
};
/*
@@ -1487,8 +1482,7 @@
* the @ns structure is NULL'ed and task_numa_compare() will
* not find this node attractive.
*
- * We'll either bail at !has_free_capacity, or we'll detect a huge
- * imbalance and bail there.
+ * We'll detect a huge imbalance and bail there.
*/
if (!cpus)
return;
@@ -1497,9 +1491,8 @@
smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, ns->compute_capacity);
capacity = cpus / smt; /* cores */
- ns->task_capacity = min_t(unsigned, capacity,
+ capacity = min_t(unsigned, capacity,
DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE));
- ns->has_free_capacity = (ns->nr_running < ns->task_capacity);
}
struct task_numa_env {
@@ -1548,28 +1541,12 @@
src_capacity = env->src_stats.compute_capacity;
dst_capacity = env->dst_stats.compute_capacity;
- /* We care about the slope of the imbalance, not the direction. */
- if (dst_load < src_load)
- swap(dst_load, src_load);
+ imb = abs(dst_load * src_capacity - src_load * dst_capacity);
- /* Is the difference below the threshold? */
- imb = dst_load * src_capacity * 100 -
- src_load * dst_capacity * env->imbalance_pct;
- if (imb <= 0)
- return false;
-
- /*
- * The imbalance is above the allowed threshold.
- * Compare it with the old imbalance.
- */
orig_src_load = env->src_stats.load;
orig_dst_load = env->dst_stats.load;
- if (orig_dst_load < orig_src_load)
- swap(orig_dst_load, orig_src_load);
-
- old_imb = orig_dst_load * src_capacity * 100 -
- orig_src_load * dst_capacity * env->imbalance_pct;
+ old_imb = abs(orig_dst_load * src_capacity - orig_src_load * dst_capacity);
/* Would this change make things worse? */
return (imb > old_imb);
@@ -1582,9 +1559,8 @@
* be exchanged with the source task
*/
static void task_numa_compare(struct task_numa_env *env,
- long taskimp, long groupimp)
+ long taskimp, long groupimp, bool maymove)
{
- struct rq *src_rq = cpu_rq(env->src_cpu);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
struct task_struct *cur;
long src_load, dst_load;
@@ -1605,97 +1581,73 @@
if (cur == env->p)
goto unlock;
+ if (!cur) {
+ if (maymove || imp > env->best_imp)
+ goto assign;
+ else
+ goto unlock;
+ }
+
/*
* "imp" is the fault differential for the source task between the
* source and destination node. Calculate the total differential for
* the source task and potential destination task. The more negative
- * the value is, the more rmeote accesses that would be expected to
+ * the value is, the more remote accesses that would be expected to
* be incurred if the tasks were swapped.
*/
- if (cur) {
- /* Skip this swap candidate if cannot move to the source CPU: */
- if (!cpumask_test_cpu(env->src_cpu, &cur->cpus_allowed))
- goto unlock;
-
- /*
- * If dst and source tasks are in the same NUMA group, or not
- * in any group then look only at task weights.
- */
- if (cur->numa_group == env->p->numa_group) {
- imp = taskimp + task_weight(cur, env->src_nid, dist) -
- task_weight(cur, env->dst_nid, dist);
- /*
- * Add some hysteresis to prevent swapping the
- * tasks within a group over tiny differences.
- */
- if (cur->numa_group)
- imp -= imp/16;
- } else {
- /*
- * Compare the group weights. If a task is all by
- * itself (not part of a group), use the task weight
- * instead.
- */
- if (cur->numa_group)
- imp += group_weight(cur, env->src_nid, dist) -
- group_weight(cur, env->dst_nid, dist);
- else
- imp += task_weight(cur, env->src_nid, dist) -
- task_weight(cur, env->dst_nid, dist);
- }
- }
-
- if (imp <= env->best_imp && moveimp <= env->best_imp)
+ /* Skip this swap candidate if cannot move to the source cpu */
+ if (!cpumask_test_cpu(env->src_cpu, &cur->cpus_allowed))
goto unlock;
- if (!cur) {
- /* Is there capacity at our destination? */
- if (env->src_stats.nr_running <= env->src_stats.task_capacity &&
- !env->dst_stats.has_free_capacity)
- goto unlock;
-
- goto balance;
- }
-
- /* Balance doesn't matter much if we're running a task per CPU: */
- if (imp > env->best_imp && src_rq->nr_running == 1 &&
- dst_rq->nr_running == 1)
- goto assign;
-
/*
- * In the overloaded case, try and keep the load balanced.
+ * If dst and source tasks are in the same NUMA group, or not
+ * in any group then look only at task weights.
*/
-balance:
- load = task_h_load(env->p);
- dst_load = env->dst_stats.load + load;
- src_load = env->src_stats.load - load;
-
- if (moveimp > imp && moveimp > env->best_imp) {
+ if (cur->numa_group == env->p->numa_group) {
+ imp = taskimp + task_weight(cur, env->src_nid, dist) -
+ task_weight(cur, env->dst_nid, dist);
/*
- * If the improvement from just moving env->p direction is
- * better than swapping tasks around, check if a move is
- * possible. Store a slightly smaller score than moveimp,
- * so an actually idle CPU will win.
+ * Add some hysteresis to prevent swapping the
+ * tasks within a group over tiny differences.
*/
- if (!load_too_imbalanced(src_load, dst_load, env)) {
- imp = moveimp - 1;
- cur = NULL;
- goto assign;
- }
+ if (cur->numa_group)
+ imp -= imp / 16;
+ } else {
+ /*
+ * Compare the group weights. If a task is all by itself
+ * (not part of a group), use the task weight instead.
+ */
+ if (cur->numa_group && env->p->numa_group)
+ imp += group_weight(cur, env->src_nid, dist) -
+ group_weight(cur, env->dst_nid, dist);
+ else
+ imp += task_weight(cur, env->src_nid, dist) -
+ task_weight(cur, env->dst_nid, dist);
}
if (imp <= env->best_imp)
goto unlock;
- if (cur) {
- load = task_h_load(cur);
- dst_load -= load;
- src_load += load;
+ if (maymove && moveimp > imp && moveimp > env->best_imp) {
+ imp = moveimp - 1;
+ cur = NULL;
+ goto assign;
}
+ /*
+ * In the overloaded case, try and keep the load balanced.
+ */
+ load = task_h_load(env->p) - task_h_load(cur);
+ if (!load)
+ goto assign;
+
+ dst_load = env->dst_stats.load + load;
+ src_load = env->src_stats.load - load;
+
if (load_too_imbalanced(src_load, dst_load, env))
goto unlock;
+assign:
/*
* One idle CPU per node is evaluated for a task numa move.
* Call select_idle_sibling to maybe find a better one.
@@ -1711,7 +1663,6 @@
local_irq_enable();
}
-assign:
task_numa_assign(env, cur, imp);
unlock:
rcu_read_unlock();
@@ -1720,43 +1671,30 @@
static void task_numa_find_cpu(struct task_numa_env *env,
long taskimp, long groupimp)
{
+ long src_load, dst_load, load;
+ bool maymove = false;
int cpu;
+ load = task_h_load(env->p);
+ dst_load = env->dst_stats.load + load;
+ src_load = env->src_stats.load - load;
+
+ /*
+ * If the improvement from just moving env->p direction is better
+ * than swapping tasks around, check if a move is possible.
+ */
+ maymove = !load_too_imbalanced(src_load, dst_load, env);
+
for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
/* Skip this CPU if the source task cannot migrate */
if (!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
continue;
env->dst_cpu = cpu;
- task_numa_compare(env, taskimp, groupimp);
+ task_numa_compare(env, taskimp, groupimp, maymove);
}
}
-/* Only move tasks to a NUMA node less busy than the current node. */
-static bool numa_has_capacity(struct task_numa_env *env)
-{
- struct numa_stats *src = &env->src_stats;
- struct numa_stats *dst = &env->dst_stats;
-
- if (src->has_free_capacity && !dst->has_free_capacity)
- return false;
-
- /*
- * Only consider a task move if the source has a higher load
- * than the destination, corrected for CPU capacity on each node.
- *
- * src->load dst->load
- * --------------------- vs ---------------------
- * src->compute_capacity dst->compute_capacity
- */
- if (src->load * dst->compute_capacity * env->imbalance_pct >
-
- dst->load * src->compute_capacity * 100)
- return true;
-
- return false;
-}
-
static int task_numa_migrate(struct task_struct *p)
{
struct task_numa_env env = {
@@ -1797,7 +1735,7 @@
* elsewhere, so there is no point in (re)trying.
*/
if (unlikely(!sd)) {
- p->numa_preferred_nid = task_node(p);
+ sched_setnuma(p, task_node(p));
return -EINVAL;
}
@@ -1811,8 +1749,7 @@
update_numa_stats(&env.dst_stats, env.dst_nid);
/* Try to find a spot on the preferred nid. */
- if (numa_has_capacity(&env))
- task_numa_find_cpu(&env, taskimp, groupimp);
+ task_numa_find_cpu(&env, taskimp, groupimp);
/*
* Look at other nodes in these cases:
@@ -1842,8 +1779,7 @@
env.dist = dist;
env.dst_nid = nid;
update_numa_stats(&env.dst_stats, env.dst_nid);
- if (numa_has_capacity(&env))
- task_numa_find_cpu(&env, taskimp, groupimp);
+ task_numa_find_cpu(&env, taskimp, groupimp);
}
}
@@ -1856,15 +1792,13 @@
* trying for a better one later. Do not set the preferred node here.
*/
if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
-
if (env.best_cpu == -1)
nid = env.src_nid;
else
- nid = env.dst_nid;
+ nid = cpu_to_node(env.best_cpu);
- if (ng->active_nodes > 1 && numa_is_active_node(env.dst_nid, ng))
- sched_setnuma(p, env.dst_nid);
+ if (nid != p->numa_preferred_nid)
+ sched_setnuma(p, nid);
}
/* No better CPU than the current one was found. */
@@ -1884,7 +1818,8 @@
return ret;
}
- ret = migrate_swap(p, env.best_task);
+ ret = migrate_swap(p, env.best_task, env.best_cpu, env.src_cpu);
+
if (ret != 0)
trace_sched_stick_numa(p, env.src_cpu, task_cpu(env.best_task));
put_task_struct(env.best_task);
@@ -2144,8 +2079,8 @@
static void task_numa_placement(struct task_struct *p)
{
- int seq, nid, max_nid = -1, max_group_nid = -1;
- unsigned long max_faults = 0, max_group_faults = 0;
+ int seq, nid, max_nid = -1;
+ unsigned long max_faults = 0;
unsigned long fault_types[2] = { 0, 0 };
unsigned long total_faults;
u64 runtime, period;
@@ -2224,33 +2159,30 @@
}
}
- if (faults > max_faults) {
- max_faults = faults;
+ if (!p->numa_group) {
+ if (faults > max_faults) {
+ max_faults = faults;
+ max_nid = nid;
+ }
+ } else if (group_faults > max_faults) {
+ max_faults = group_faults;
max_nid = nid;
}
-
- if (group_faults > max_group_faults) {
- max_group_faults = group_faults;
- max_group_nid = nid;
- }
}
- update_task_scan_period(p, fault_types[0], fault_types[1]);
-
if (p->numa_group) {
numa_group_count_active_nodes(p->numa_group);
spin_unlock_irq(group_lock);
- max_nid = preferred_group_nid(p, max_group_nid);
+ max_nid = preferred_group_nid(p, max_nid);
}
if (max_faults) {
/* Set the new preferred node */
if (max_nid != p->numa_preferred_nid)
sched_setnuma(p, max_nid);
-
- if (task_node(p) != p->numa_preferred_nid)
- numa_migrate_preferred(p);
}
+
+ update_task_scan_period(p, fault_types[0], fault_types[1]);
}
static inline int get_numa_group(struct numa_group *grp)
@@ -2450,14 +2382,14 @@
numa_is_active_node(mem_node, ng))
local = 1;
- task_numa_placement(p);
-
/*
* Retry task to preferred node migration periodically, in case it
* case it previously failed, or the scheduler moved us.
*/
- if (time_after(jiffies, p->numa_migrate_retry))
+ if (time_after(jiffies, p->numa_migrate_retry)) {
+ task_numa_placement(p);
numa_migrate_preferred(p);
+ }
if (migrated)
p->numa_pages_migrated += pages;
@@ -2749,19 +2681,6 @@
} while (0)
#ifdef CONFIG_SMP
-/*
- * XXX we want to get rid of these helpers and use the full load resolution.
- */
-static inline long se_weight(struct sched_entity *se)
-{
- return scale_load_down(se->load.weight);
-}
-
-static inline long se_runnable(struct sched_entity *se)
-{
- return scale_load_down(se->runnable_weight);
-}
-
static inline void
enqueue_runnable_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
@@ -3062,314 +2981,6 @@
}
#ifdef CONFIG_SMP
-/*
- * Approximate:
- * val * y^n, where y^32 ~= 0.5 (~1 scheduling period)
- */
-static u64 decay_load(u64 val, u64 n)
-{
- unsigned int local_n;
-
- if (unlikely(n > LOAD_AVG_PERIOD * 63))
- return 0;
-
- /* after bounds checking we can collapse to 32-bit */
- local_n = n;
-
- /*
- * As y^PERIOD = 1/2, we can combine
- * y^n = 1/2^(n/PERIOD) * y^(n%PERIOD)
- * With a look-up table which covers y^n (n<PERIOD)
- *
- * To achieve constant time decay_load.
- */
- if (unlikely(local_n >= LOAD_AVG_PERIOD)) {
- val >>= local_n / LOAD_AVG_PERIOD;
- local_n %= LOAD_AVG_PERIOD;
- }
-
- val = mul_u64_u32_shr(val, runnable_avg_yN_inv[local_n], 32);
- return val;
-}
-
-static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
-{
- u32 c1, c2, c3 = d3; /* y^0 == 1 */
-
- /*
- * c1 = d1 y^p
- */
- c1 = decay_load((u64)d1, periods);
-
- /*
- * p-1
- * c2 = 1024 \Sum y^n
- * n=1
- *
- * inf inf
- * = 1024 ( \Sum y^n - \Sum y^n - y^0 )
- * n=0 n=p
- */
- c2 = LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, periods) - 1024;
-
- return c1 + c2 + c3;
-}
-
-/*
- * Accumulate the three separate parts of the sum; d1 the remainder
- * of the last (incomplete) period, d2 the span of full periods and d3
- * the remainder of the (incomplete) current period.
- *
- * d1 d2 d3
- * ^ ^ ^
- * | | |
- * |<->|<----------------->|<--->|
- * ... |---x---|------| ... |------|-----x (now)
- *
- * p-1
- * u' = (u + d1) y^p + 1024 \Sum y^n + d3 y^0
- * n=1
- *
- * = u y^p + (Step 1)
- *
- * p-1
- * d1 y^p + 1024 \Sum y^n + d3 y^0 (Step 2)
- * n=1
- */
-static __always_inline u32
-accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
- unsigned long load, unsigned long runnable, int running)
-{
- unsigned long scale_freq, scale_cpu;
- u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
- u64 periods;
-
- scale_freq = arch_scale_freq_capacity(cpu);
- scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
-
- delta += sa->period_contrib;
- periods = delta / 1024; /* A period is 1024us (~1ms) */
-
- /*
- * Step 1: decay old *_sum if we crossed period boundaries.
- */
- if (periods) {
- sa->load_sum = decay_load(sa->load_sum, periods);
- sa->runnable_load_sum =
- decay_load(sa->runnable_load_sum, periods);
- sa->util_sum = decay_load((u64)(sa->util_sum), periods);
-
- /*
- * Step 2
- */
- delta %= 1024;
- contrib = __accumulate_pelt_segments(periods,
- 1024 - sa->period_contrib, delta);
- }
- sa->period_contrib = delta;
-
- contrib = cap_scale(contrib, scale_freq);
- if (load)
- sa->load_sum += load * contrib;
- if (runnable)
- sa->runnable_load_sum += runnable * contrib;
- if (running)
- sa->util_sum += contrib * scale_cpu;
-
- return periods;
-}
-
-/*
- * We can represent the historical contribution to runnable average as the
- * coefficients of a geometric series. To do this we sub-divide our runnable
- * history into segments of approximately 1ms (1024us); label the segment that
- * occurred N-ms ago p_N, with p_0 corresponding to the current period, e.g.
- *
- * [<- 1024us ->|<- 1024us ->|<- 1024us ->| ...
- * p0 p1 p2
- * (now) (~1ms ago) (~2ms ago)
- *
- * Let u_i denote the fraction of p_i that the entity was runnable.
- *
- * We then designate the fractions u_i as our co-efficients, yielding the
- * following representation of historical load:
- * u_0 + u_1*y + u_2*y^2 + u_3*y^3 + ...
- *
- * We choose y based on the with of a reasonably scheduling period, fixing:
- * y^32 = 0.5
- *
- * This means that the contribution to load ~32ms ago (u_32) will be weighted
- * approximately half as much as the contribution to load within the last ms
- * (u_0).
- *
- * When a period "rolls over" and we have new u_0`, multiplying the previous
- * sum again by y is sufficient to update:
- * load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
- * = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
- */
-static __always_inline int
-___update_load_sum(u64 now, int cpu, struct sched_avg *sa,
- unsigned long load, unsigned long runnable, int running)
-{
- u64 delta;
-
- delta = now - sa->last_update_time;
- /*
- * This should only happen when time goes backwards, which it
- * unfortunately does during sched clock init when we swap over to TSC.
- */
- if ((s64)delta < 0) {
- sa->last_update_time = now;
- return 0;
- }
-
- /*
- * Use 1024ns as the unit of measurement since it's a reasonable
- * approximation of 1us and fast to compute.
- */
- delta >>= 10;
- if (!delta)
- return 0;
-
- sa->last_update_time += delta << 10;
-
- /*
- * running is a subset of runnable (weight) so running can't be set if
- * runnable is clear. But there are some corner cases where the current
- * se has been already dequeued but cfs_rq->curr still points to it.
- * This means that weight will be 0 but not running for a sched_entity
- * but also for a cfs_rq if the latter becomes idle. As an example,
- * this happens during idle_balance() which calls
- * update_blocked_averages()
- */
- if (!load)
- runnable = running = 0;
-
- /*
- * Now we know we crossed measurement unit boundaries. The *_avg
- * accrues by two steps:
- *
- * Step 1: accumulate *_sum since last_update_time. If we haven't
- * crossed period boundaries, finish.
- */
- if (!accumulate_sum(delta, cpu, sa, load, runnable, running))
- return 0;
-
- return 1;
-}
-
-static __always_inline void
-___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runnable)
-{
- u32 divider = LOAD_AVG_MAX - 1024 + sa->period_contrib;
-
- /*
- * Step 2: update *_avg.
- */
- sa->load_avg = div_u64(load * sa->load_sum, divider);
- sa->runnable_load_avg = div_u64(runnable * sa->runnable_load_sum, divider);
- sa->util_avg = sa->util_sum / divider;
-}
-
-/*
- * When a task is dequeued, its estimated utilization should not be update if
- * its util_avg has not been updated at least once.
- * This flag is used to synchronize util_avg updates with util_est updates.
- * We map this information into the LSB bit of the utilization saved at
- * dequeue time (i.e. util_est.dequeued).
- */
-#define UTIL_AVG_UNCHANGED 0x1
-
-static inline void cfs_se_util_change(struct sched_avg *avg)
-{
- unsigned int enqueued;
-
- if (!sched_feat(UTIL_EST))
- return;
-
- /* Avoid store if the flag has been already set */
- enqueued = avg->util_est.enqueued;
- if (!(enqueued & UTIL_AVG_UNCHANGED))
- return;
-
- /* Reset flag to report util_avg has been updated */
- enqueued &= ~UTIL_AVG_UNCHANGED;
- WRITE_ONCE(avg->util_est.enqueued, enqueued);
-}
-
-/*
- * sched_entity:
- *
- * task:
- * se_runnable() == se_weight()
- *
- * group: [ see update_cfs_group() ]
- * se_weight() = tg->weight * grq->load_avg / tg->load_avg
- * se_runnable() = se_weight(se) * grq->runnable_load_avg / grq->load_avg
- *
- * load_sum := runnable_sum
- * load_avg = se_weight(se) * runnable_avg
- *
- * runnable_load_sum := runnable_sum
- * runnable_load_avg = se_runnable(se) * runnable_avg
- *
- * XXX collapse load_sum and runnable_load_sum
- *
- * cfq_rs:
- *
- * load_sum = \Sum se_weight(se) * se->avg.load_sum
- * load_avg = \Sum se->avg.load_avg
- *
- * runnable_load_sum = \Sum se_runnable(se) * se->avg.runnable_load_sum
- * runnable_load_avg = \Sum se->avg.runable_load_avg
- */
-
-static int
-__update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
-{
- if (entity_is_task(se))
- se->runnable_weight = se->load.weight;
-
- if (___update_load_sum(now, cpu, &se->avg, 0, 0, 0)) {
- ___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
- return 1;
- }
-
- return 0;
-}
-
-static int
-__update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
- if (entity_is_task(se))
- se->runnable_weight = se->load.weight;
-
- if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
- cfs_rq->curr == se)) {
-
- ___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
- cfs_se_util_change(&se->avg);
- return 1;
- }
-
- return 0;
-}
-
-static int
-__update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
-{
- if (___update_load_sum(now, cpu, &cfs_rq->avg,
- scale_load_down(cfs_rq->load.weight),
- scale_load_down(cfs_rq->runnable_weight),
- cfs_rq->curr != NULL)) {
-
- ___update_load_avg(&cfs_rq->avg, 1, 1);
- return 1;
- }
-
- return 0;
-}
-
#ifdef CONFIG_FAIR_GROUP_SCHED
/**
* update_tg_load_avg - update the tg's load avg
@@ -4037,12 +3648,6 @@
#else /* CONFIG_SMP */
-static inline int
-update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
-{
- return 0;
-}
-
#define UPDATE_TG 0x0
#define SKIP_AGE_LOAD 0x0
#define DO_ATTACH 0x0
@@ -4726,7 +4331,6 @@
throttled_hierarchy(dest_cfs_rq);
}
-/* updated child weight may affect parent so we have to do this bottom up */
static int tg_unthrottle_up(struct task_group *tg, void *data)
{
struct rq *rq = data;
@@ -5653,8 +5257,6 @@
this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
}
-
- sched_avg_update(this_rq);
}
/* Used instead of source_load when we know the type == 0 */
@@ -7294,8 +6896,8 @@
static int migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
{
struct numa_group *numa_group = rcu_dereference(p->numa_group);
- unsigned long src_faults, dst_faults;
- int src_nid, dst_nid;
+ unsigned long src_weight, dst_weight;
+ int src_nid, dst_nid, dist;
if (!static_branch_likely(&sched_numa_balancing))
return -1;
@@ -7322,18 +6924,19 @@
return 0;
/* Leaving a core idle is often worse than degrading locality. */
- if (env->idle != CPU_NOT_IDLE)
+ if (env->idle == CPU_IDLE)
return -1;
+ dist = node_distance(src_nid, dst_nid);
if (numa_group) {
- src_faults = group_faults(p, src_nid);
- dst_faults = group_faults(p, dst_nid);
+ src_weight = group_weight(p, src_nid, dist);
+ dst_weight = group_weight(p, dst_nid, dist);
} else {
- src_faults = task_faults(p, src_nid);
- dst_faults = task_faults(p, dst_nid);
+ src_weight = task_weight(p, src_nid, dist);
+ dst_weight = task_weight(p, dst_nid, dist);
}
- return dst_faults < src_faults;
+ return dst_weight < src_weight;
}
#else
@@ -7620,6 +7223,22 @@
return false;
}
+static inline bool others_have_blocked(struct rq *rq)
+{
+ if (READ_ONCE(rq->avg_rt.util_avg))
+ return true;
+
+ if (READ_ONCE(rq->avg_dl.util_avg))
+ return true;
+
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+ if (READ_ONCE(rq->avg_irq.util_avg))
+ return true;
+#endif
+
+ return false;
+}
+
#ifdef CONFIG_FAIR_GROUP_SCHED
static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
@@ -7679,6 +7298,12 @@
if (cfs_rq_has_blocked(cfs_rq))
done = false;
}
+ update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+ update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+ update_irq_load_avg(rq, 0);
+ /* Don't need periodic decay once load/util_avg are null */
+ if (others_have_blocked(rq))
+ done = false;
#ifdef CONFIG_NO_HZ_COMMON
rq->last_blocked_load_update_tick = jiffies;
@@ -7744,9 +7369,12 @@
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+ update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+ update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+ update_irq_load_avg(rq, 0);
#ifdef CONFIG_NO_HZ_COMMON
rq->last_blocked_load_update_tick = jiffies;
- if (!cfs_rq_has_blocked(cfs_rq))
+ if (!cfs_rq_has_blocked(cfs_rq) && !others_have_blocked(rq))
rq->has_blocked_load = 0;
#endif
rq_unlock_irqrestore(rq, &rf);
@@ -7856,39 +7484,32 @@
static unsigned long scale_rt_capacity(int cpu)
{
struct rq *rq = cpu_rq(cpu);
- u64 total, used, age_stamp, avg;
- s64 delta;
+ unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
+ unsigned long used, free;
+ unsigned long irq;
- /*
- * Since we're reading these variables without serialization make sure
- * we read them once before doing sanity checks on them.
- */
- age_stamp = READ_ONCE(rq->age_stamp);
- avg = READ_ONCE(rq->rt_avg);
- delta = __rq_clock_broken(rq) - age_stamp;
+ irq = cpu_util_irq(rq);
- if (unlikely(delta < 0))
- delta = 0;
+ if (unlikely(irq >= max))
+ return 1;
- total = sched_avg_period() + delta;
+ used = READ_ONCE(rq->avg_rt.util_avg);
+ used += READ_ONCE(rq->avg_dl.util_avg);
- used = div_u64(avg, total);
+ if (unlikely(used >= max))
+ return 1;
- if (likely(used < SCHED_CAPACITY_SCALE))
- return SCHED_CAPACITY_SCALE - used;
+ free = max - used;
- return 1;
+ return scale_irq_capacity(free, irq, max);
}
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
- unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);
+ unsigned long capacity = scale_rt_capacity(cpu);
struct sched_group *sdg = sd->groups;
- cpu_rq(cpu)->cpu_capacity_orig = capacity;
-
- capacity *= scale_rt_capacity(cpu);
- capacity >>= SCHED_CAPACITY_SHIFT;
+ cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(sd, cpu);
if (!capacity)
capacity = 1;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
new file mode 100644
index 0000000..35475c0
--- /dev/null
+++ b/kernel/sched/pelt.c
@@ -0,0 +1,399 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Per Entity Load Tracking
+ *
+ * Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ * Interactivity improvements by Mike Galbraith
+ * (C) 2007 Mike Galbraith <efault@gmx.de>
+ *
+ * Various enhancements by Dmitry Adamushko.
+ * (C) 2007 Dmitry Adamushko <dmitry.adamushko@gmail.com>
+ *
+ * Group scheduling enhancements by Srivatsa Vaddagiri
+ * Copyright IBM Corporation, 2007
+ * Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
+ *
+ * Scaled math optimizations by Thomas Gleixner
+ * Copyright (C) 2007, Thomas Gleixner <tglx@linutronix.de>
+ *
+ * Adaptive scheduling granularity, math enhancements by Peter Zijlstra
+ * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
+ *
+ * Move PELT related code from fair.c into this pelt.c file
+ * Author: Vincent Guittot <vincent.guittot@linaro.org>
+ */
+
+#include <linux/sched.h>
+#include "sched.h"
+#include "sched-pelt.h"
+#include "pelt.h"
+
+/*
+ * Approximate:
+ * val * y^n, where y^32 ~= 0.5 (~1 scheduling period)
+ */
+static u64 decay_load(u64 val, u64 n)
+{
+ unsigned int local_n;
+
+ if (unlikely(n > LOAD_AVG_PERIOD * 63))
+ return 0;
+
+ /* after bounds checking we can collapse to 32-bit */
+ local_n = n;
+
+ /*
+ * As y^PERIOD = 1/2, we can combine
+ * y^n = 1/2^(n/PERIOD) * y^(n%PERIOD)
+ * With a look-up table which covers y^n (n<PERIOD)
+ *
+ * To achieve constant time decay_load.
+ */
+ if (unlikely(local_n >= LOAD_AVG_PERIOD)) {
+ val >>= local_n / LOAD_AVG_PERIOD;
+ local_n %= LOAD_AVG_PERIOD;
+ }
+
+ val = mul_u64_u32_shr(val, runnable_avg_yN_inv[local_n], 32);
+ return val;
+}
+
+static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
+{
+ u32 c1, c2, c3 = d3; /* y^0 == 1 */
+
+ /*
+ * c1 = d1 y^p
+ */
+ c1 = decay_load((u64)d1, periods);
+
+ /*
+ * p-1
+ * c2 = 1024 \Sum y^n
+ * n=1
+ *
+ * inf inf
+ * = 1024 ( \Sum y^n - \Sum y^n - y^0 )
+ * n=0 n=p
+ */
+ c2 = LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, periods) - 1024;
+
+ return c1 + c2 + c3;
+}
+
+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+
+/*
+ * Accumulate the three separate parts of the sum; d1 the remainder
+ * of the last (incomplete) period, d2 the span of full periods and d3
+ * the remainder of the (incomplete) current period.
+ *
+ * d1 d2 d3
+ * ^ ^ ^
+ * | | |
+ * |<->|<----------------->|<--->|
+ * ... |---x---|------| ... |------|-----x (now)
+ *
+ * p-1
+ * u' = (u + d1) y^p + 1024 \Sum y^n + d3 y^0
+ * n=1
+ *
+ * = u y^p + (Step 1)
+ *
+ * p-1
+ * d1 y^p + 1024 \Sum y^n + d3 y^0 (Step 2)
+ * n=1
+ */
+static __always_inline u32
+accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
+ unsigned long load, unsigned long runnable, int running)
+{
+ unsigned long scale_freq, scale_cpu;
+ u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
+ u64 periods;
+
+ scale_freq = arch_scale_freq_capacity(cpu);
+ scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+ delta += sa->period_contrib;
+ periods = delta / 1024; /* A period is 1024us (~1ms) */
+
+ /*
+ * Step 1: decay old *_sum if we crossed period boundaries.
+ */
+ if (periods) {
+ sa->load_sum = decay_load(sa->load_sum, periods);
+ sa->runnable_load_sum =
+ decay_load(sa->runnable_load_sum, periods);
+ sa->util_sum = decay_load((u64)(sa->util_sum), periods);
+
+ /*
+ * Step 2
+ */
+ delta %= 1024;
+ contrib = __accumulate_pelt_segments(periods,
+ 1024 - sa->period_contrib, delta);
+ }
+ sa->period_contrib = delta;
+
+ contrib = cap_scale(contrib, scale_freq);
+ if (load)
+ sa->load_sum += load * contrib;
+ if (runnable)
+ sa->runnable_load_sum += runnable * contrib;
+ if (running)
+ sa->util_sum += contrib * scale_cpu;
+
+ return periods;
+}
+
+/*
+ * We can represent the historical contribution to runnable average as the
+ * coefficients of a geometric series. To do this we sub-divide our runnable
+ * history into segments of approximately 1ms (1024us); label the segment that
+ * occurred N-ms ago p_N, with p_0 corresponding to the current period, e.g.
+ *
+ * [<- 1024us ->|<- 1024us ->|<- 1024us ->| ...
+ * p0 p1 p2
+ * (now) (~1ms ago) (~2ms ago)
+ *
+ * Let u_i denote the fraction of p_i that the entity was runnable.
+ *
+ * We then designate the fractions u_i as our co-efficients, yielding the
+ * following representation of historical load:
+ * u_0 + u_1*y + u_2*y^2 + u_3*y^3 + ...
+ *
+ * We choose y based on the with of a reasonably scheduling period, fixing:
+ * y^32 = 0.5
+ *
+ * This means that the contribution to load ~32ms ago (u_32) will be weighted
+ * approximately half as much as the contribution to load within the last ms
+ * (u_0).
+ *
+ * When a period "rolls over" and we have new u_0`, multiplying the previous
+ * sum again by y is sufficient to update:
+ * load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
+ * = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
+ */
+static __always_inline int
+___update_load_sum(u64 now, int cpu, struct sched_avg *sa,
+ unsigned long load, unsigned long runnable, int running)
+{
+ u64 delta;
+
+ delta = now - sa->last_update_time;
+ /*
+ * This should only happen when time goes backwards, which it
+ * unfortunately does during sched clock init when we swap over to TSC.
+ */
+ if ((s64)delta < 0) {
+ sa->last_update_time = now;
+ return 0;
+ }
+
+ /*
+ * Use 1024ns as the unit of measurement since it's a reasonable
+ * approximation of 1us and fast to compute.
+ */
+ delta >>= 10;
+ if (!delta)
+ return 0;
+
+ sa->last_update_time += delta << 10;
+
+ /*
+ * running is a subset of runnable (weight) so running can't be set if
+ * runnable is clear. But there are some corner cases where the current
+ * se has been already dequeued but cfs_rq->curr still points to it.
+ * This means that weight will be 0 but not running for a sched_entity
+ * but also for a cfs_rq if the latter becomes idle. As an example,
+ * this happens during idle_balance() which calls
+ * update_blocked_averages()
+ */
+ if (!load)
+ runnable = running = 0;
+
+ /*
+ * Now we know we crossed measurement unit boundaries. The *_avg
+ * accrues by two steps:
+ *
+ * Step 1: accumulate *_sum since last_update_time. If we haven't
+ * crossed period boundaries, finish.
+ */
+ if (!accumulate_sum(delta, cpu, sa, load, runnable, running))
+ return 0;
+
+ return 1;
+}
+
+static __always_inline void
+___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runnable)
+{
+ u32 divider = LOAD_AVG_MAX - 1024 + sa->period_contrib;
+
+ /*
+ * Step 2: update *_avg.
+ */
+ sa->load_avg = div_u64(load * sa->load_sum, divider);
+ sa->runnable_load_avg = div_u64(runnable * sa->runnable_load_sum, divider);
+ WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
+}
+
+/*
+ * sched_entity:
+ *
+ * task:
+ * se_runnable() == se_weight()
+ *
+ * group: [ see update_cfs_group() ]
+ * se_weight() = tg->weight * grq->load_avg / tg->load_avg
+ * se_runnable() = se_weight(se) * grq->runnable_load_avg / grq->load_avg
+ *
+ * load_sum := runnable_sum
+ * load_avg = se_weight(se) * runnable_avg
+ *
+ * runnable_load_sum := runnable_sum
+ * runnable_load_avg = se_runnable(se) * runnable_avg
+ *
+ * XXX collapse load_sum and runnable_load_sum
+ *
+ * cfq_rq:
+ *
+ * load_sum = \Sum se_weight(se) * se->avg.load_sum
+ * load_avg = \Sum se->avg.load_avg
+ *
+ * runnable_load_sum = \Sum se_runnable(se) * se->avg.runnable_load_sum
+ * runnable_load_avg = \Sum se->avg.runable_load_avg
+ */
+
+int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
+{
+ if (entity_is_task(se))
+ se->runnable_weight = se->load.weight;
+
+ if (___update_load_sum(now, cpu, &se->avg, 0, 0, 0)) {
+ ___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
+ return 1;
+ }
+
+ return 0;
+}
+
+int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ if (entity_is_task(se))
+ se->runnable_weight = se->load.weight;
+
+ if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
+ cfs_rq->curr == se)) {
+
+ ___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
+ cfs_se_util_change(&se->avg);
+ return 1;
+ }
+
+ return 0;
+}
+
+int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
+{
+ if (___update_load_sum(now, cpu, &cfs_rq->avg,
+ scale_load_down(cfs_rq->load.weight),
+ scale_load_down(cfs_rq->runnable_weight),
+ cfs_rq->curr != NULL)) {
+
+ ___update_load_avg(&cfs_rq->avg, 1, 1);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * rt_rq:
+ *
+ * util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ * util_sum = cpu_scale * load_sum
+ * runnable_load_sum = load_sum
+ *
+ * load_avg and runnable_load_avg are not supported and meaningless.
+ *
+ */
+
+int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+ if (___update_load_sum(now, rq->cpu, &rq->avg_rt,
+ running,
+ running,
+ running)) {
+
+ ___update_load_avg(&rq->avg_rt, 1, 1);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * dl_rq:
+ *
+ * util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ * util_sum = cpu_scale * load_sum
+ * runnable_load_sum = load_sum
+ *
+ */
+
+int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+ if (___update_load_sum(now, rq->cpu, &rq->avg_dl,
+ running,
+ running,
+ running)) {
+
+ ___update_load_avg(&rq->avg_dl, 1, 1);
+ return 1;
+ }
+
+ return 0;
+}
+
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+/*
+ * irq:
+ *
+ * util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ * util_sum = cpu_scale * load_sum
+ * runnable_load_sum = load_sum
+ *
+ */
+
+int update_irq_load_avg(struct rq *rq, u64 running)
+{
+ int ret = 0;
+ /*
+ * We know the time that has been used by interrupt since last update
+ * but we don't when. Let be pessimistic and assume that interrupt has
+ * happened just before the update. This is not so far from reality
+ * because interrupt will most probably wake up task and trig an update
+ * of rq clock during which the metric si updated.
+ * We start to decay with normal context time and then we add the
+ * interrupt context time.
+ * We can safely remove running from rq->clock because
+ * rq->clock += delta with delta >= running
+ */
+ ret = ___update_load_sum(rq->clock - running, rq->cpu, &rq->avg_irq,
+ 0,
+ 0,
+ 0);
+ ret += ___update_load_sum(rq->clock, rq->cpu, &rq->avg_irq,
+ 1,
+ 1,
+ 1);
+
+ if (ret)
+ ___update_load_avg(&rq->avg_irq, 1, 1);
+
+ return ret;
+}
+#endif
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
new file mode 100644
index 0000000..d2894db
--- /dev/null
+++ b/kernel/sched/pelt.h
@@ -0,0 +1,72 @@
+#ifdef CONFIG_SMP
+
+int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se);
+int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se);
+int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq);
+int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
+int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
+
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+int update_irq_load_avg(struct rq *rq, u64 running);
+#else
+static inline int
+update_irq_load_avg(struct rq *rq, u64 running)
+{
+ return 0;
+}
+#endif
+
+/*
+ * When a task is dequeued, its estimated utilization should not be update if
+ * its util_avg has not been updated at least once.
+ * This flag is used to synchronize util_avg updates with util_est updates.
+ * We map this information into the LSB bit of the utilization saved at
+ * dequeue time (i.e. util_est.dequeued).
+ */
+#define UTIL_AVG_UNCHANGED 0x1
+
+static inline void cfs_se_util_change(struct sched_avg *avg)
+{
+ unsigned int enqueued;
+
+ if (!sched_feat(UTIL_EST))
+ return;
+
+ /* Avoid store if the flag has been already set */
+ enqueued = avg->util_est.enqueued;
+ if (!(enqueued & UTIL_AVG_UNCHANGED))
+ return;
+
+ /* Reset flag to report util_avg has been updated */
+ enqueued &= ~UTIL_AVG_UNCHANGED;
+ WRITE_ONCE(avg->util_est.enqueued, enqueued);
+}
+
+#else
+
+static inline int
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+{
+ return 0;
+}
+
+static inline int
+update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+ return 0;
+}
+
+static inline int
+update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+ return 0;
+}
+
+static inline int
+update_irq_load_avg(struct rq *rq, u64 running)
+{
+ return 0;
+}
+#endif
+
+
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index eaaec83..2e2955a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -5,6 +5,8 @@
*/
#include "sched.h"
+#include "pelt.h"
+
int sched_rr_timeslice = RR_TIMESLICE;
int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
@@ -973,8 +975,6 @@
curr->se.exec_start = now;
cgroup_account_cputime(curr, delta_exec);
- sched_rt_avg_update(rq, delta_exec);
-
if (!rt_bandwidth_enabled())
return;
@@ -1578,6 +1578,14 @@
rt_queue_push_tasks(rq);
+ /*
+ * If prev task was rt, put_prev_task() has already updated the
+ * utilization. We only care of the case where we start to schedule a
+ * rt task
+ */
+ if (rq->curr->sched_class != &rt_sched_class)
+ update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+
return p;
}
@@ -1585,6 +1593,8 @@
{
update_curr_rt(rq);
+ update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
+
/*
* The previous task needs to be made eligible for pushing
* if it is still active
@@ -2314,6 +2324,7 @@
struct sched_rt_entity *rt_se = &p->rt;
update_curr_rt(rq);
+ update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
watchdog(rq, p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c7742dc..4a2e8ca 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -594,6 +594,7 @@
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
+
#endif /* CONFIG_SMP */
int rt_queued;
@@ -673,7 +674,26 @@
u64 bw_ratio;
};
+#ifdef CONFIG_FAIR_GROUP_SCHED
+/* An entity is a task if it doesn't "own" a runqueue */
+#define entity_is_task(se) (!se->my_q)
+#else
+#define entity_is_task(se) 1
+#endif
+
#ifdef CONFIG_SMP
+/*
+ * XXX we want to get rid of these helpers and use the full load resolution.
+ */
+static inline long se_weight(struct sched_entity *se)
+{
+ return scale_load_down(se->load.weight);
+}
+
+static inline long se_runnable(struct sched_entity *se)
+{
+ return scale_load_down(se->runnable_weight);
+}
static inline bool sched_asym_prefer(int a, int b)
{
@@ -833,8 +853,12 @@
struct list_head cfs_tasks;
- u64 rt_avg;
- u64 age_stamp;
+ struct sched_avg avg_rt;
+ struct sched_avg avg_dl;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+#define HAVE_SCHED_AVG_IRQ
+ struct sched_avg avg_irq;
+#endif
u64 idle_stamp;
u64 avg_idle;
@@ -1075,7 +1099,8 @@
};
extern void sched_setnuma(struct task_struct *p, int node);
extern int migrate_task_to(struct task_struct *p, int cpu);
-extern int migrate_swap(struct task_struct *, struct task_struct *);
+extern int migrate_swap(struct task_struct *p, struct task_struct *t,
+ int cpu, int scpu);
extern void init_numa_balancing(unsigned long clone_flags, struct task_struct *p);
#else
static inline void
@@ -1690,15 +1715,9 @@
extern void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
-extern const_debug unsigned int sysctl_sched_time_avg;
extern const_debug unsigned int sysctl_sched_nr_migrate;
extern const_debug unsigned int sysctl_sched_migration_cost;
-static inline u64 sched_avg_period(void)
-{
- return (u64)sysctl_sched_time_avg * NSEC_PER_MSEC / 2;
-}
-
#ifdef CONFIG_SCHED_HRTICK
/*
@@ -1735,8 +1754,6 @@
#endif
#ifdef CONFIG_SMP
-extern void sched_avg_update(struct rq *rq);
-
#ifndef arch_scale_cpu_capacity
static __always_inline
unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
@@ -1747,12 +1764,6 @@
return SCHED_CAPACITY_SCALE;
}
#endif
-
-static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
-{
- rq->rt_avg += rt_delta * arch_scale_freq_capacity(cpu_of(rq));
- sched_avg_update(rq);
-}
#else
#ifndef arch_scale_cpu_capacity
static __always_inline
@@ -1761,8 +1772,6 @@
return SCHED_CAPACITY_SCALE;
}
#endif
-static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
-static inline void sched_avg_update(struct rq *rq) { }
#endif
struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
@@ -2177,11 +2186,16 @@
#endif
#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
-static inline unsigned long cpu_util_dl(struct rq *rq)
+static inline unsigned long cpu_bw_dl(struct rq *rq)
{
return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
}
+static inline unsigned long cpu_util_dl(struct rq *rq)
+{
+ return READ_ONCE(rq->avg_dl.util_avg);
+}
+
static inline unsigned long cpu_util_cfs(struct rq *rq)
{
unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
@@ -2193,4 +2207,37 @@
return util;
}
+
+static inline unsigned long cpu_util_rt(struct rq *rq)
+{
+ return READ_ONCE(rq->avg_rt.util_avg);
+}
+#endif
+
+#ifdef HAVE_SCHED_AVG_IRQ
+static inline unsigned long cpu_util_irq(struct rq *rq)
+{
+ return rq->avg_irq.util_avg;
+}
+
+static inline
+unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned long max)
+{
+ util *= (max - irq);
+ util /= max;
+
+ return util;
+
+}
+#else
+static inline unsigned long cpu_util_irq(struct rq *rq)
+{
+ return 0;
+}
+
+static inline
+unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned long max)
+{
+ return util;
+}
#endif
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index b6fb2c3..66b59ac 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -32,7 +32,7 @@
}
EXPORT_SYMBOL(swake_up_locked);
-void swake_up(struct swait_queue_head *q)
+void swake_up_one(struct swait_queue_head *q)
{
unsigned long flags;
@@ -40,7 +40,7 @@
swake_up_locked(q);
raw_spin_unlock_irqrestore(&q->lock, flags);
}
-EXPORT_SYMBOL(swake_up);
+EXPORT_SYMBOL(swake_up_one);
/*
* Does not allow usage from IRQ disabled, since we must be able to
@@ -69,14 +69,14 @@
}
EXPORT_SYMBOL(swake_up_all);
-void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
{
wait->task = current;
if (list_empty(&wait->task_list))
- list_add(&wait->task_list, &q->task_list);
+ list_add_tail(&wait->task_list, &q->task_list);
}
-void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state)
+void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state)
{
unsigned long flags;
@@ -85,16 +85,28 @@
set_current_state(state);
raw_spin_unlock_irqrestore(&q->lock, flags);
}
-EXPORT_SYMBOL(prepare_to_swait);
+EXPORT_SYMBOL(prepare_to_swait_exclusive);
long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state)
{
- if (signal_pending_state(state, current))
- return -ERESTARTSYS;
+ unsigned long flags;
+ long ret = 0;
- prepare_to_swait(q, wait, state);
+ raw_spin_lock_irqsave(&q->lock, flags);
+ if (unlikely(signal_pending_state(state, current))) {
+ /*
+ * See prepare_to_wait_event(). TL;DR, subsequent swake_up_one()
+ * must not see us.
+ */
+ list_del_init(&wait->task_list);
+ ret = -ERESTARTSYS;
+ } else {
+ __prepare_to_swait(q, wait);
+ set_current_state(state);
+ }
+ raw_spin_unlock_irqrestore(&q->lock, flags);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(prepare_to_swait_event);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 928be52..870f97b 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -134,8 +134,8 @@
* @nr_exclusive: how many wake-one or wake-many threads to wake up
* @key: is directly passed to the wakeup function
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*/
void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
int nr_exclusive, void *key)
@@ -180,8 +180,8 @@
*
* On UP it can prevent extra preemption.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*/
void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode,
int nr_exclusive, void *key)
@@ -392,35 +392,36 @@
* if (condition)
* break;
*
- * p->state = mode; condition = true;
- * smp_mb(); // A smp_wmb(); // C
- * if (!wq_entry->flags & WQ_FLAG_WOKEN) wq_entry->flags |= WQ_FLAG_WOKEN;
- * schedule() try_to_wake_up();
- * p->state = TASK_RUNNING; ~~~~~~~~~~~~~~~~~~
- * wq_entry->flags &= ~WQ_FLAG_WOKEN; condition = true;
- * smp_mb() // B smp_wmb(); // C
- * wq_entry->flags |= WQ_FLAG_WOKEN;
- * }
- * remove_wait_queue(&wq_head, &wait);
+ * // in wait_woken() // in woken_wake_function()
*
+ * p->state = mode; wq_entry->flags |= WQ_FLAG_WOKEN;
+ * smp_mb(); // A try_to_wake_up():
+ * if (!(wq_entry->flags & WQ_FLAG_WOKEN)) <full barrier>
+ * schedule() if (p->state & mode)
+ * p->state = TASK_RUNNING; p->state = TASK_RUNNING;
+ * wq_entry->flags &= ~WQ_FLAG_WOKEN; ~~~~~~~~~~~~~~~~~~
+ * smp_mb(); // B condition = true;
+ * } smp_mb(); // C
+ * remove_wait_queue(&wq_head, &wait); wq_entry->flags |= WQ_FLAG_WOKEN;
*/
long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout)
{
- set_current_state(mode); /* A */
/*
- * The above implies an smp_mb(), which matches with the smp_wmb() from
- * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
- * also observe all state before the wakeup.
+ * The below executes an smp_mb(), which matches with the full barrier
+ * executed by the try_to_wake_up() in woken_wake_function() such that
+ * either we see the store to wq_entry->flags in woken_wake_function()
+ * or woken_wake_function() sees our store to current->state.
*/
+ set_current_state(mode); /* A */
if (!(wq_entry->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
timeout = schedule_timeout(timeout);
__set_current_state(TASK_RUNNING);
/*
- * The below implies an smp_mb(), it too pairs with the smp_wmb() from
- * woken_wake_function() such that we must either observe the wait
- * condition being true _OR_ WQ_FLAG_WOKEN such that we will not miss
- * an event.
+ * The below executes an smp_mb(), which matches with the smp_mb() (C)
+ * in woken_wake_function() such that either we see the wait condition
+ * being true or the store to wq_entry->flags in woken_wake_function()
+ * follows ours in the coherence order.
*/
smp_store_mb(wq_entry->flags, wq_entry->flags & ~WQ_FLAG_WOKEN); /* B */
@@ -430,14 +431,8 @@
int woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key)
{
- /*
- * Although this function is called under waitqueue lock, LOCK
- * doesn't imply write barrier and the users expects write
- * barrier semantics on wakeup functions. The following
- * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
- * and is paired with smp_store_mb() in wait_woken().
- */
- smp_wmb(); /* C */
+ /* Pairs with the smp_store_mb() in wait_woken(). */
+ smp_mb(); /* C */
wq_entry->flags |= WQ_FLAG_WOKEN;
return default_wake_function(wq_entry, mode, sync, key);
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 5043e74..c230c2d 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -238,8 +238,7 @@
mutex_lock(&smpboot_threads_lock);
list_for_each_entry(cur, &hotplug_threads, list)
- if (cpumask_test_cpu(cpu, cur->cpumask))
- smpboot_unpark_thread(cur, cpu);
+ smpboot_unpark_thread(cur, cpu);
mutex_unlock(&smpboot_threads_lock);
return 0;
}
@@ -280,34 +279,26 @@
}
/**
- * smpboot_register_percpu_thread_cpumask - Register a per_cpu thread related
+ * smpboot_register_percpu_thread - Register a per_cpu thread related
* to hotplug
* @plug_thread: Hotplug thread descriptor
- * @cpumask: The cpumask where threads run
*
* Creates and starts the threads on all online cpus.
*/
-int smpboot_register_percpu_thread_cpumask(struct smp_hotplug_thread *plug_thread,
- const struct cpumask *cpumask)
+int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
{
unsigned int cpu;
int ret = 0;
- if (!alloc_cpumask_var(&plug_thread->cpumask, GFP_KERNEL))
- return -ENOMEM;
- cpumask_copy(plug_thread->cpumask, cpumask);
-
get_online_cpus();
mutex_lock(&smpboot_threads_lock);
for_each_online_cpu(cpu) {
ret = __smpboot_create_thread(plug_thread, cpu);
if (ret) {
smpboot_destroy_threads(plug_thread);
- free_cpumask_var(plug_thread->cpumask);
goto out;
}
- if (cpumask_test_cpu(cpu, cpumask))
- smpboot_unpark_thread(plug_thread, cpu);
+ smpboot_unpark_thread(plug_thread, cpu);
}
list_add(&plug_thread->list, &hotplug_threads);
out:
@@ -315,7 +306,7 @@
put_online_cpus();
return ret;
}
-EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread_cpumask);
+EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
/**
* smpboot_unregister_percpu_thread - Unregister a per_cpu thread related to hotplug
@@ -331,44 +322,9 @@
smpboot_destroy_threads(plug_thread);
mutex_unlock(&smpboot_threads_lock);
put_online_cpus();
- free_cpumask_var(plug_thread->cpumask);
}
EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
-/**
- * smpboot_update_cpumask_percpu_thread - Adjust which per_cpu hotplug threads stay parked
- * @plug_thread: Hotplug thread descriptor
- * @new: Revised mask to use
- *
- * The cpumask field in the smp_hotplug_thread must not be updated directly
- * by the client, but only by calling this function.
- * This function can only be called on a registered smp_hotplug_thread.
- */
-void smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
- const struct cpumask *new)
-{
- struct cpumask *old = plug_thread->cpumask;
- static struct cpumask tmp;
- unsigned int cpu;
-
- lockdep_assert_cpus_held();
- mutex_lock(&smpboot_threads_lock);
-
- /* Park threads that were exclusively enabled on the old mask. */
- cpumask_andnot(&tmp, old, new);
- for_each_cpu_and(cpu, &tmp, cpu_online_mask)
- smpboot_park_thread(plug_thread, cpu);
-
- /* Unpark threads that are exclusively enabled on the new mask. */
- cpumask_andnot(&tmp, new, old);
- for_each_cpu_and(cpu, &tmp, cpu_online_mask)
- smpboot_unpark_thread(plug_thread, cpu);
-
- cpumask_copy(old, new);
-
- mutex_unlock(&smpboot_threads_lock);
-}
-
static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
/*
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1e..067cb83 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -81,6 +81,7 @@
unsigned long flags;
bool enabled;
+ preempt_disable();
raw_spin_lock_irqsave(&stopper->lock, flags);
enabled = stopper->enabled;
if (enabled)
@@ -90,6 +91,7 @@
raw_spin_unlock_irqrestore(&stopper->lock, flags);
wake_up_q(&wakeq);
+ preempt_enable();
return enabled;
}
@@ -236,13 +238,24 @@
struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
DEFINE_WAKE_Q(wakeq);
int err;
+
retry:
+ /*
+ * The waking up of stopper threads has to happen in the same
+ * scheduling context as the queueing. Otherwise, there is a
+ * possibility of one of the above stoppers being woken up by another
+ * CPU, and preempting us. This will cause us to not wake up the other
+ * stopper forever.
+ */
+ preempt_disable();
raw_spin_lock_irq(&stopper1->lock);
raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
- err = -ENOENT;
- if (!stopper1->enabled || !stopper2->enabled)
+ if (!stopper1->enabled || !stopper2->enabled) {
+ err = -ENOENT;
goto unlock;
+ }
+
/*
* Ensure that if we race with __stop_cpus() the stoppers won't get
* queued up in reverse order leading to system deadlock.
@@ -253,36 +266,30 @@
* It can be falsely true but it is safe to spin until it is cleared,
* queue_stop_cpus_work() does everything under preempt_disable().
*/
- err = -EDEADLK;
- if (unlikely(stop_cpus_in_progress))
- goto unlock;
+ if (unlikely(stop_cpus_in_progress)) {
+ err = -EDEADLK;
+ goto unlock;
+ }
err = 0;
__cpu_stop_queue_work(stopper1, work1, &wakeq);
__cpu_stop_queue_work(stopper2, work2, &wakeq);
- /*
- * The waking up of stopper threads has to happen
- * in the same scheduling context as the queueing.
- * Otherwise, there is a possibility of one of the
- * above stoppers being woken up by another CPU,
- * and preempting us. This will cause us to n ot
- * wake up the other stopper forever.
- */
- preempt_disable();
+
unlock:
raw_spin_unlock(&stopper2->lock);
raw_spin_unlock_irq(&stopper1->lock);
if (unlikely(err == -EDEADLK)) {
+ preempt_enable();
+
while (stop_cpus_in_progress)
cpu_relax();
+
goto retry;
}
- if (!err) {
- wake_up_q(&wakeq);
- preempt_enable();
- }
+ wake_up_q(&wakeq);
+ preempt_enable();
return err;
}
diff --git a/kernel/sys.c b/kernel/sys.c
index 38509dc..e27b51d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2512,11 +2512,11 @@
{
unsigned long mem_total, sav_total;
unsigned int mem_unit, bitcount;
- struct timespec tp;
+ struct timespec64 tp;
memset(info, 0, sizeof(struct sysinfo));
- get_monotonic_boottime(&tp);
+ ktime_get_boottime_ts64(&tp);
info->uptime = tp.tv_sec + (tp.tv_nsec ? 1 : 0);
get_avenrun(info->loads, 0, SI_LOAD_SHIFT - FSHIFT);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2d9837c..f22f76b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -368,14 +368,6 @@
.mode = 0644,
.proc_handler = proc_dointvec,
},
- {
- .procname = "sched_time_avg_ms",
- .data = &sysctl_sched_time_avg,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &one,
- },
#ifdef CONFIG_SCHEDSTATS
{
.procname = "sched_schedstats",
diff --git a/kernel/test_kprobes.c b/kernel/test_kprobes.c
index dd53e35..7bca480 100644
--- a/kernel/test_kprobes.c
+++ b/kernel/test_kprobes.c
@@ -162,90 +162,6 @@
}
-#if 0
-static u32 jph_val;
-
-static u32 j_kprobe_target(u32 value)
-{
- if (preemptible()) {
- handler_errors++;
- pr_err("jprobe-handler is preemptible\n");
- }
- if (value != rand1) {
- handler_errors++;
- pr_err("incorrect value in jprobe handler\n");
- }
-
- jph_val = rand1;
- jprobe_return();
- return 0;
-}
-
-static struct jprobe jp = {
- .entry = j_kprobe_target,
- .kp.symbol_name = "kprobe_target"
-};
-
-static int test_jprobe(void)
-{
- int ret;
-
- ret = register_jprobe(&jp);
- if (ret < 0) {
- pr_err("register_jprobe returned %d\n", ret);
- return ret;
- }
-
- ret = target(rand1);
- unregister_jprobe(&jp);
- if (jph_val == 0) {
- pr_err("jprobe handler not called\n");
- handler_errors++;
- }
-
- return 0;
-}
-
-static struct jprobe jp2 = {
- .entry = j_kprobe_target,
- .kp.symbol_name = "kprobe_target2"
-};
-
-static int test_jprobes(void)
-{
- int ret;
- struct jprobe *jps[2] = {&jp, &jp2};
-
- /* addr and flags should be cleard for reusing kprobe. */
- jp.kp.addr = NULL;
- jp.kp.flags = 0;
- ret = register_jprobes(jps, 2);
- if (ret < 0) {
- pr_err("register_jprobes returned %d\n", ret);
- return ret;
- }
-
- jph_val = 0;
- ret = target(rand1);
- if (jph_val == 0) {
- pr_err("jprobe handler not called\n");
- handler_errors++;
- }
-
- jph_val = 0;
- ret = target2(rand1);
- if (jph_val == 0) {
- pr_err("jprobe handler2 not called\n");
- handler_errors++;
- }
- unregister_jprobes(jps, 2);
-
- return 0;
-}
-#else
-#define test_jprobe() (0)
-#define test_jprobes() (0)
-#endif
#ifdef CONFIG_KRETPROBES
static u32 krph_val;
@@ -383,16 +299,6 @@
if (ret < 0)
errors++;
- num_tests++;
- ret = test_jprobe();
- if (ret < 0)
- errors++;
-
- num_tests++;
- ret = test_jprobes();
- if (ret < 0)
- errors++;
-
#ifdef CONFIG_KRETPROBES
num_tests++;
ret = test_kretprobe();
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 639321b..fa5de5e 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -581,11 +581,11 @@
* @timr: Pointer to the posixtimer data struct
* @now: Current time to forward the timer against
*/
-static int alarm_timer_forward(struct k_itimer *timr, ktime_t now)
+static s64 alarm_timer_forward(struct k_itimer *timr, ktime_t now)
{
struct alarm *alarm = &timr->it.alarm.alarmtimer;
- return (int) alarm_forward(alarm, timr->it_interval, now);
+ return alarm_forward(alarm, timr->it_interval, now);
}
/**
@@ -808,7 +808,8 @@
/* Convert (if necessary) to absolute time */
if (flags != TIMER_ABSTIME) {
ktime_t now = alarm_bases[type].gettime();
- exp = ktime_add(now, exp);
+
+ exp = ktime_add_safe(now, exp);
}
ret = alarmtimer_do_nsleep(&alarm, exp, type);
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 16c027e..8c0e409 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -463,6 +463,12 @@
dev->cpumask = cpumask_of(smp_processor_id());
}
+ if (dev->cpumask == cpu_all_mask) {
+ WARN(1, "%s cpumask == cpu_all_mask, using cpu_possible_mask instead\n",
+ dev->name);
+ dev->cpumask = cpu_possible_mask;
+ }
+
raw_spin_lock_irqsave(&clockevents_lock, flags);
list_add(&dev->list, &clockevent_devices);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f89a78e..f74fb00 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -94,6 +94,8 @@
/*[Clocksource internal variables]---------
* curr_clocksource:
* currently selected clocksource.
+ * suspend_clocksource:
+ * used to calculate the suspend time.
* clocksource_list:
* linked list with the registered clocksources
* clocksource_mutex:
@@ -102,10 +104,12 @@
* Name of the user-specified clocksource.
*/
static struct clocksource *curr_clocksource;
+static struct clocksource *suspend_clocksource;
static LIST_HEAD(clocksource_list);
static DEFINE_MUTEX(clocksource_mutex);
static char override_name[CS_NAME_LEN];
static int finished_booting;
+static u64 suspend_start;
#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
static void clocksource_watchdog_work(struct work_struct *work);
@@ -447,6 +451,140 @@
#endif /* CONFIG_CLOCKSOURCE_WATCHDOG */
+static bool clocksource_is_suspend(struct clocksource *cs)
+{
+ return cs == suspend_clocksource;
+}
+
+static void __clocksource_suspend_select(struct clocksource *cs)
+{
+ /*
+ * Skip the clocksource which will be stopped in suspend state.
+ */
+ if (!(cs->flags & CLOCK_SOURCE_SUSPEND_NONSTOP))
+ return;
+
+ /*
+ * The nonstop clocksource can be selected as the suspend clocksource to
+ * calculate the suspend time, so it should not supply suspend/resume
+ * interfaces to suspend the nonstop clocksource when system suspends.
+ */
+ if (cs->suspend || cs->resume) {
+ pr_warn("Nonstop clocksource %s should not supply suspend/resume interfaces\n",
+ cs->name);
+ }
+
+ /* Pick the best rating. */
+ if (!suspend_clocksource || cs->rating > suspend_clocksource->rating)
+ suspend_clocksource = cs;
+}
+
+/**
+ * clocksource_suspend_select - Select the best clocksource for suspend timing
+ * @fallback: if select a fallback clocksource
+ */
+static void clocksource_suspend_select(bool fallback)
+{
+ struct clocksource *cs, *old_suspend;
+
+ old_suspend = suspend_clocksource;
+ if (fallback)
+ suspend_clocksource = NULL;
+
+ list_for_each_entry(cs, &clocksource_list, list) {
+ /* Skip current if we were requested for a fallback. */
+ if (fallback && cs == old_suspend)
+ continue;
+
+ __clocksource_suspend_select(cs);
+ }
+}
+
+/**
+ * clocksource_start_suspend_timing - Start measuring the suspend timing
+ * @cs: current clocksource from timekeeping
+ * @start_cycles: current cycles from timekeeping
+ *
+ * This function will save the start cycle values of suspend timer to calculate
+ * the suspend time when resuming system.
+ *
+ * This function is called late in the suspend process from timekeeping_suspend(),
+ * that means processes are freezed, non-boot cpus and interrupts are disabled
+ * now. It is therefore possible to start the suspend timer without taking the
+ * clocksource mutex.
+ */
+void clocksource_start_suspend_timing(struct clocksource *cs, u64 start_cycles)
+{
+ if (!suspend_clocksource)
+ return;
+
+ /*
+ * If current clocksource is the suspend timer, we should use the
+ * tkr_mono.cycle_last value as suspend_start to avoid same reading
+ * from suspend timer.
+ */
+ if (clocksource_is_suspend(cs)) {
+ suspend_start = start_cycles;
+ return;
+ }
+
+ if (suspend_clocksource->enable &&
+ suspend_clocksource->enable(suspend_clocksource)) {
+ pr_warn_once("Failed to enable the non-suspend-able clocksource.\n");
+ return;
+ }
+
+ suspend_start = suspend_clocksource->read(suspend_clocksource);
+}
+
+/**
+ * clocksource_stop_suspend_timing - Stop measuring the suspend timing
+ * @cs: current clocksource from timekeeping
+ * @cycle_now: current cycles from timekeeping
+ *
+ * This function will calculate the suspend time from suspend timer.
+ *
+ * Returns nanoseconds since suspend started, 0 if no usable suspend clocksource.
+ *
+ * This function is called early in the resume process from timekeeping_resume(),
+ * that means there is only one cpu, no processes are running and the interrupts
+ * are disabled. It is therefore possible to stop the suspend timer without
+ * taking the clocksource mutex.
+ */
+u64 clocksource_stop_suspend_timing(struct clocksource *cs, u64 cycle_now)
+{
+ u64 now, delta, nsec = 0;
+
+ if (!suspend_clocksource)
+ return 0;
+
+ /*
+ * If current clocksource is the suspend timer, we should use the
+ * tkr_mono.cycle_last value from timekeeping as current cycle to
+ * avoid same reading from suspend timer.
+ */
+ if (clocksource_is_suspend(cs))
+ now = cycle_now;
+ else
+ now = suspend_clocksource->read(suspend_clocksource);
+
+ if (now > suspend_start) {
+ delta = clocksource_delta(now, suspend_start,
+ suspend_clocksource->mask);
+ nsec = mul_u64_u32_shr(delta, suspend_clocksource->mult,
+ suspend_clocksource->shift);
+ }
+
+ /*
+ * Disable the suspend timer to save power if current clocksource is
+ * not the suspend timer.
+ */
+ if (!clocksource_is_suspend(cs) && suspend_clocksource->disable)
+ suspend_clocksource->disable(suspend_clocksource);
+
+ return nsec;
+}
+
/**
* clocksource_suspend - suspend the clocksource(s)
*/
@@ -792,6 +930,7 @@
clocksource_select();
clocksource_select_watchdog(false);
+ __clocksource_suspend_select(cs);
mutex_unlock(&clocksource_mutex);
return 0;
}
@@ -820,6 +959,7 @@
clocksource_select();
clocksource_select_watchdog(false);
+ clocksource_suspend_select(false);
mutex_unlock(&clocksource_mutex);
}
EXPORT_SYMBOL(clocksource_change_rating);
@@ -845,6 +985,15 @@
return -EBUSY;
}
+ if (clocksource_is_suspend(cs)) {
+ /*
+ * Select and try to install a replacement suspend clocksource.
+ * If no replacement suspend clocksource, we will just let the
+ * clocksource go and have no suspend clocksource.
+ */
+ clocksource_suspend_select(true);
+ }
+
clocksource_watchdog_lock(&flags);
clocksource_dequeue_watchdog(cs);
list_del_init(&cs->list);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 3e93c54..e1a549c 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -718,8 +718,8 @@
struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);
if (tick_init_highres()) {
- printk(KERN_WARNING "Could not switch to high resolution "
- "mode on CPU %d\n", base->cpu);
+ pr_warn("Could not switch to high resolution mode on CPU %u\n",
+ base->cpu);
return;
}
base->hres_active = 1;
@@ -1573,8 +1573,7 @@
else
expires_next = ktime_add(now, delta);
tick_program_event(expires_next, 1);
- printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
- ktime_to_ns(delta));
+ pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
}
/* called with interrupts disabled */
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index a09ded7..c5e0cba 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -502,7 +502,7 @@
{
struct timespec64 next;
- getnstimeofday64(&next);
+ ktime_get_real_ts64(&next);
if (!fail)
next.tv_sec = 659;
else {
@@ -537,7 +537,7 @@
if (!IS_ENABLED(CONFIG_RTC_SYSTOHC))
return;
- getnstimeofday64(&now);
+ ktime_get_real_ts64(&now);
adjust = now;
if (persistent_clock_is_local)
@@ -591,7 +591,7 @@
* Architectures are strongly encouraged to use rtclib and not
* implement this legacy API.
*/
- getnstimeofday64(&now);
+ ktime_get_real_ts64(&now);
if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
if (persistent_clock_is_local)
adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
@@ -642,7 +642,7 @@
/*
* Propagate a new txc->status value into the NTP state:
*/
-static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
+static inline void process_adj_status(const struct timex *txc)
{
if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
time_state = TIME_OK;
@@ -665,12 +665,10 @@
}
-static inline void process_adjtimex_modes(struct timex *txc,
- struct timespec64 *ts,
- s32 *time_tai)
+static inline void process_adjtimex_modes(const struct timex *txc, s32 *time_tai)
{
if (txc->modes & ADJ_STATUS)
- process_adj_status(txc, ts);
+ process_adj_status(txc);
if (txc->modes & ADJ_NANO)
time_status |= STA_NANO;
@@ -718,7 +716,7 @@
* adjtimex mainly allows reading (and writing, if superuser) of
* kernel time-keeping variables. used by xntpd.
*/
-int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai)
+int __do_adjtimex(struct timex *txc, const struct timespec64 *ts, s32 *time_tai)
{
int result;
@@ -735,7 +733,7 @@
/* If there are input parameters, then process them: */
if (txc->modes)
- process_adjtimex_modes(txc, ts, time_tai);
+ process_adjtimex_modes(txc, time_tai);
txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
NTP_SCALE_SHIFT);
@@ -1022,12 +1020,11 @@
static int __init ntp_tick_adj_setup(char *str)
{
- int rc = kstrtol(str, 0, (long *)&ntp_tick_adj);
-
+ int rc = kstrtos64(str, 0, &ntp_tick_adj);
if (rc)
return rc;
- ntp_tick_adj <<= NTP_SCALE_SHIFT;
+ ntp_tick_adj <<= NTP_SCALE_SHIFT;
return 1;
}
diff --git a/kernel/time/ntp_internal.h b/kernel/time/ntp_internal.h
index 909bd1f..c24b0e1 100644
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -8,6 +8,6 @@
extern u64 ntp_tick_length(void);
extern ktime_t ntp_get_next_leap(void);
extern int second_overflow(time64_t secs);
-extern int __do_adjtimex(struct timex *, struct timespec64 *, s32 *);
-extern void __hardpps(const struct timespec64 *, const struct timespec64 *);
+extern int __do_adjtimex(struct timex *txc, const struct timespec64 *ts, s32 *time_tai);
+extern void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts);
#endif /* _LINUX_NTP_INTERNAL_H */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 9cdf54b..294d7b6 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -85,7 +85,7 @@
continue;
timer->it.cpu.expires += incr;
- timer->it_overrun += 1 << i;
+ timer->it_overrun += 1LL << i;
delta -= incr;
}
}
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 26aa956..2c6847d 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -81,7 +81,7 @@
ktime_get_ts64(tp);
break;
case CLOCK_BOOTTIME:
- get_monotonic_boottime64(tp);
+ ktime_get_boottime_ts64(tp);
break;
default:
return -EINVAL;
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 66321cb..f23cc46 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -219,21 +219,21 @@
*/
static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec64 *tp)
{
- getrawmonotonic64(tp);
+ ktime_get_raw_ts64(tp);
return 0;
}
static int posix_get_realtime_coarse(clockid_t which_clock, struct timespec64 *tp)
{
- *tp = current_kernel_time64();
+ ktime_get_coarse_real_ts64(tp);
return 0;
}
static int posix_get_monotonic_coarse(clockid_t which_clock,
struct timespec64 *tp)
{
- *tp = get_monotonic_coarse64();
+ ktime_get_coarse_ts64(tp);
return 0;
}
@@ -245,13 +245,13 @@
static int posix_get_boottime(const clockid_t which_clock, struct timespec64 *tp)
{
- get_monotonic_boottime64(tp);
+ ktime_get_boottime_ts64(tp);
return 0;
}
static int posix_get_tai(clockid_t which_clock, struct timespec64 *tp)
{
- timekeeping_clocktai64(tp);
+ ktime_get_clocktai_ts64(tp);
return 0;
}
@@ -274,6 +274,17 @@
}
__initcall(init_posix_timers);
+/*
+ * The siginfo si_overrun field and the return value of timer_getoverrun(2)
+ * are of type int. Clamp the overrun value to INT_MAX
+ */
+static inline int timer_overrun_to_int(struct k_itimer *timr, int baseval)
+{
+ s64 sum = timr->it_overrun_last + (s64)baseval;
+
+ return sum > (s64)INT_MAX ? INT_MAX : (int)sum;
+}
+
static void common_hrtimer_rearm(struct k_itimer *timr)
{
struct hrtimer *timer = &timr->it.real.timer;
@@ -281,9 +292,8 @@
if (!timr->it_interval)
return;
- timr->it_overrun += (unsigned int) hrtimer_forward(timer,
- timer->base->get_time(),
- timr->it_interval);
+ timr->it_overrun += hrtimer_forward(timer, timer->base->get_time(),
+ timr->it_interval);
hrtimer_restart(timer);
}
@@ -312,10 +322,10 @@
timr->it_active = 1;
timr->it_overrun_last = timr->it_overrun;
- timr->it_overrun = -1;
+ timr->it_overrun = -1LL;
++timr->it_requeue_pending;
- info->si_overrun += timr->it_overrun_last;
+ info->si_overrun = timer_overrun_to_int(timr, info->si_overrun);
}
unlock_timer(timr, flags);
@@ -409,9 +419,8 @@
now = ktime_add(now, kj);
}
#endif
- timr->it_overrun += (unsigned int)
- hrtimer_forward(timer, now,
- timr->it_interval);
+ timr->it_overrun += hrtimer_forward(timer, now,
+ timr->it_interval);
ret = HRTIMER_RESTART;
++timr->it_requeue_pending;
timr->it_active = 1;
@@ -515,7 +524,7 @@
new_timer->it_id = (timer_t) new_timer_id;
new_timer->it_clock = which_clock;
new_timer->kclock = kc;
- new_timer->it_overrun = -1;
+ new_timer->it_overrun = -1LL;
if (event) {
rcu_read_lock();
@@ -636,11 +645,11 @@
return __hrtimer_expires_remaining_adjusted(timer, now);
}
-static int common_hrtimer_forward(struct k_itimer *timr, ktime_t now)
+static s64 common_hrtimer_forward(struct k_itimer *timr, ktime_t now)
{
struct hrtimer *timer = &timr->it.real.timer;
- return (int)hrtimer_forward(timer, now, timr->it_interval);
+ return hrtimer_forward(timer, now, timr->it_interval);
}
/*
@@ -734,7 +743,7 @@
/* Get the time remaining on a POSIX.1b interval timer. */
SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
- struct itimerspec __user *, setting)
+ struct __kernel_itimerspec __user *, setting)
{
struct itimerspec64 cur_setting;
@@ -746,7 +755,8 @@
return ret;
}
-#ifdef CONFIG_COMPAT
+#ifdef CONFIG_COMPAT_32BIT_TIME
+
COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
struct compat_itimerspec __user *, setting)
{
@@ -759,6 +769,7 @@
}
return ret;
}
+
#endif
/*
@@ -780,7 +791,7 @@
if (!timr)
return -EINVAL;
- overrun = timr->it_overrun_last;
+ overrun = timer_overrun_to_int(timr, 0);
unlock_timer(timr, flags);
return overrun;
@@ -897,8 +908,8 @@
/* Set a POSIX.1b interval timer */
SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags,
- const struct itimerspec __user *, new_setting,
- struct itimerspec __user *, old_setting)
+ const struct __kernel_itimerspec __user *, new_setting,
+ struct __kernel_itimerspec __user *, old_setting)
{
struct itimerspec64 new_spec, old_spec;
struct itimerspec64 *rtn = old_setting ? &old_spec : NULL;
@@ -918,7 +929,7 @@
return error;
}
-#ifdef CONFIG_COMPAT
+#ifdef CONFIG_COMPAT_32BIT_TIME
COMPAT_SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags,
struct compat_itimerspec __user *, new,
struct compat_itimerspec __user *, old)
diff --git a/kernel/time/posix-timers.h b/kernel/time/posix-timers.h
index 151e28f..ddb2114 100644
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -19,7 +19,7 @@
void (*timer_get)(struct k_itimer *timr,
struct itimerspec64 *cur_setting);
void (*timer_rearm)(struct k_itimer *timr);
- int (*timer_forward)(struct k_itimer *timr, ktime_t now);
+ s64 (*timer_forward)(struct k_itimer *timr, ktime_t now);
ktime_t (*timer_remaining)(struct k_itimer *timr, ktime_t now);
int (*timer_try_to_cancel)(struct k_itimer *timr);
void (*timer_arm)(struct k_itimer *timr, ktime_t expires,
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 2d8f05a..cbc72c2 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -237,7 +237,7 @@
pr_debug("Registered %pF as sched_clock source\n", read);
}
-void __init sched_clock_postinit(void)
+void __init generic_sched_clock_init(void)
{
/*
* If no sched_clock() function has been provided at that point,
diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index 58045eb..a59641f 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -90,7 +90,7 @@
.max_delta_ticks = ULONG_MAX,
.mult = 1,
.shift = 0,
- .cpumask = cpu_all_mask,
+ .cpumask = cpu_possible_mask,
};
static enum hrtimer_restart bc_handler(struct hrtimer *t)
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 2b41e8e..ccdb351 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -64,7 +64,7 @@
*/
SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
- time_t i = get_seconds();
+ time_t i = (time_t)ktime_get_real_seconds();
if (tloc) {
if (put_user(i,tloc))
@@ -107,11 +107,9 @@
/* compat_time_t is a 32 bit "long" and needs to get converted. */
COMPAT_SYSCALL_DEFINE1(time, compat_time_t __user *, tloc)
{
- struct timeval tv;
compat_time_t i;
- do_gettimeofday(&tv);
- i = tv.tv_sec;
+ i = (compat_time_t)ktime_get_real_seconds();
if (tloc) {
if (put_user(i,tloc))
@@ -931,7 +929,7 @@
EXPORT_SYMBOL_GPL(compat_put_timespec64);
int get_itimerspec64(struct itimerspec64 *it,
- const struct itimerspec __user *uit)
+ const struct __kernel_itimerspec __user *uit)
{
int ret;
@@ -946,7 +944,7 @@
EXPORT_SYMBOL_GPL(get_itimerspec64);
int put_itimerspec64(const struct itimerspec64 *it,
- struct itimerspec __user *uit)
+ struct __kernel_itimerspec __user *uit)
{
int ret;
@@ -959,3 +957,24 @@
return ret;
}
EXPORT_SYMBOL_GPL(put_itimerspec64);
+
+int get_compat_itimerspec64(struct itimerspec64 *its,
+ const struct compat_itimerspec __user *uits)
+{
+
+ if (__compat_get_timespec64(&its->it_interval, &uits->it_interval) ||
+ __compat_get_timespec64(&its->it_value, &uits->it_value))
+ return -EFAULT;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(get_compat_itimerspec64);
+
+int put_compat_itimerspec64(const struct itimerspec64 *its,
+ struct compat_itimerspec __user *uits)
+{
+ if (__compat_put_timespec64(&its->it_interval, &uits->it_interval) ||
+ __compat_put_timespec64(&its->it_value, &uits->it_value))
+ return -EFAULT;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(put_compat_itimerspec64);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4786df9..f3b22f4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -17,6 +17,7 @@
#include <linux/nmi.h>
#include <linux/sched.h>
#include <linux/sched/loadavg.h>
+#include <linux/sched/clock.h>
#include <linux/syscore_ops.h>
#include <linux/clocksource.h>
#include <linux/jiffies.h>
@@ -34,6 +35,14 @@
#define TK_MIRROR (1 << 1)
#define TK_CLOCK_WAS_SET (1 << 2)
+enum timekeeping_adv_mode {
+ /* Update timekeeper when a tick has passed */
+ TK_ADV_TICK,
+
+ /* Update timekeeper on a direct frequency change */
+ TK_ADV_FREQ
+};
+
/*
* The most important data for readout fits into a single 64 byte
* cache line.
@@ -97,7 +106,7 @@
}
}
-static inline struct timespec64 tk_xtime(struct timekeeper *tk)
+static inline struct timespec64 tk_xtime(const struct timekeeper *tk)
{
struct timespec64 ts;
@@ -154,7 +163,7 @@
* a read of the fast-timekeeper tkrs (which is protected by its own locking
* and update logic).
*/
-static inline u64 tk_clock_read(struct tk_read_base *tkr)
+static inline u64 tk_clock_read(const struct tk_read_base *tkr)
{
struct clocksource *clock = READ_ONCE(tkr->clock);
@@ -203,7 +212,7 @@
}
}
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
{
struct timekeeper *tk = &tk_core.timekeeper;
u64 now, last, mask, max, delta;
@@ -247,7 +256,7 @@
static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset)
{
}
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
{
u64 cycle_now, delta;
@@ -344,7 +353,7 @@
static inline u32 arch_gettimeoffset(void) { return 0; }
#endif
-static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
+static inline u64 timekeeping_delta_to_ns(const struct tk_read_base *tkr, u64 delta)
{
u64 nsec;
@@ -355,7 +364,7 @@
return nsec + arch_gettimeoffset();
}
-static inline u64 timekeeping_get_ns(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_ns(const struct tk_read_base *tkr)
{
u64 delta;
@@ -363,7 +372,7 @@
return timekeeping_delta_to_ns(tkr, delta);
}
-static inline u64 timekeeping_cycles_to_ns(struct tk_read_base *tkr, u64 cycles)
+static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr, u64 cycles)
{
u64 delta;
@@ -386,7 +395,8 @@
* slightly wrong timestamp (a few nanoseconds). See
* @ktime_get_mono_fast_ns.
*/
-static void update_fast_timekeeper(struct tk_read_base *tkr, struct tk_fast *tkf)
+static void update_fast_timekeeper(const struct tk_read_base *tkr,
+ struct tk_fast *tkf)
{
struct tk_read_base *base = tkf->base;
@@ -541,10 +551,10 @@
* number of cycles every time until timekeeping is resumed at which time the
* proper readout base for the fast timekeeper will be restored automatically.
*/
-static void halt_fast_timekeeper(struct timekeeper *tk)
+static void halt_fast_timekeeper(const struct timekeeper *tk)
{
static struct tk_read_base tkr_dummy;
- struct tk_read_base *tkr = &tk->tkr_mono;
+ const struct tk_read_base *tkr = &tk->tkr_mono;
memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
cycles_at_suspend = tk_clock_read(tkr);
@@ -1269,7 +1279,7 @@
*
* Adds or subtracts an offset value from the current time.
*/
-static int timekeeping_inject_offset(struct timespec64 *ts)
+static int timekeeping_inject_offset(const struct timespec64 *ts)
{
struct timekeeper *tk = &tk_core.timekeeper;
unsigned long flags;
@@ -1496,22 +1506,39 @@
}
/**
- * read_boot_clock64 - Return time of the system start.
+ * read_persistent_wall_and_boot_offset - Read persistent clock, and also offset
+ * from the boot.
*
* Weak dummy function for arches that do not yet support it.
- * Function to read the exact time the system has been started.
- * Returns a timespec64 with tv_sec=0 and tv_nsec=0 if unsupported.
- *
- * XXX - Do be sure to remove it once all arches implement it.
+ * wall_time - current time as returned by persistent clock
+ * boot_offset - offset that is defined as wall_time - boot_time
+ * The default function calculates offset based on the current value of
+ * local_clock(). This way architectures that support sched_clock() but don't
+ * support dedicated boot time clock will provide the best estimate of the
+ * boot time.
*/
-void __weak read_boot_clock64(struct timespec64 *ts)
+void __weak __init
+read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
+ struct timespec64 *boot_offset)
{
- ts->tv_sec = 0;
- ts->tv_nsec = 0;
+ read_persistent_clock64(wall_time);
+ *boot_offset = ns_to_timespec64(local_clock());
}
-/* Flag for if timekeeping_resume() has injected sleeptime */
-static bool sleeptime_injected;
+/*
+ * Flag reflecting whether timekeeping_resume() has injected sleeptime.
+ *
+ * The flag starts of false and is only set when a suspend reaches
+ * timekeeping_suspend(), timekeeping_resume() sets it to false when the
+ * timekeeper clocksource is not stopping across suspend and has been
+ * used to update sleep time. If the timekeeper clocksource has stopped
+ * then the flag stays true and is used by the RTC resume code to decide
+ * whether sleeptime must be injected and if so the flag gets false then.
+ *
+ * If a suspend fails before reaching timekeeping_resume() then the flag
+ * stays false and prevents erroneous sleeptime injection.
+ */
+static bool suspend_timing_needed;
/* Flag for if there is a persistent clock on this platform */
static bool persistent_clock_exists;
@@ -1521,28 +1548,29 @@
*/
void __init timekeeping_init(void)
{
+ struct timespec64 wall_time, boot_offset, wall_to_mono;
struct timekeeper *tk = &tk_core.timekeeper;
struct clocksource *clock;
unsigned long flags;
- struct timespec64 now, boot, tmp;
- read_persistent_clock64(&now);
- if (!timespec64_valid_strict(&now)) {
- pr_warn("WARNING: Persistent clock returned invalid value!\n"
- " Check your CMOS/BIOS settings.\n");
- now.tv_sec = 0;
- now.tv_nsec = 0;
- } else if (now.tv_sec || now.tv_nsec)
+ read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
+ if (timespec64_valid_strict(&wall_time) &&
+ timespec64_to_ns(&wall_time) > 0) {
persistent_clock_exists = true;
-
- read_boot_clock64(&boot);
- if (!timespec64_valid_strict(&boot)) {
- pr_warn("WARNING: Boot clock returned invalid value!\n"
- " Check your CMOS/BIOS settings.\n");
- boot.tv_sec = 0;
- boot.tv_nsec = 0;
+ } else if (timespec64_to_ns(&wall_time) != 0) {
+ pr_warn("Persistent clock returned invalid value");
+ wall_time = (struct timespec64){0};
}
+ if (timespec64_compare(&wall_time, &boot_offset) < 0)
+ boot_offset = (struct timespec64){0};
+
+ /*
+ * We want set wall_to_mono, so the following is true:
+ * wall time + wall_to_mono = boot time
+ */
+ wall_to_mono = timespec64_sub(boot_offset, wall_time);
+
raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&tk_core.seq);
ntp_init();
@@ -1552,13 +1580,10 @@
clock->enable(clock);
tk_setup_internals(tk, clock);
- tk_set_xtime(tk, &now);
+ tk_set_xtime(tk, &wall_time);
tk->raw_sec = 0;
- if (boot.tv_sec == 0 && boot.tv_nsec == 0)
- boot = tk_xtime(tk);
- set_normalized_timespec64(&tmp, -boot.tv_sec, -boot.tv_nsec);
- tk_set_wall_to_mono(tk, tmp);
+ tk_set_wall_to_mono(tk, wall_to_mono);
timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
@@ -1577,7 +1602,7 @@
* adds the sleep offset to the timekeeping variables.
*/
static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
- struct timespec64 *delta)
+ const struct timespec64 *delta)
{
if (!timespec64_valid_strict(delta)) {
printk_deferred(KERN_WARNING
@@ -1610,7 +1635,7 @@
*/
bool timekeeping_rtc_skipresume(void)
{
- return sleeptime_injected;
+ return !suspend_timing_needed;
}
/**
@@ -1638,7 +1663,7 @@
* This function should only be called by rtc_resume(), and allows
* a suspend offset to be injected into the timekeeping values.
*/
-void timekeeping_inject_sleeptime64(struct timespec64 *delta)
+void timekeeping_inject_sleeptime64(const struct timespec64 *delta)
{
struct timekeeper *tk = &tk_core.timekeeper;
unsigned long flags;
@@ -1646,6 +1671,8 @@
raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&tk_core.seq);
+ suspend_timing_needed = false;
+
timekeeping_forward_now(tk);
__timekeeping_inject_sleeptime(tk, delta);
@@ -1669,9 +1696,9 @@
struct clocksource *clock = tk->tkr_mono.clock;
unsigned long flags;
struct timespec64 ts_new, ts_delta;
- u64 cycle_now;
+ u64 cycle_now, nsec;
+ bool inject_sleeptime = false;
- sleeptime_injected = false;
read_persistent_clock64(&ts_new);
clockevents_resume();
@@ -1693,22 +1720,19 @@
* usable source. The rtc part is handled separately in rtc core code.
*/
cycle_now = tk_clock_read(&tk->tkr_mono);
- if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
- cycle_now > tk->tkr_mono.cycle_last) {
- u64 nsec, cyc_delta;
-
- cyc_delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last,
- tk->tkr_mono.mask);
- nsec = mul_u64_u32_shr(cyc_delta, clock->mult, clock->shift);
+ nsec = clocksource_stop_suspend_timing(clock, cycle_now);
+ if (nsec > 0) {
ts_delta = ns_to_timespec64(nsec);
- sleeptime_injected = true;
+ inject_sleeptime = true;
} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
- sleeptime_injected = true;
+ inject_sleeptime = true;
}
- if (sleeptime_injected)
+ if (inject_sleeptime) {
+ suspend_timing_needed = false;
__timekeeping_inject_sleeptime(tk, &ts_delta);
+ }
/* Re-base the last cycle value */
tk->tkr_mono.cycle_last = cycle_now;
@@ -1732,6 +1756,8 @@
unsigned long flags;
struct timespec64 delta, delta_delta;
static struct timespec64 old_delta;
+ struct clocksource *curr_clock;
+ u64 cycle_now;
read_persistent_clock64(&timekeeping_suspend_time);
@@ -1743,11 +1769,22 @@
if (timekeeping_suspend_time.tv_sec || timekeeping_suspend_time.tv_nsec)
persistent_clock_exists = true;
+ suspend_timing_needed = true;
+
raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&tk_core.seq);
timekeeping_forward_now(tk);
timekeeping_suspended = 1;
+ /*
+ * Since we've called forward_now, cycle_last stores the value
+ * just read from the current clocksource. Save this to potentially
+ * use in suspend timing.
+ */
+ curr_clock = tk->tkr_mono.clock;
+ cycle_now = tk->tkr_mono.cycle_last;
+ clocksource_start_suspend_timing(curr_clock, cycle_now);
+
if (persistent_clock_exists) {
/*
* To avoid drift caused by repeated suspend/resumes,
@@ -2021,11 +2058,11 @@
return offset;
}
-/**
- * update_wall_time - Uses the current clocksource to increment the wall time
- *
+/*
+ * timekeeping_advance - Updates the timekeeper to the current time and
+ * current NTP tick length
*/
-void update_wall_time(void)
+static void timekeeping_advance(enum timekeeping_adv_mode mode)
{
struct timekeeper *real_tk = &tk_core.timekeeper;
struct timekeeper *tk = &shadow_timekeeper;
@@ -2042,14 +2079,17 @@
#ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
offset = real_tk->cycle_interval;
+
+ if (mode != TK_ADV_TICK)
+ goto out;
#else
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
-#endif
/* Check if there's really nothing to do */
- if (offset < real_tk->cycle_interval)
+ if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
goto out;
+#endif
/* Do some additional sanity checking */
timekeeping_check_update(tk, offset);
@@ -2106,6 +2146,15 @@
}
/**
+ * update_wall_time - Uses the current clocksource to increment the wall time
+ *
+ */
+void update_wall_time(void)
+{
+ timekeeping_advance(TK_ADV_TICK);
+}
+
+/**
* getboottime64 - Return the real time of system boot.
* @ts: pointer to the timespec64 to be set
*
@@ -2220,7 +2269,7 @@
/**
* timekeeping_validate_timex - Ensures the timex is ok for use in do_adjtimex
*/
-static int timekeeping_validate_timex(struct timex *txc)
+static int timekeeping_validate_timex(const struct timex *txc)
{
if (txc->modes & ADJ_ADJTIME) {
/* singleshot must not be used with any other mode bits */
@@ -2310,7 +2359,7 @@
return ret;
}
- getnstimeofday64(&ts);
+ ktime_get_real_ts64(&ts);
raw_spin_lock_irqsave(&timekeeper_lock, flags);
write_seqcount_begin(&tk_core.seq);
@@ -2327,6 +2376,10 @@
write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+ /* Update the multiplier immediately if frequency was set directly */
+ if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
+ timekeeping_advance(TK_ADV_FREQ);
+
if (tai != orig_tai)
clock_was_set();
diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index 0754cad..238e4be 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -70,7 +70,7 @@
}
late_initcall(tk_debug_sleep_time_init);
-void tk_debug_account_sleep_time(struct timespec64 *t)
+void tk_debug_account_sleep_time(const struct timespec64 *t)
{
/* Cap bin index so we don't overflow the array */
int bin = min(fls(t->tv_sec), NUM_BINS-1);
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index cf5c082..bcbb52d 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -8,7 +8,7 @@
#include <linux/time.h>
#ifdef CONFIG_DEBUG_FS
-extern void tk_debug_account_sleep_time(struct timespec64 *t);
+extern void tk_debug_account_sleep_time(const struct timespec64 *t);
#else
#define tk_debug_account_sleep_time(x)
#endif
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index cc2d23e..fa49cd7 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -581,7 +581,7 @@
* wheel:
*/
base->next_expiry = timer->expires;
- wake_up_nohz_cpu(base->cpu);
+ wake_up_nohz_cpu(base->cpu);
}
static void
@@ -1657,6 +1657,22 @@
raw_spin_lock_irq(&base->lock);
+ /*
+ * timer_base::must_forward_clk must be cleared before running
+ * timers so that any timer functions that call mod_timer() will
+ * not try to forward the base. Idle tracking / clock forwarding
+ * logic is only used with BASE_STD timers.
+ *
+ * The must_forward_clk flag is cleared unconditionally also for
+ * the deferrable base. The deferrable base is not affected by idle
+ * tracking and never forwarded, so clearing the flag is a NOOP.
+ *
+ * The fact that the deferrable base is never forwarded can cause
+ * large variations in granularity for deferrable timers, but they
+ * can be deferred for long periods due to idle anyway.
+ */
+ base->must_forward_clk = false;
+
while (time_after_eq(jiffies, base->clk)) {
levels = collect_expired_timers(base, heads);
@@ -1676,19 +1692,6 @@
{
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
- /*
- * must_forward_clk must be cleared before running timers so that any
- * timer functions that call mod_timer will not try to forward the
- * base. idle trcking / clock forwarding logic is only used with
- * BASE_STD timers.
- *
- * The deferrable base does not do idle tracking at all, so we do
- * not forward it. This can result in very large variations in
- * granularity for deferrable timers, but they can be deferred for
- * long periods due to idle.
- */
- base->must_forward_clk = false;
-
__run_timers(base);
if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
diff --git a/kernel/torture.c b/kernel/torture.c
index 3de1efb..1ac24a8 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -20,6 +20,9 @@
* Author: Paul E. McKenney <paulmck@us.ibm.com>
* Based on kernel/rcu/torture.c.
*/
+
+#define pr_fmt(fmt) fmt
+
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/init.h>
@@ -53,7 +56,7 @@
MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com>");
static char *torture_type;
-static bool verbose;
+static int verbose;
/* Mediate rmmod and system shutdown. Concurrent rmmod & shutdown illegal! */
#define FULLSTOP_DONTSTOP 0 /* Normal operation. */
@@ -98,7 +101,7 @@
if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
return false;
- if (verbose)
+ if (verbose > 1)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offlining %d\n",
torture_type, cpu);
@@ -111,7 +114,7 @@
"torture_onoff task: offline %d failed: errno %d\n",
torture_type, cpu, ret);
} else {
- if (verbose)
+ if (verbose > 1)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: offlined %d\n",
torture_type, cpu);
@@ -147,7 +150,7 @@
if (cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
return false;
- if (verbose)
+ if (verbose > 1)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: onlining %d\n",
torture_type, cpu);
@@ -160,7 +163,7 @@
"torture_onoff task: online %d failed: errno %d\n",
torture_type, cpu, ret);
} else {
- if (verbose)
+ if (verbose > 1)
pr_alert("%s" TORTURE_FLAG
"torture_onoff task: onlined %d\n",
torture_type, cpu);
@@ -647,7 +650,7 @@
* The runnable parameter points to a flag that controls whether or not
* the test is currently runnable. If there is no such flag, pass in NULL.
*/
-bool torture_init_begin(char *ttype, bool v)
+bool torture_init_begin(char *ttype, int v)
{
mutex_lock(&fullstop_mutex);
if (torture_type != NULL) {
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 6b71860..e9d9946 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1228,16 +1228,11 @@
/*
* We need to check and see if we modified the pc of the
- * pt_regs, and if so clear the kprobe and return 1 so that we
- * don't do the single stepping.
- * The ftrace kprobe handler leaves it up to us to re-enable
- * preemption here before returning if we've modified the ip.
+ * pt_regs, and if so return 1 so that we don't do the
+ * single stepping.
*/
- if (orig_ip != instruction_pointer(regs)) {
- reset_current_kprobe();
- preempt_enable_no_resched();
+ if (orig_ip != instruction_pointer(regs))
return 1;
- }
if (!ret)
return 0;
}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 576d180..5470dce 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -18,18 +18,14 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/sysctl.h>
-#include <linux/smpboot.h>
-#include <linux/sched/rt.h>
-#include <uapi/linux/sched/types.h>
#include <linux/tick.h>
-#include <linux/workqueue.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/isolation.h>
+#include <linux/stop_machine.h>
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
-#include <linux/kthread.h>
static DEFINE_MUTEX(watchdog_mutex);
@@ -169,11 +165,10 @@
unsigned int __read_mostly softlockup_panic =
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
-static bool softlockup_threads_initialized __read_mostly;
+static bool softlockup_initialized __read_mostly;
static u64 __read_mostly sample_period;
static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
-static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
static DEFINE_PER_CPU(bool, softlockup_touch_sync);
static DEFINE_PER_CPU(bool, soft_watchdog_warn);
@@ -335,6 +330,27 @@
__this_cpu_inc(hrtimer_interrupts);
}
+static DEFINE_PER_CPU(struct completion, softlockup_completion);
+static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
+
+/*
+ * The watchdog thread function - touches the timestamp.
+ *
+ * It only runs once every sample_period seconds (4 seconds by
+ * default) to reset the softlockup timestamp. If this gets delayed
+ * for more than 2*watchdog_thresh seconds then the debug-printout
+ * triggers in watchdog_timer_fn().
+ */
+static int softlockup_fn(void *data)
+{
+ __this_cpu_write(soft_lockup_hrtimer_cnt,
+ __this_cpu_read(hrtimer_interrupts));
+ __touch_watchdog();
+ complete(this_cpu_ptr(&softlockup_completion));
+
+ return 0;
+}
+
/* watchdog kicker functions */
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
@@ -350,7 +366,12 @@
watchdog_interrupt_count();
/* kick the softlockup detector */
- wake_up_process(__this_cpu_read(softlockup_watchdog));
+ if (completion_done(this_cpu_ptr(&softlockup_completion))) {
+ reinit_completion(this_cpu_ptr(&softlockup_completion));
+ stop_one_cpu_nowait(smp_processor_id(),
+ softlockup_fn, NULL,
+ this_cpu_ptr(&softlockup_stop_work));
+ }
/* .. and repeat */
hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
@@ -448,16 +469,15 @@
return HRTIMER_RESTART;
}
-static void watchdog_set_prio(unsigned int policy, unsigned int prio)
-{
- struct sched_param param = { .sched_priority = prio };
-
- sched_setscheduler(current, policy, ¶m);
-}
-
static void watchdog_enable(unsigned int cpu)
{
struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
+ struct completion *done = this_cpu_ptr(&softlockup_completion);
+
+ WARN_ON_ONCE(cpu != smp_processor_id());
+
+ init_completion(done);
+ complete(done);
/*
* Start the timer first to prevent the NMI watchdog triggering
@@ -473,15 +493,14 @@
/* Enable the perf event */
if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
watchdog_nmi_enable(cpu);
-
- watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
}
static void watchdog_disable(unsigned int cpu)
{
struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
- watchdog_set_prio(SCHED_NORMAL, 0);
+ WARN_ON_ONCE(cpu != smp_processor_id());
+
/*
* Disable the perf event first. That prevents that a large delay
* between disabling the timer and disabling the perf event causes
@@ -489,79 +508,66 @@
*/
watchdog_nmi_disable(cpu);
hrtimer_cancel(hrtimer);
+ wait_for_completion(this_cpu_ptr(&softlockup_completion));
}
-static void watchdog_cleanup(unsigned int cpu, bool online)
+static int softlockup_stop_fn(void *data)
{
- watchdog_disable(cpu);
+ watchdog_disable(smp_processor_id());
+ return 0;
}
-static int watchdog_should_run(unsigned int cpu)
+static void softlockup_stop_all(void)
{
- return __this_cpu_read(hrtimer_interrupts) !=
- __this_cpu_read(soft_lockup_hrtimer_cnt);
-}
+ int cpu;
-/*
- * The watchdog thread function - touches the timestamp.
- *
- * It only runs once every sample_period seconds (4 seconds by
- * default) to reset the softlockup timestamp. If this gets delayed
- * for more than 2*watchdog_thresh seconds then the debug-printout
- * triggers in watchdog_timer_fn().
- */
-static void watchdog(unsigned int cpu)
-{
- __this_cpu_write(soft_lockup_hrtimer_cnt,
- __this_cpu_read(hrtimer_interrupts));
- __touch_watchdog();
-}
-
-static struct smp_hotplug_thread watchdog_threads = {
- .store = &softlockup_watchdog,
- .thread_should_run = watchdog_should_run,
- .thread_fn = watchdog,
- .thread_comm = "watchdog/%u",
- .setup = watchdog_enable,
- .cleanup = watchdog_cleanup,
- .park = watchdog_disable,
- .unpark = watchdog_enable,
-};
-
-static void softlockup_update_smpboot_threads(void)
-{
- lockdep_assert_held(&watchdog_mutex);
-
- if (!softlockup_threads_initialized)
+ if (!softlockup_initialized)
return;
- smpboot_update_cpumask_percpu_thread(&watchdog_threads,
- &watchdog_allowed_mask);
-}
+ for_each_cpu(cpu, &watchdog_allowed_mask)
+ smp_call_on_cpu(cpu, softlockup_stop_fn, NULL, false);
-/* Temporarily park all watchdog threads */
-static void softlockup_park_all_threads(void)
-{
cpumask_clear(&watchdog_allowed_mask);
- softlockup_update_smpboot_threads();
}
-/* Unpark enabled threads */
-static void softlockup_unpark_threads(void)
+static int softlockup_start_fn(void *data)
{
+ watchdog_enable(smp_processor_id());
+ return 0;
+}
+
+static void softlockup_start_all(void)
+{
+ int cpu;
+
cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
- softlockup_update_smpboot_threads();
+ for_each_cpu(cpu, &watchdog_allowed_mask)
+ smp_call_on_cpu(cpu, softlockup_start_fn, NULL, false);
+}
+
+int lockup_detector_online_cpu(unsigned int cpu)
+{
+ watchdog_enable(cpu);
+ return 0;
+}
+
+int lockup_detector_offline_cpu(unsigned int cpu)
+{
+ watchdog_disable(cpu);
+ return 0;
}
static void lockup_detector_reconfigure(void)
{
cpus_read_lock();
watchdog_nmi_stop();
- softlockup_park_all_threads();
+
+ softlockup_stop_all();
set_sample_period();
lockup_detector_update_enable();
if (watchdog_enabled && watchdog_thresh)
- softlockup_unpark_threads();
+ softlockup_start_all();
+
watchdog_nmi_start();
cpus_read_unlock();
/*
@@ -580,8 +586,6 @@
*/
static __init void lockup_detector_setup(void)
{
- int ret;
-
/*
* If sysctl is off and watchdog got disabled on the command line,
* nothing to do here.
@@ -592,24 +596,13 @@
!(watchdog_enabled && watchdog_thresh))
return;
- ret = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
- &watchdog_allowed_mask);
- if (ret) {
- pr_err("Failed to initialize soft lockup detector threads\n");
- return;
- }
-
mutex_lock(&watchdog_mutex);
- softlockup_threads_initialized = true;
lockup_detector_reconfigure();
+ softlockup_initialized = true;
mutex_unlock(&watchdog_mutex);
}
#else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static inline int watchdog_park_threads(void) { return 0; }
-static inline void watchdog_unpark_threads(void) { }
-static inline int watchdog_enable_all_cpus(void) { return 0; }
-static inline void watchdog_disable_all_cpus(void) { }
static void lockup_detector_reconfigure(void)
{
cpus_read_lock();
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index e449a23..1f7020d 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -175,8 +175,8 @@
evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
watchdog_overflow_callback, NULL);
if (IS_ERR(evt)) {
- pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
- PTR_ERR(evt));
+ pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+ PTR_ERR(evt));
return PTR_ERR(evt);
}
this_cpu_write(watchdog_ev, evt);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8838d11..0b066b3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1718,7 +1718,7 @@
default n
help
This option provides for testing basic kprobes functionality on
- boot. A sample kprobe, jprobe and kretprobe are inserted and
+ boot. Samples of kprobe and kretprobe are inserted and
verified for functionality.
Say N if you are unsure.
diff --git a/lib/atomic64.c b/lib/atomic64.c
index 53c2d5e..1d91e31 100644
--- a/lib/atomic64.c
+++ b/lib/atomic64.c
@@ -178,18 +178,18 @@
}
EXPORT_SYMBOL(atomic64_xchg);
-int atomic64_add_unless(atomic64_t *v, long long a, long long u)
+long long atomic64_fetch_add_unless(atomic64_t *v, long long a, long long u)
{
unsigned long flags;
raw_spinlock_t *lock = lock_addr(v);
- int ret = 0;
+ long long val;
raw_spin_lock_irqsave(lock, flags);
- if (v->counter != u) {
+ val = v->counter;
+ if (val != u)
v->counter += a;
- ret = 1;
- }
raw_spin_unlock_irqrestore(lock, flags);
- return ret;
+
+ return val;
}
-EXPORT_SYMBOL(atomic64_add_unless);
+EXPORT_SYMBOL(atomic64_fetch_add_unless);
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 994be48..70935ed 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -360,9 +360,12 @@
limit++;
if (is_on_stack)
- pr_warn("object is on stack, but not annotated\n");
+ pr_warn("object %p is on stack %p, but NOT annotated.\n", addr,
+ task_stack_page(current));
else
- pr_warn("object is not on stack, but annotated\n");
+ pr_warn("object %p is NOT on stack %p, but annotated.\n", addr,
+ task_stack_page(current));
+
WARN_ON(1);
}
@@ -1185,8 +1188,7 @@
if (!obj_cache || debug_objects_replace_static_objects()) {
debug_objects_enabled = 0;
- if (obj_cache)
- kmem_cache_destroy(obj_cache);
+ kmem_cache_destroy(obj_cache);
pr_warn("out of memory.\n");
} else
debug_objects_selftest();
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bba..517f585 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
- pmd_free_pte_page(pmd)) {
+ pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,7 +119,7 @@
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
- pud_free_pmd_page(pud)) {
+ pud_free_pmd_page(pud, addr)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}
diff --git a/lib/raid6/s390vx.uc b/lib/raid6/s390vx.uc
index 140fa8b..914ebe9 100644
--- a/lib/raid6/s390vx.uc
+++ b/lib/raid6/s390vx.uc
@@ -55,22 +55,24 @@
asm volatile ("VX %0,%1,%2" : : "i" (x), "i" (y), "i" (z));
}
-static inline void LOAD_DATA(int x, int n, u8 *ptr)
+static inline void LOAD_DATA(int x, u8 *ptr)
{
- typedef struct { u8 _[16*n]; } addrtype;
+ typedef struct { u8 _[16 * $#]; } addrtype;
register addrtype *__ptr asm("1") = (addrtype *) ptr;
asm volatile ("VLM %2,%3,0,%r1"
- : : "m" (*__ptr), "a" (__ptr), "i" (x), "i" (x + n - 1));
+ : : "m" (*__ptr), "a" (__ptr), "i" (x),
+ "i" (x + $# - 1));
}
-static inline void STORE_DATA(int x, int n, u8 *ptr)
+static inline void STORE_DATA(int x, u8 *ptr)
{
- typedef struct { u8 _[16*n]; } addrtype;
+ typedef struct { u8 _[16 * $#]; } addrtype;
register addrtype *__ptr asm("1") = (addrtype *) ptr;
asm volatile ("VSTM %2,%3,0,1"
- : "=m" (*__ptr) : "a" (__ptr), "i" (x), "i" (x + n - 1));
+ : "=m" (*__ptr) : "a" (__ptr), "i" (x),
+ "i" (x + $# - 1));
}
static inline void COPY_VEC(int x, int y)
@@ -93,19 +95,19 @@
q = dptr[z0 + 2]; /* RS syndrome */
for (d = 0; d < bytes; d += $#*NSIZE) {
- LOAD_DATA(0,$#,&dptr[z0][d]);
+ LOAD_DATA(0,&dptr[z0][d]);
COPY_VEC(8+$$,0+$$);
for (z = z0 - 1; z >= 0; z--) {
MASK(16+$$,8+$$);
AND(16+$$,16+$$,25);
SHLBYTE(8+$$,8+$$);
XOR(8+$$,8+$$,16+$$);
- LOAD_DATA(16,$#,&dptr[z][d]);
+ LOAD_DATA(16,&dptr[z][d]);
XOR(0+$$,0+$$,16+$$);
XOR(8+$$,8+$$,16+$$);
}
- STORE_DATA(0,$#,&p[d]);
- STORE_DATA(8,$#,&q[d]);
+ STORE_DATA(0,&p[d]);
+ STORE_DATA(8,&q[d]);
}
kernel_fpu_end(&vxstate, KERNEL_VXR);
}
@@ -127,14 +129,14 @@
for (d = 0; d < bytes; d += $#*NSIZE) {
/* P/Q data pages */
- LOAD_DATA(0,$#,&dptr[z0][d]);
+ LOAD_DATA(0,&dptr[z0][d]);
COPY_VEC(8+$$,0+$$);
for (z = z0 - 1; z >= start; z--) {
MASK(16+$$,8+$$);
AND(16+$$,16+$$,25);
SHLBYTE(8+$$,8+$$);
XOR(8+$$,8+$$,16+$$);
- LOAD_DATA(16,$#,&dptr[z][d]);
+ LOAD_DATA(16,&dptr[z][d]);
XOR(0+$$,0+$$,16+$$);
XOR(8+$$,8+$$,16+$$);
}
@@ -145,12 +147,12 @@
SHLBYTE(8+$$,8+$$);
XOR(8+$$,8+$$,16+$$);
}
- LOAD_DATA(16,$#,&p[d]);
+ LOAD_DATA(16,&p[d]);
XOR(16+$$,16+$$,0+$$);
- STORE_DATA(16,$#,&p[d]);
- LOAD_DATA(16,$#,&q[d]);
+ STORE_DATA(16,&p[d]);
+ LOAD_DATA(16,&q[d]);
XOR(16+$$,16+$$,8+$$);
- STORE_DATA(16,$#,&q[d]);
+ STORE_DATA(16,&q[d]);
}
kernel_fpu_end(&vxstate, KERNEL_VXR);
}
diff --git a/lib/refcount.c b/lib/refcount.c
index d3b81ce..ebcf8cd 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -35,13 +35,13 @@
*
*/
+#include <linux/mutex.h>
#include <linux/refcount.h>
+#include <linux/spinlock.h>
#include <linux/bug.h>
-#ifdef CONFIG_REFCOUNT_FULL
-
/**
- * refcount_add_not_zero - add a value to a refcount unless it is 0
+ * refcount_add_not_zero_checked - add a value to a refcount unless it is 0
* @i: the value to add to the refcount
* @r: the refcount
*
@@ -58,7 +58,7 @@
*
* Return: false if the passed refcount is 0, true otherwise
*/
-bool refcount_add_not_zero(unsigned int i, refcount_t *r)
+bool refcount_add_not_zero_checked(unsigned int i, refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
@@ -79,10 +79,10 @@
return true;
}
-EXPORT_SYMBOL(refcount_add_not_zero);
+EXPORT_SYMBOL(refcount_add_not_zero_checked);
/**
- * refcount_add - add a value to a refcount
+ * refcount_add_checked - add a value to a refcount
* @i: the value to add to the refcount
* @r: the refcount
*
@@ -97,14 +97,14 @@
* cases, refcount_inc(), or one of its variants, should instead be used to
* increment a reference count.
*/
-void refcount_add(unsigned int i, refcount_t *r)
+void refcount_add_checked(unsigned int i, refcount_t *r)
{
- WARN_ONCE(!refcount_add_not_zero(i, r), "refcount_t: addition on 0; use-after-free.\n");
+ WARN_ONCE(!refcount_add_not_zero_checked(i, r), "refcount_t: addition on 0; use-after-free.\n");
}
-EXPORT_SYMBOL(refcount_add);
+EXPORT_SYMBOL(refcount_add_checked);
/**
- * refcount_inc_not_zero - increment a refcount unless it is 0
+ * refcount_inc_not_zero_checked - increment a refcount unless it is 0
* @r: the refcount to increment
*
* Similar to atomic_inc_not_zero(), but will saturate at UINT_MAX and WARN.
@@ -115,7 +115,7 @@
*
* Return: true if the increment was successful, false otherwise
*/
-bool refcount_inc_not_zero(refcount_t *r)
+bool refcount_inc_not_zero_checked(refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
@@ -134,10 +134,10 @@
return true;
}
-EXPORT_SYMBOL(refcount_inc_not_zero);
+EXPORT_SYMBOL(refcount_inc_not_zero_checked);
/**
- * refcount_inc - increment a refcount
+ * refcount_inc_checked - increment a refcount
* @r: the refcount to increment
*
* Similar to atomic_inc(), but will saturate at UINT_MAX and WARN.
@@ -148,14 +148,14 @@
* Will WARN if the refcount is 0, as this represents a possible use-after-free
* condition.
*/
-void refcount_inc(refcount_t *r)
+void refcount_inc_checked(refcount_t *r)
{
- WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
+ WARN_ONCE(!refcount_inc_not_zero_checked(r), "refcount_t: increment on 0; use-after-free.\n");
}
-EXPORT_SYMBOL(refcount_inc);
+EXPORT_SYMBOL(refcount_inc_checked);
/**
- * refcount_sub_and_test - subtract from a refcount and test if it is 0
+ * refcount_sub_and_test_checked - subtract from a refcount and test if it is 0
* @i: amount to subtract from the refcount
* @r: the refcount
*
@@ -174,7 +174,7 @@
*
* Return: true if the resulting refcount is 0, false otherwise
*/
-bool refcount_sub_and_test(unsigned int i, refcount_t *r)
+bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
{
unsigned int new, val = atomic_read(&r->refs);
@@ -192,10 +192,10 @@
return !new;
}
-EXPORT_SYMBOL(refcount_sub_and_test);
+EXPORT_SYMBOL(refcount_sub_and_test_checked);
/**
- * refcount_dec_and_test - decrement a refcount and test if it is 0
+ * refcount_dec_and_test_checked - decrement a refcount and test if it is 0
* @r: the refcount
*
* Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
@@ -207,14 +207,14 @@
*
* Return: true if the resulting refcount is 0, false otherwise
*/
-bool refcount_dec_and_test(refcount_t *r)
+bool refcount_dec_and_test_checked(refcount_t *r)
{
- return refcount_sub_and_test(1, r);
+ return refcount_sub_and_test_checked(1, r);
}
-EXPORT_SYMBOL(refcount_dec_and_test);
+EXPORT_SYMBOL(refcount_dec_and_test_checked);
/**
- * refcount_dec - decrement a refcount
+ * refcount_dec_checked - decrement a refcount
* @r: the refcount
*
* Similar to atomic_dec(), it will WARN on underflow and fail to decrement
@@ -223,12 +223,11 @@
* Provides release memory ordering, such that prior loads and stores are done
* before.
*/
-void refcount_dec(refcount_t *r)
+void refcount_dec_checked(refcount_t *r)
{
- WARN_ONCE(refcount_dec_and_test(r), "refcount_t: decrement hit 0; leaking memory.\n");
+ WARN_ONCE(refcount_dec_and_test_checked(r), "refcount_t: decrement hit 0; leaking memory.\n");
}
-EXPORT_SYMBOL(refcount_dec);
-#endif /* CONFIG_REFCOUNT_FULL */
+EXPORT_SYMBOL(refcount_dec_checked);
/**
* refcount_dec_if_one - decrement a refcount if it is 1
diff --git a/mm/init-mm.c b/mm/init-mm.c
index f0179c9..a787a31 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -15,6 +15,16 @@
#define INIT_MM_CONTEXT(name)
#endif
+/*
+ * For dynamically allocated mm_structs, there is a dynamically sized cpumask
+ * at the end of the structure, the size of which depends on the maximum CPU
+ * number the system can see. That way we allocate only as much memory for
+ * mm_cpumask() as needed for the hundreds, or thousands of processes that
+ * a system typically runs.
+ *
+ * Since there is only one init_mm in the entire system, keep it simple
+ * and size this cpu_bitmask to NR_CPUS.
+ */
struct mm_struct init_mm = {
.mm_rb = RB_ROOT,
.pgd = swapper_pg_dir,
@@ -25,5 +35,6 @@
.arg_lock = __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
.user_ns = &init_user_ns,
+ .cpu_bitmap = { [BITS_TO_LONGS(NR_CPUS)] = 0},
INIT_MM_CONTEXT(init_mm)
};
diff --git a/mm/memory.c b/mm/memory.c
index c5e87a3..3d0a74a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -326,16 +326,20 @@
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-/*
- * See the comment near struct mmu_table_batch.
- */
-
static void tlb_remove_table_smp_sync(void *arg)
{
- /* Simply deliver the interrupt */
+ struct mm_struct __maybe_unused *mm = arg;
+ /*
+ * On most architectures this does nothing. Simply delivering the
+ * interrupt is enough to prevent races with software page table
+ * walking like that done in get_user_pages_fast.
+ *
+ * See the comment near struct mmu_table_batch.
+ */
+ tlb_flush_remove_tables_local(mm);
}
-static void tlb_remove_table_one(void *table)
+static void tlb_remove_table_one(void *table, struct mmu_gather *tlb)
{
/*
* This isn't an RCU grace period and hence the page-tables cannot be
@@ -344,7 +348,7 @@
* It is however sufficient for software page-table walkers that rely on
* IRQ disabling. See the comment near struct mmu_table_batch.
*/
- smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+ smp_call_function(tlb_remove_table_smp_sync, tlb->mm, 1);
__tlb_remove_table(table);
}
@@ -365,6 +369,8 @@
{
struct mmu_table_batch **batch = &tlb->batch;
+ tlb_flush_remove_tables(tlb->mm);
+
if (*batch) {
call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
*batch = NULL;
@@ -387,7 +393,7 @@
if (*batch == NULL) {
*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
if (*batch == NULL) {
- tlb_remove_table_one(table);
+ tlb_remove_table_one(table, tlb);
return;
}
(*batch)->nr = 0;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a790ef4..3222193 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6939,9 +6939,21 @@
start = (void *)PAGE_ALIGN((unsigned long)start);
end = (void *)((unsigned long)end & PAGE_MASK);
for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
+ struct page *page = virt_to_page(pos);
+ void *direct_map_addr;
+
+ /*
+ * 'direct_map_addr' might be different from 'pos'
+ * because some architectures' virt_to_page()
+ * work with aliases. Getting the direct map
+ * address ensures that we get a _writeable_
+ * alias for the memset().
+ */
+ direct_map_addr = page_address(page);
if ((unsigned int)poison <= 0xFF)
- memset(pos, poison, PAGE_SIZE);
- free_reserved_page(virt_to_page(pos));
+ memset(direct_map_addr, poison, PAGE_SIZE);
+
+ free_reserved_page(page);
}
if (pages && s)
diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index af8c4b3..d84227d 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -244,7 +244,7 @@
* the packet count limit, so...
*/
if (atm_may_send(pvcc->atmvcc, size) &&
- atomic_inc_not_zero_hint(&pvcc->inflight, NONE_INFLIGHT))
+ atomic_inc_not_zero(&pvcc->inflight))
return 1;
/*
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index f6734d8..9486293 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -415,7 +415,7 @@
bool rxrpc_queue_call(struct rxrpc_call *call)
{
const void *here = __builtin_return_address(0);
- int n = __atomic_add_unless(&call->usage, 1, 0);
+ int n = atomic_fetch_add_unless(&call->usage, 1, 0);
if (n == 0)
return false;
if (rxrpc_queue_work(&call->processor))
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 4c77a78..77440a3 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -266,7 +266,7 @@
bool rxrpc_queue_conn(struct rxrpc_connection *conn)
{
const void *here = __builtin_return_address(0);
- int n = __atomic_add_unless(&conn->usage, 1, 0);
+ int n = atomic_fetch_add_unless(&conn->usage, 1, 0);
if (n == 0)
return false;
if (rxrpc_queue_work(&conn->processor))
@@ -309,7 +309,7 @@
const void *here = __builtin_return_address(0);
if (conn) {
- int n = __atomic_add_unless(&conn->usage, 1, 0);
+ int n = atomic_fetch_add_unless(&conn->usage, 1, 0);
if (n > 0)
trace_rxrpc_conn(conn, rxrpc_conn_got, n + 1, here);
else
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index b493e6b..777c3ed 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -305,7 +305,7 @@
const void *here = __builtin_return_address(0);
if (local) {
- int n = __atomic_add_unless(&local->usage, 1, 0);
+ int n = atomic_fetch_add_unless(&local->usage, 1, 0);
if (n > 0)
trace_rxrpc_local(local, rxrpc_local_got, n + 1, here);
else
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 24ec7cd..1dc7648 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -406,7 +406,7 @@
const void *here = __builtin_return_address(0);
if (peer) {
- int n = __atomic_add_unless(&peer->usage, 1, 0);
+ int n = atomic_fetch_add_unless(&peer->usage, 1, 0);
if (n > 0)
trace_rxrpc_peer(peer, rxrpc_peer_got, n + 1, here);
else
diff --git a/security/Kconfig b/security/Kconfig
index c430206..afa91c6 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -57,7 +57,7 @@
config PAGE_TABLE_ISOLATION
bool "Remove the kernel mapping in user mode"
default y
- depends on X86_64 && !UML
+ depends on X86 && !UML
help
This feature reduces the number of hardware side channels by
ensuring that the majority of kernel addresses are not mapped
diff --git a/tools/arch/arm64/include/uapi/asm/unistd.h b/tools/arch/arm64/include/uapi/asm/unistd.h
new file mode 100644
index 0000000..5072cbd
--- /dev/null
+++ b/tools/arch/arm64/include/uapi/asm/unistd.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define __ARCH_WANT_RENAMEAT
+
+#include <asm-generic/unistd.h>
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
new file mode 100644
index 0000000..4299067
--- /dev/null
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -0,0 +1,783 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#include <asm/bitsperlong.h>
+
+/*
+ * This file contains the system call numbers, based on the
+ * layout of the x86-64 architecture, which embeds the
+ * pointer to the syscall in the table.
+ *
+ * As a basic principle, no duplication of functionality
+ * should be added, e.g. we don't use lseek when llseek
+ * is present. New architectures should use this file
+ * and implement the less feature-full calls in user space.
+ */
+
+#ifndef __SYSCALL
+#define __SYSCALL(x, y)
+#endif
+
+#if __BITS_PER_LONG == 32 || defined(__SYSCALL_COMPAT)
+#define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _32)
+#else
+#define __SC_3264(_nr, _32, _64) __SYSCALL(_nr, _64)
+#endif
+
+#ifdef __SYSCALL_COMPAT
+#define __SC_COMP(_nr, _sys, _comp) __SYSCALL(_nr, _comp)
+#define __SC_COMP_3264(_nr, _32, _64, _comp) __SYSCALL(_nr, _comp)
+#else
+#define __SC_COMP(_nr, _sys, _comp) __SYSCALL(_nr, _sys)
+#define __SC_COMP_3264(_nr, _32, _64, _comp) __SC_3264(_nr, _32, _64)
+#endif
+
+#define __NR_io_setup 0
+__SC_COMP(__NR_io_setup, sys_io_setup, compat_sys_io_setup)
+#define __NR_io_destroy 1
+__SYSCALL(__NR_io_destroy, sys_io_destroy)
+#define __NR_io_submit 2
+__SC_COMP(__NR_io_submit, sys_io_submit, compat_sys_io_submit)
+#define __NR_io_cancel 3
+__SYSCALL(__NR_io_cancel, sys_io_cancel)
+#define __NR_io_getevents 4
+__SC_COMP(__NR_io_getevents, sys_io_getevents, compat_sys_io_getevents)
+
+/* fs/xattr.c */
+#define __NR_setxattr 5
+__SYSCALL(__NR_setxattr, sys_setxattr)
+#define __NR_lsetxattr 6
+__SYSCALL(__NR_lsetxattr, sys_lsetxattr)
+#define __NR_fsetxattr 7
+__SYSCALL(__NR_fsetxattr, sys_fsetxattr)
+#define __NR_getxattr 8
+__SYSCALL(__NR_getxattr, sys_getxattr)
+#define __NR_lgetxattr 9
+__SYSCALL(__NR_lgetxattr, sys_lgetxattr)
+#define __NR_fgetxattr 10
+__SYSCALL(__NR_fgetxattr, sys_fgetxattr)
+#define __NR_listxattr 11
+__SYSCALL(__NR_listxattr, sys_listxattr)
+#define __NR_llistxattr 12
+__SYSCALL(__NR_llistxattr, sys_llistxattr)
+#define __NR_flistxattr 13
+__SYSCALL(__NR_flistxattr, sys_flistxattr)
+#define __NR_removexattr 14
+__SYSCALL(__NR_removexattr, sys_removexattr)
+#define __NR_lremovexattr 15
+__SYSCALL(__NR_lremovexattr, sys_lremovexattr)
+#define __NR_fremovexattr 16
+__SYSCALL(__NR_fremovexattr, sys_fremovexattr)
+
+/* fs/dcache.c */
+#define __NR_getcwd 17
+__SYSCALL(__NR_getcwd, sys_getcwd)
+
+/* fs/cookies.c */
+#define __NR_lookup_dcookie 18
+__SC_COMP(__NR_lookup_dcookie, sys_lookup_dcookie, compat_sys_lookup_dcookie)
+
+/* fs/eventfd.c */
+#define __NR_eventfd2 19
+__SYSCALL(__NR_eventfd2, sys_eventfd2)
+
+/* fs/eventpoll.c */
+#define __NR_epoll_create1 20
+__SYSCALL(__NR_epoll_create1, sys_epoll_create1)
+#define __NR_epoll_ctl 21
+__SYSCALL(__NR_epoll_ctl, sys_epoll_ctl)
+#define __NR_epoll_pwait 22
+__SC_COMP(__NR_epoll_pwait, sys_epoll_pwait, compat_sys_epoll_pwait)
+
+/* fs/fcntl.c */
+#define __NR_dup 23
+__SYSCALL(__NR_dup, sys_dup)
+#define __NR_dup3 24
+__SYSCALL(__NR_dup3, sys_dup3)
+#define __NR3264_fcntl 25
+__SC_COMP_3264(__NR3264_fcntl, sys_fcntl64, sys_fcntl, compat_sys_fcntl64)
+
+/* fs/inotify_user.c */
+#define __NR_inotify_init1 26
+__SYSCALL(__NR_inotify_init1, sys_inotify_init1)
+#define __NR_inotify_add_watch 27
+__SYSCALL(__NR_inotify_add_watch, sys_inotify_add_watch)
+#define __NR_inotify_rm_watch 28
+__SYSCALL(__NR_inotify_rm_watch, sys_inotify_rm_watch)
+
+/* fs/ioctl.c */
+#define __NR_ioctl 29
+__SC_COMP(__NR_ioctl, sys_ioctl, compat_sys_ioctl)
+
+/* fs/ioprio.c */
+#define __NR_ioprio_set 30
+__SYSCALL(__NR_ioprio_set, sys_ioprio_set)
+#define __NR_ioprio_get 31
+__SYSCALL(__NR_ioprio_get, sys_ioprio_get)
+
+/* fs/locks.c */
+#define __NR_flock 32
+__SYSCALL(__NR_flock, sys_flock)
+
+/* fs/namei.c */
+#define __NR_mknodat 33
+__SYSCALL(__NR_mknodat, sys_mknodat)
+#define __NR_mkdirat 34
+__SYSCALL(__NR_mkdirat, sys_mkdirat)
+#define __NR_unlinkat 35
+__SYSCALL(__NR_unlinkat, sys_unlinkat)
+#define __NR_symlinkat 36
+__SYSCALL(__NR_symlinkat, sys_symlinkat)
+#define __NR_linkat 37
+__SYSCALL(__NR_linkat, sys_linkat)
+#ifdef __ARCH_WANT_RENAMEAT
+/* renameat is superseded with flags by renameat2 */
+#define __NR_renameat 38
+__SYSCALL(__NR_renameat, sys_renameat)
+#endif /* __ARCH_WANT_RENAMEAT */
+
+/* fs/namespace.c */
+#define __NR_umount2 39
+__SYSCALL(__NR_umount2, sys_umount)
+#define __NR_mount 40
+__SC_COMP(__NR_mount, sys_mount, compat_sys_mount)
+#define __NR_pivot_root 41
+__SYSCALL(__NR_pivot_root, sys_pivot_root)
+
+/* fs/nfsctl.c */
+#define __NR_nfsservctl 42
+__SYSCALL(__NR_nfsservctl, sys_ni_syscall)
+
+/* fs/open.c */
+#define __NR3264_statfs 43
+__SC_COMP_3264(__NR3264_statfs, sys_statfs64, sys_statfs, \
+ compat_sys_statfs64)
+#define __NR3264_fstatfs 44
+__SC_COMP_3264(__NR3264_fstatfs, sys_fstatfs64, sys_fstatfs, \
+ compat_sys_fstatfs64)
+#define __NR3264_truncate 45
+__SC_COMP_3264(__NR3264_truncate, sys_truncate64, sys_truncate, \
+ compat_sys_truncate64)
+#define __NR3264_ftruncate 46
+__SC_COMP_3264(__NR3264_ftruncate, sys_ftruncate64, sys_ftruncate, \
+ compat_sys_ftruncate64)
+
+#define __NR_fallocate 47
+__SC_COMP(__NR_fallocate, sys_fallocate, compat_sys_fallocate)
+#define __NR_faccessat 48
+__SYSCALL(__NR_faccessat, sys_faccessat)
+#define __NR_chdir 49
+__SYSCALL(__NR_chdir, sys_chdir)
+#define __NR_fchdir 50
+__SYSCALL(__NR_fchdir, sys_fchdir)
+#define __NR_chroot 51
+__SYSCALL(__NR_chroot, sys_chroot)
+#define __NR_fchmod 52
+__SYSCALL(__NR_fchmod, sys_fchmod)
+#define __NR_fchmodat 53
+__SYSCALL(__NR_fchmodat, sys_fchmodat)
+#define __NR_fchownat 54
+__SYSCALL(__NR_fchownat, sys_fchownat)
+#define __NR_fchown 55
+__SYSCALL(__NR_fchown, sys_fchown)
+#define __NR_openat 56
+__SC_COMP(__NR_openat, sys_openat, compat_sys_openat)
+#define __NR_close 57
+__SYSCALL(__NR_close, sys_close)
+#define __NR_vhangup 58
+__SYSCALL(__NR_vhangup, sys_vhangup)
+
+/* fs/pipe.c */
+#define __NR_pipe2 59
+__SYSCALL(__NR_pipe2, sys_pipe2)
+
+/* fs/quota.c */
+#define __NR_quotactl 60
+__SYSCALL(__NR_quotactl, sys_quotactl)
+
+/* fs/readdir.c */
+#define __NR_getdents64 61
+__SYSCALL(__NR_getdents64, sys_getdents64)
+
+/* fs/read_write.c */
+#define __NR3264_lseek 62
+__SC_3264(__NR3264_lseek, sys_llseek, sys_lseek)
+#define __NR_read 63
+__SYSCALL(__NR_read, sys_read)
+#define __NR_write 64
+__SYSCALL(__NR_write, sys_write)
+#define __NR_readv 65
+__SC_COMP(__NR_readv, sys_readv, compat_sys_readv)
+#define __NR_writev 66
+__SC_COMP(__NR_writev, sys_writev, compat_sys_writev)
+#define __NR_pread64 67
+__SC_COMP(__NR_pread64, sys_pread64, compat_sys_pread64)
+#define __NR_pwrite64 68
+__SC_COMP(__NR_pwrite64, sys_pwrite64, compat_sys_pwrite64)
+#define __NR_preadv 69
+__SC_COMP(__NR_preadv, sys_preadv, compat_sys_preadv)
+#define __NR_pwritev 70
+__SC_COMP(__NR_pwritev, sys_pwritev, compat_sys_pwritev)
+
+/* fs/sendfile.c */
+#define __NR3264_sendfile 71
+__SYSCALL(__NR3264_sendfile, sys_sendfile64)
+
+/* fs/select.c */
+#define __NR_pselect6 72
+__SC_COMP(__NR_pselect6, sys_pselect6, compat_sys_pselect6)
+#define __NR_ppoll 73
+__SC_COMP(__NR_ppoll, sys_ppoll, compat_sys_ppoll)
+
+/* fs/signalfd.c */
+#define __NR_signalfd4 74
+__SC_COMP(__NR_signalfd4, sys_signalfd4, compat_sys_signalfd4)
+
+/* fs/splice.c */
+#define __NR_vmsplice 75
+__SC_COMP(__NR_vmsplice, sys_vmsplice, compat_sys_vmsplice)
+#define __NR_splice 76
+__SYSCALL(__NR_splice, sys_splice)
+#define __NR_tee 77
+__SYSCALL(__NR_tee, sys_tee)
+
+/* fs/stat.c */
+#define __NR_readlinkat 78
+__SYSCALL(__NR_readlinkat, sys_readlinkat)
+#define __NR3264_fstatat 79
+__SC_3264(__NR3264_fstatat, sys_fstatat64, sys_newfstatat)
+#define __NR3264_fstat 80
+__SC_3264(__NR3264_fstat, sys_fstat64, sys_newfstat)
+
+/* fs/sync.c */
+#define __NR_sync 81
+__SYSCALL(__NR_sync, sys_sync)
+#define __NR_fsync 82
+__SYSCALL(__NR_fsync, sys_fsync)
+#define __NR_fdatasync 83
+__SYSCALL(__NR_fdatasync, sys_fdatasync)
+#ifdef __ARCH_WANT_SYNC_FILE_RANGE2
+#define __NR_sync_file_range2 84
+__SC_COMP(__NR_sync_file_range2, sys_sync_file_range2, \
+ compat_sys_sync_file_range2)
+#else
+#define __NR_sync_file_range 84
+__SC_COMP(__NR_sync_file_range, sys_sync_file_range, \
+ compat_sys_sync_file_range)
+#endif
+
+/* fs/timerfd.c */
+#define __NR_timerfd_create 85
+__SYSCALL(__NR_timerfd_create, sys_timerfd_create)
+#define __NR_timerfd_settime 86
+__SC_COMP(__NR_timerfd_settime, sys_timerfd_settime, \
+ compat_sys_timerfd_settime)
+#define __NR_timerfd_gettime 87
+__SC_COMP(__NR_timerfd_gettime, sys_timerfd_gettime, \
+ compat_sys_timerfd_gettime)
+
+/* fs/utimes.c */
+#define __NR_utimensat 88
+__SC_COMP(__NR_utimensat, sys_utimensat, compat_sys_utimensat)
+
+/* kernel/acct.c */
+#define __NR_acct 89
+__SYSCALL(__NR_acct, sys_acct)
+
+/* kernel/capability.c */
+#define __NR_capget 90
+__SYSCALL(__NR_capget, sys_capget)
+#define __NR_capset 91
+__SYSCALL(__NR_capset, sys_capset)
+
+/* kernel/exec_domain.c */
+#define __NR_personality 92
+__SYSCALL(__NR_personality, sys_personality)
+
+/* kernel/exit.c */
+#define __NR_exit 93
+__SYSCALL(__NR_exit, sys_exit)
+#define __NR_exit_group 94
+__SYSCALL(__NR_exit_group, sys_exit_group)
+#define __NR_waitid 95
+__SC_COMP(__NR_waitid, sys_waitid, compat_sys_waitid)
+
+/* kernel/fork.c */
+#define __NR_set_tid_address 96
+__SYSCALL(__NR_set_tid_address, sys_set_tid_address)
+#define __NR_unshare 97
+__SYSCALL(__NR_unshare, sys_unshare)
+
+/* kernel/futex.c */
+#define __NR_futex 98
+__SC_COMP(__NR_futex, sys_futex, compat_sys_futex)
+#define __NR_set_robust_list 99
+__SC_COMP(__NR_set_robust_list, sys_set_robust_list, \
+ compat_sys_set_robust_list)
+#define __NR_get_robust_list 100
+__SC_COMP(__NR_get_robust_list, sys_get_robust_list, \
+ compat_sys_get_robust_list)
+
+/* kernel/hrtimer.c */
+#define __NR_nanosleep 101
+__SC_COMP(__NR_nanosleep, sys_nanosleep, compat_sys_nanosleep)
+
+/* kernel/itimer.c */
+#define __NR_getitimer 102
+__SC_COMP(__NR_getitimer, sys_getitimer, compat_sys_getitimer)
+#define __NR_setitimer 103
+__SC_COMP(__NR_setitimer, sys_setitimer, compat_sys_setitimer)
+
+/* kernel/kexec.c */
+#define __NR_kexec_load 104
+__SC_COMP(__NR_kexec_load, sys_kexec_load, compat_sys_kexec_load)
+
+/* kernel/module.c */
+#define __NR_init_module 105
+__SYSCALL(__NR_init_module, sys_init_module)
+#define __NR_delete_module 106
+__SYSCALL(__NR_delete_module, sys_delete_module)
+
+/* kernel/posix-timers.c */
+#define __NR_timer_create 107
+__SC_COMP(__NR_timer_create, sys_timer_create, compat_sys_timer_create)
+#define __NR_timer_gettime 108
+__SC_COMP(__NR_timer_gettime, sys_timer_gettime, compat_sys_timer_gettime)
+#define __NR_timer_getoverrun 109
+__SYSCALL(__NR_timer_getoverrun, sys_timer_getoverrun)
+#define __NR_timer_settime 110
+__SC_COMP(__NR_timer_settime, sys_timer_settime, compat_sys_timer_settime)
+#define __NR_timer_delete 111
+__SYSCALL(__NR_timer_delete, sys_timer_delete)
+#define __NR_clock_settime 112
+__SC_COMP(__NR_clock_settime, sys_clock_settime, compat_sys_clock_settime)
+#define __NR_clock_gettime 113
+__SC_COMP(__NR_clock_gettime, sys_clock_gettime, compat_sys_clock_gettime)
+#define __NR_clock_getres 114
+__SC_COMP(__NR_clock_getres, sys_clock_getres, compat_sys_clock_getres)
+#define __NR_clock_nanosleep 115
+__SC_COMP(__NR_clock_nanosleep, sys_clock_nanosleep, \
+ compat_sys_clock_nanosleep)
+
+/* kernel/printk.c */
+#define __NR_syslog 116
+__SYSCALL(__NR_syslog, sys_syslog)
+
+/* kernel/ptrace.c */
+#define __NR_ptrace 117
+__SYSCALL(__NR_ptrace, sys_ptrace)
+
+/* kernel/sched/core.c */
+#define __NR_sched_setparam 118
+__SYSCALL(__NR_sched_setparam, sys_sched_setparam)
+#define __NR_sched_setscheduler 119
+__SYSCALL(__NR_sched_setscheduler, sys_sched_setscheduler)
+#define __NR_sched_getscheduler 120
+__SYSCALL(__NR_sched_getscheduler, sys_sched_getscheduler)
+#define __NR_sched_getparam 121
+__SYSCALL(__NR_sched_getparam, sys_sched_getparam)
+#define __NR_sched_setaffinity 122
+__SC_COMP(__NR_sched_setaffinity, sys_sched_setaffinity, \
+ compat_sys_sched_setaffinity)
+#define __NR_sched_getaffinity 123
+__SC_COMP(__NR_sched_getaffinity, sys_sched_getaffinity, \
+ compat_sys_sched_getaffinity)
+#define __NR_sched_yield 124
+__SYSCALL(__NR_sched_yield, sys_sched_yield)
+#define __NR_sched_get_priority_max 125
+__SYSCALL(__NR_sched_get_priority_max, sys_sched_get_priority_max)
+#define __NR_sched_get_priority_min 126
+__SYSCALL(__NR_sched_get_priority_min, sys_sched_get_priority_min)
+#define __NR_sched_rr_get_interval 127
+__SC_COMP(__NR_sched_rr_get_interval, sys_sched_rr_get_interval, \
+ compat_sys_sched_rr_get_interval)
+
+/* kernel/signal.c */
+#define __NR_restart_syscall 128
+__SYSCALL(__NR_restart_syscall, sys_restart_syscall)
+#define __NR_kill 129
+__SYSCALL(__NR_kill, sys_kill)
+#define __NR_tkill 130
+__SYSCALL(__NR_tkill, sys_tkill)
+#define __NR_tgkill 131
+__SYSCALL(__NR_tgkill, sys_tgkill)
+#define __NR_sigaltstack 132
+__SC_COMP(__NR_sigaltstack, sys_sigaltstack, compat_sys_sigaltstack)
+#define __NR_rt_sigsuspend 133
+__SC_COMP(__NR_rt_sigsuspend, sys_rt_sigsuspend, compat_sys_rt_sigsuspend)
+#define __NR_rt_sigaction 134
+__SC_COMP(__NR_rt_sigaction, sys_rt_sigaction, compat_sys_rt_sigaction)
+#define __NR_rt_sigprocmask 135
+__SC_COMP(__NR_rt_sigprocmask, sys_rt_sigprocmask, compat_sys_rt_sigprocmask)
+#define __NR_rt_sigpending 136
+__SC_COMP(__NR_rt_sigpending, sys_rt_sigpending, compat_sys_rt_sigpending)
+#define __NR_rt_sigtimedwait 137
+__SC_COMP(__NR_rt_sigtimedwait, sys_rt_sigtimedwait, \
+ compat_sys_rt_sigtimedwait)
+#define __NR_rt_sigqueueinfo 138
+__SC_COMP(__NR_rt_sigqueueinfo, sys_rt_sigqueueinfo, \
+ compat_sys_rt_sigqueueinfo)
+#define __NR_rt_sigreturn 139
+__SC_COMP(__NR_rt_sigreturn, sys_rt_sigreturn, compat_sys_rt_sigreturn)
+
+/* kernel/sys.c */
+#define __NR_setpriority 140
+__SYSCALL(__NR_setpriority, sys_setpriority)
+#define __NR_getpriority 141
+__SYSCALL(__NR_getpriority, sys_getpriority)
+#define __NR_reboot 142
+__SYSCALL(__NR_reboot, sys_reboot)
+#define __NR_setregid 143
+__SYSCALL(__NR_setregid, sys_setregid)
+#define __NR_setgid 144
+__SYSCALL(__NR_setgid, sys_setgid)
+#define __NR_setreuid 145
+__SYSCALL(__NR_setreuid, sys_setreuid)
+#define __NR_setuid 146
+__SYSCALL(__NR_setuid, sys_setuid)
+#define __NR_setresuid 147
+__SYSCALL(__NR_setresuid, sys_setresuid)
+#define __NR_getresuid 148
+__SYSCALL(__NR_getresuid, sys_getresuid)
+#define __NR_setresgid 149
+__SYSCALL(__NR_setresgid, sys_setresgid)
+#define __NR_getresgid 150
+__SYSCALL(__NR_getresgid, sys_getresgid)
+#define __NR_setfsuid 151
+__SYSCALL(__NR_setfsuid, sys_setfsuid)
+#define __NR_setfsgid 152
+__SYSCALL(__NR_setfsgid, sys_setfsgid)
+#define __NR_times 153
+__SC_COMP(__NR_times, sys_times, compat_sys_times)
+#define __NR_setpgid 154
+__SYSCALL(__NR_setpgid, sys_setpgid)
+#define __NR_getpgid 155
+__SYSCALL(__NR_getpgid, sys_getpgid)
+#define __NR_getsid 156
+__SYSCALL(__NR_getsid, sys_getsid)
+#define __NR_setsid 157
+__SYSCALL(__NR_setsid, sys_setsid)
+#define __NR_getgroups 158
+__SYSCALL(__NR_getgroups, sys_getgroups)
+#define __NR_setgroups 159
+__SYSCALL(__NR_setgroups, sys_setgroups)
+#define __NR_uname 160
+__SYSCALL(__NR_uname, sys_newuname)
+#define __NR_sethostname 161
+__SYSCALL(__NR_sethostname, sys_sethostname)
+#define __NR_setdomainname 162
+__SYSCALL(__NR_setdomainname, sys_setdomainname)
+#define __NR_getrlimit 163
+__SC_COMP(__NR_getrlimit, sys_getrlimit, compat_sys_getrlimit)
+#define __NR_setrlimit 164
+__SC_COMP(__NR_setrlimit, sys_setrlimit, compat_sys_setrlimit)
+#define __NR_getrusage 165
+__SC_COMP(__NR_getrusage, sys_getrusage, compat_sys_getrusage)
+#define __NR_umask 166
+__SYSCALL(__NR_umask, sys_umask)
+#define __NR_prctl 167
+__SYSCALL(__NR_prctl, sys_prctl)
+#define __NR_getcpu 168
+__SYSCALL(__NR_getcpu, sys_getcpu)
+
+/* kernel/time.c */
+#define __NR_gettimeofday 169
+__SC_COMP(__NR_gettimeofday, sys_gettimeofday, compat_sys_gettimeofday)
+#define __NR_settimeofday 170
+__SC_COMP(__NR_settimeofday, sys_settimeofday, compat_sys_settimeofday)
+#define __NR_adjtimex 171
+__SC_COMP(__NR_adjtimex, sys_adjtimex, compat_sys_adjtimex)
+
+/* kernel/timer.c */
+#define __NR_getpid 172
+__SYSCALL(__NR_getpid, sys_getpid)
+#define __NR_getppid 173
+__SYSCALL(__NR_getppid, sys_getppid)
+#define __NR_getuid 174
+__SYSCALL(__NR_getuid, sys_getuid)
+#define __NR_geteuid 175
+__SYSCALL(__NR_geteuid, sys_geteuid)
+#define __NR_getgid 176
+__SYSCALL(__NR_getgid, sys_getgid)
+#define __NR_getegid 177
+__SYSCALL(__NR_getegid, sys_getegid)
+#define __NR_gettid 178
+__SYSCALL(__NR_gettid, sys_gettid)
+#define __NR_sysinfo 179
+__SC_COMP(__NR_sysinfo, sys_sysinfo, compat_sys_sysinfo)
+
+/* ipc/mqueue.c */
+#define __NR_mq_open 180
+__SC_COMP(__NR_mq_open, sys_mq_open, compat_sys_mq_open)
+#define __NR_mq_unlink 181
+__SYSCALL(__NR_mq_unlink, sys_mq_unlink)
+#define __NR_mq_timedsend 182
+__SC_COMP(__NR_mq_timedsend, sys_mq_timedsend, compat_sys_mq_timedsend)
+#define __NR_mq_timedreceive 183
+__SC_COMP(__NR_mq_timedreceive, sys_mq_timedreceive, \
+ compat_sys_mq_timedreceive)
+#define __NR_mq_notify 184
+__SC_COMP(__NR_mq_notify, sys_mq_notify, compat_sys_mq_notify)
+#define __NR_mq_getsetattr 185
+__SC_COMP(__NR_mq_getsetattr, sys_mq_getsetattr, compat_sys_mq_getsetattr)
+
+/* ipc/msg.c */
+#define __NR_msgget 186
+__SYSCALL(__NR_msgget, sys_msgget)
+#define __NR_msgctl 187
+__SC_COMP(__NR_msgctl, sys_msgctl, compat_sys_msgctl)
+#define __NR_msgrcv 188
+__SC_COMP(__NR_msgrcv, sys_msgrcv, compat_sys_msgrcv)
+#define __NR_msgsnd 189
+__SC_COMP(__NR_msgsnd, sys_msgsnd, compat_sys_msgsnd)
+
+/* ipc/sem.c */
+#define __NR_semget 190
+__SYSCALL(__NR_semget, sys_semget)
+#define __NR_semctl 191
+__SC_COMP(__NR_semctl, sys_semctl, compat_sys_semctl)
+#define __NR_semtimedop 192
+__SC_COMP(__NR_semtimedop, sys_semtimedop, compat_sys_semtimedop)
+#define __NR_semop 193
+__SYSCALL(__NR_semop, sys_semop)
+
+/* ipc/shm.c */
+#define __NR_shmget 194
+__SYSCALL(__NR_shmget, sys_shmget)
+#define __NR_shmctl 195
+__SC_COMP(__NR_shmctl, sys_shmctl, compat_sys_shmctl)
+#define __NR_shmat 196
+__SC_COMP(__NR_shmat, sys_shmat, compat_sys_shmat)
+#define __NR_shmdt 197
+__SYSCALL(__NR_shmdt, sys_shmdt)
+
+/* net/socket.c */
+#define __NR_socket 198
+__SYSCALL(__NR_socket, sys_socket)
+#define __NR_socketpair 199
+__SYSCALL(__NR_socketpair, sys_socketpair)
+#define __NR_bind 200
+__SYSCALL(__NR_bind, sys_bind)
+#define __NR_listen 201
+__SYSCALL(__NR_listen, sys_listen)
+#define __NR_accept 202
+__SYSCALL(__NR_accept, sys_accept)
+#define __NR_connect 203
+__SYSCALL(__NR_connect, sys_connect)
+#define __NR_getsockname 204
+__SYSCALL(__NR_getsockname, sys_getsockname)
+#define __NR_getpeername 205
+__SYSCALL(__NR_getpeername, sys_getpeername)
+#define __NR_sendto 206
+__SYSCALL(__NR_sendto, sys_sendto)
+#define __NR_recvfrom 207
+__SC_COMP(__NR_recvfrom, sys_recvfrom, compat_sys_recvfrom)
+#define __NR_setsockopt 208
+__SC_COMP(__NR_setsockopt, sys_setsockopt, compat_sys_setsockopt)
+#define __NR_getsockopt 209
+__SC_COMP(__NR_getsockopt, sys_getsockopt, compat_sys_getsockopt)
+#define __NR_shutdown 210
+__SYSCALL(__NR_shutdown, sys_shutdown)
+#define __NR_sendmsg 211
+__SC_COMP(__NR_sendmsg, sys_sendmsg, compat_sys_sendmsg)
+#define __NR_recvmsg 212
+__SC_COMP(__NR_recvmsg, sys_recvmsg, compat_sys_recvmsg)
+
+/* mm/filemap.c */
+#define __NR_readahead 213
+__SC_COMP(__NR_readahead, sys_readahead, compat_sys_readahead)
+
+/* mm/nommu.c, also with MMU */
+#define __NR_brk 214
+__SYSCALL(__NR_brk, sys_brk)
+#define __NR_munmap 215
+__SYSCALL(__NR_munmap, sys_munmap)
+#define __NR_mremap 216
+__SYSCALL(__NR_mremap, sys_mremap)
+
+/* security/keys/keyctl.c */
+#define __NR_add_key 217
+__SYSCALL(__NR_add_key, sys_add_key)
+#define __NR_request_key 218
+__SYSCALL(__NR_request_key, sys_request_key)
+#define __NR_keyctl 219
+__SC_COMP(__NR_keyctl, sys_keyctl, compat_sys_keyctl)
+
+/* arch/example/kernel/sys_example.c */
+#define __NR_clone 220
+__SYSCALL(__NR_clone, sys_clone)
+#define __NR_execve 221
+__SC_COMP(__NR_execve, sys_execve, compat_sys_execve)
+
+#define __NR3264_mmap 222
+__SC_3264(__NR3264_mmap, sys_mmap2, sys_mmap)
+/* mm/fadvise.c */
+#define __NR3264_fadvise64 223
+__SC_COMP(__NR3264_fadvise64, sys_fadvise64_64, compat_sys_fadvise64_64)
+
+/* mm/, CONFIG_MMU only */
+#ifndef __ARCH_NOMMU
+#define __NR_swapon 224
+__SYSCALL(__NR_swapon, sys_swapon)
+#define __NR_swapoff 225
+__SYSCALL(__NR_swapoff, sys_swapoff)
+#define __NR_mprotect 226
+__SYSCALL(__NR_mprotect, sys_mprotect)
+#define __NR_msync 227
+__SYSCALL(__NR_msync, sys_msync)
+#define __NR_mlock 228
+__SYSCALL(__NR_mlock, sys_mlock)
+#define __NR_munlock 229
+__SYSCALL(__NR_munlock, sys_munlock)
+#define __NR_mlockall 230
+__SYSCALL(__NR_mlockall, sys_mlockall)
+#define __NR_munlockall 231
+__SYSCALL(__NR_munlockall, sys_munlockall)
+#define __NR_mincore 232
+__SYSCALL(__NR_mincore, sys_mincore)
+#define __NR_madvise 233
+__SYSCALL(__NR_madvise, sys_madvise)
+#define __NR_remap_file_pages 234
+__SYSCALL(__NR_remap_file_pages, sys_remap_file_pages)
+#define __NR_mbind 235
+__SC_COMP(__NR_mbind, sys_mbind, compat_sys_mbind)
+#define __NR_get_mempolicy 236
+__SC_COMP(__NR_get_mempolicy, sys_get_mempolicy, compat_sys_get_mempolicy)
+#define __NR_set_mempolicy 237
+__SC_COMP(__NR_set_mempolicy, sys_set_mempolicy, compat_sys_set_mempolicy)
+#define __NR_migrate_pages 238
+__SC_COMP(__NR_migrate_pages, sys_migrate_pages, compat_sys_migrate_pages)
+#define __NR_move_pages 239
+__SC_COMP(__NR_move_pages, sys_move_pages, compat_sys_move_pages)
+#endif
+
+#define __NR_rt_tgsigqueueinfo 240
+__SC_COMP(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo, \
+ compat_sys_rt_tgsigqueueinfo)
+#define __NR_perf_event_open 241
+__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
+#define __NR_accept4 242
+__SYSCALL(__NR_accept4, sys_accept4)
+#define __NR_recvmmsg 243
+__SC_COMP(__NR_recvmmsg, sys_recvmmsg, compat_sys_recvmmsg)
+
+/*
+ * Architectures may provide up to 16 syscalls of their own
+ * starting with this value.
+ */
+#define __NR_arch_specific_syscall 244
+
+#define __NR_wait4 260
+__SC_COMP(__NR_wait4, sys_wait4, compat_sys_wait4)
+#define __NR_prlimit64 261
+__SYSCALL(__NR_prlimit64, sys_prlimit64)
+#define __NR_fanotify_init 262
+__SYSCALL(__NR_fanotify_init, sys_fanotify_init)
+#define __NR_fanotify_mark 263
+__SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
+#define __NR_name_to_handle_at 264
+__SYSCALL(__NR_name_to_handle_at, sys_name_to_handle_at)
+#define __NR_open_by_handle_at 265
+__SC_COMP(__NR_open_by_handle_at, sys_open_by_handle_at, \
+ compat_sys_open_by_handle_at)
+#define __NR_clock_adjtime 266
+__SC_COMP(__NR_clock_adjtime, sys_clock_adjtime, compat_sys_clock_adjtime)
+#define __NR_syncfs 267
+__SYSCALL(__NR_syncfs, sys_syncfs)
+#define __NR_setns 268
+__SYSCALL(__NR_setns, sys_setns)
+#define __NR_sendmmsg 269
+__SC_COMP(__NR_sendmmsg, sys_sendmmsg, compat_sys_sendmmsg)
+#define __NR_process_vm_readv 270
+__SC_COMP(__NR_process_vm_readv, sys_process_vm_readv, \
+ compat_sys_process_vm_readv)
+#define __NR_process_vm_writev 271
+__SC_COMP(__NR_process_vm_writev, sys_process_vm_writev, \
+ compat_sys_process_vm_writev)
+#define __NR_kcmp 272
+__SYSCALL(__NR_kcmp, sys_kcmp)
+#define __NR_finit_module 273
+__SYSCALL(__NR_finit_module, sys_finit_module)
+#define __NR_sched_setattr 274
+__SYSCALL(__NR_sched_setattr, sys_sched_setattr)
+#define __NR_sched_getattr 275
+__SYSCALL(__NR_sched_getattr, sys_sched_getattr)
+#define __NR_renameat2 276
+__SYSCALL(__NR_renameat2, sys_renameat2)
+#define __NR_seccomp 277
+__SYSCALL(__NR_seccomp, sys_seccomp)
+#define __NR_getrandom 278
+__SYSCALL(__NR_getrandom, sys_getrandom)
+#define __NR_memfd_create 279
+__SYSCALL(__NR_memfd_create, sys_memfd_create)
+#define __NR_bpf 280
+__SYSCALL(__NR_bpf, sys_bpf)
+#define __NR_execveat 281
+__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_userfaultfd 282
+__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
+#define __NR_membarrier 283
+__SYSCALL(__NR_membarrier, sys_membarrier)
+#define __NR_mlock2 284
+__SYSCALL(__NR_mlock2, sys_mlock2)
+#define __NR_copy_file_range 285
+__SYSCALL(__NR_copy_file_range, sys_copy_file_range)
+#define __NR_preadv2 286
+__SC_COMP(__NR_preadv2, sys_preadv2, compat_sys_preadv2)
+#define __NR_pwritev2 287
+__SC_COMP(__NR_pwritev2, sys_pwritev2, compat_sys_pwritev2)
+#define __NR_pkey_mprotect 288
+__SYSCALL(__NR_pkey_mprotect, sys_pkey_mprotect)
+#define __NR_pkey_alloc 289
+__SYSCALL(__NR_pkey_alloc, sys_pkey_alloc)
+#define __NR_pkey_free 290
+__SYSCALL(__NR_pkey_free, sys_pkey_free)
+#define __NR_statx 291
+__SYSCALL(__NR_statx, sys_statx)
+#define __NR_io_pgetevents 292
+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
+
+#undef __NR_syscalls
+#define __NR_syscalls 293
+
+/*
+ * 32 bit systems traditionally used different
+ * syscalls for off_t and loff_t arguments, while
+ * 64 bit systems only need the off_t version.
+ * For new 32 bit platforms, there is no need to
+ * implement the old 32 bit off_t syscalls, so
+ * they take different names.
+ * Here we map the numbers so that both versions
+ * use the same syscall table layout.
+ */
+#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
+#define __NR_fcntl __NR3264_fcntl
+#define __NR_statfs __NR3264_statfs
+#define __NR_fstatfs __NR3264_fstatfs
+#define __NR_truncate __NR3264_truncate
+#define __NR_ftruncate __NR3264_ftruncate
+#define __NR_lseek __NR3264_lseek
+#define __NR_sendfile __NR3264_sendfile
+#define __NR_newfstatat __NR3264_fstatat
+#define __NR_fstat __NR3264_fstat
+#define __NR_mmap __NR3264_mmap
+#define __NR_fadvise64 __NR3264_fadvise64
+#ifdef __NR3264_stat
+#define __NR_stat __NR3264_stat
+#define __NR_lstat __NR3264_lstat
+#endif
+#else
+#define __NR_fcntl64 __NR3264_fcntl
+#define __NR_statfs64 __NR3264_statfs
+#define __NR_fstatfs64 __NR3264_fstatfs
+#define __NR_truncate64 __NR3264_truncate
+#define __NR_ftruncate64 __NR3264_ftruncate
+#define __NR_llseek __NR3264_lseek
+#define __NR_sendfile64 __NR3264_sendfile
+#define __NR_fstatat64 __NR3264_fstatat
+#define __NR_fstat64 __NR3264_fstat
+#define __NR_mmap2 __NR3264_mmap
+#define __NR_fadvise64_64 __NR3264_fadvise64
+#ifdef __NR3264_stat
+#define __NR_stat64 __NR3264_stat
+#define __NR_lstat64 __NR3264_lstat
+#endif
+#endif
diff --git a/tools/include/uapi/linux/in.h b/tools/include/uapi/linux/in.h
new file mode 100644
index 0000000..48e8a225b
--- /dev/null
+++ b/tools/include/uapi/linux/in.h
@@ -0,0 +1,301 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * INET An implementation of the TCP/IP protocol suite for the LINUX
+ * operating system. INET is implemented using the BSD Socket
+ * interface as the means of communication with the user level.
+ *
+ * Definitions of the Internet Protocol.
+ *
+ * Version: @(#)in.h 1.0.1 04/21/93
+ *
+ * Authors: Original taken from the GNU Project <netinet/in.h> file.
+ * Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _UAPI_LINUX_IN_H
+#define _UAPI_LINUX_IN_H
+
+#include <linux/types.h>
+#include <linux/libc-compat.h>
+#include <linux/socket.h>
+
+#if __UAPI_DEF_IN_IPPROTO
+/* Standard well-defined IP protocols. */
+enum {
+ IPPROTO_IP = 0, /* Dummy protocol for TCP */
+#define IPPROTO_IP IPPROTO_IP
+ IPPROTO_ICMP = 1, /* Internet Control Message Protocol */
+#define IPPROTO_ICMP IPPROTO_ICMP
+ IPPROTO_IGMP = 2, /* Internet Group Management Protocol */
+#define IPPROTO_IGMP IPPROTO_IGMP
+ IPPROTO_IPIP = 4, /* IPIP tunnels (older KA9Q tunnels use 94) */
+#define IPPROTO_IPIP IPPROTO_IPIP
+ IPPROTO_TCP = 6, /* Transmission Control Protocol */
+#define IPPROTO_TCP IPPROTO_TCP
+ IPPROTO_EGP = 8, /* Exterior Gateway Protocol */
+#define IPPROTO_EGP IPPROTO_EGP
+ IPPROTO_PUP = 12, /* PUP protocol */
+#define IPPROTO_PUP IPPROTO_PUP
+ IPPROTO_UDP = 17, /* User Datagram Protocol */
+#define IPPROTO_UDP IPPROTO_UDP
+ IPPROTO_IDP = 22, /* XNS IDP protocol */
+#define IPPROTO_IDP IPPROTO_IDP
+ IPPROTO_TP = 29, /* SO Transport Protocol Class 4 */
+#define IPPROTO_TP IPPROTO_TP
+ IPPROTO_DCCP = 33, /* Datagram Congestion Control Protocol */
+#define IPPROTO_DCCP IPPROTO_DCCP
+ IPPROTO_IPV6 = 41, /* IPv6-in-IPv4 tunnelling */
+#define IPPROTO_IPV6 IPPROTO_IPV6
+ IPPROTO_RSVP = 46, /* RSVP Protocol */
+#define IPPROTO_RSVP IPPROTO_RSVP
+ IPPROTO_GRE = 47, /* Cisco GRE tunnels (rfc 1701,1702) */
+#define IPPROTO_GRE IPPROTO_GRE
+ IPPROTO_ESP = 50, /* Encapsulation Security Payload protocol */
+#define IPPROTO_ESP IPPROTO_ESP
+ IPPROTO_AH = 51, /* Authentication Header protocol */
+#define IPPROTO_AH IPPROTO_AH
+ IPPROTO_MTP = 92, /* Multicast Transport Protocol */
+#define IPPROTO_MTP IPPROTO_MTP
+ IPPROTO_BEETPH = 94, /* IP option pseudo header for BEET */
+#define IPPROTO_BEETPH IPPROTO_BEETPH
+ IPPROTO_ENCAP = 98, /* Encapsulation Header */
+#define IPPROTO_ENCAP IPPROTO_ENCAP
+ IPPROTO_PIM = 103, /* Protocol Independent Multicast */
+#define IPPROTO_PIM IPPROTO_PIM
+ IPPROTO_COMP = 108, /* Compression Header Protocol */
+#define IPPROTO_COMP IPPROTO_COMP
+ IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+#define IPPROTO_SCTP IPPROTO_SCTP
+ IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
+#define IPPROTO_UDPLITE IPPROTO_UDPLITE
+ IPPROTO_MPLS = 137, /* MPLS in IP (RFC 4023) */
+#define IPPROTO_MPLS IPPROTO_MPLS
+ IPPROTO_RAW = 255, /* Raw IP packets */
+#define IPPROTO_RAW IPPROTO_RAW
+ IPPROTO_MAX
+};
+#endif
+
+#if __UAPI_DEF_IN_ADDR
+/* Internet address. */
+struct in_addr {
+ __be32 s_addr;
+};
+#endif
+
+#define IP_TOS 1
+#define IP_TTL 2
+#define IP_HDRINCL 3
+#define IP_OPTIONS 4
+#define IP_ROUTER_ALERT 5
+#define IP_RECVOPTS 6
+#define IP_RETOPTS 7
+#define IP_PKTINFO 8
+#define IP_PKTOPTIONS 9
+#define IP_MTU_DISCOVER 10
+#define IP_RECVERR 11
+#define IP_RECVTTL 12
+#define IP_RECVTOS 13
+#define IP_MTU 14
+#define IP_FREEBIND 15
+#define IP_IPSEC_POLICY 16
+#define IP_XFRM_POLICY 17
+#define IP_PASSSEC 18
+#define IP_TRANSPARENT 19
+
+/* BSD compatibility */
+#define IP_RECVRETOPTS IP_RETOPTS
+
+/* TProxy original addresses */
+#define IP_ORIGDSTADDR 20
+#define IP_RECVORIGDSTADDR IP_ORIGDSTADDR
+
+#define IP_MINTTL 21
+#define IP_NODEFRAG 22
+#define IP_CHECKSUM 23
+#define IP_BIND_ADDRESS_NO_PORT 24
+#define IP_RECVFRAGSIZE 25
+
+/* IP_MTU_DISCOVER values */
+#define IP_PMTUDISC_DONT 0 /* Never send DF frames */
+#define IP_PMTUDISC_WANT 1 /* Use per route hints */
+#define IP_PMTUDISC_DO 2 /* Always DF */
+#define IP_PMTUDISC_PROBE 3 /* Ignore dst pmtu */
+/* Always use interface mtu (ignores dst pmtu) but don't set DF flag.
+ * Also incoming ICMP frag_needed notifications will be ignored on
+ * this socket to prevent accepting spoofed ones.
+ */
+#define IP_PMTUDISC_INTERFACE 4
+/* weaker version of IP_PMTUDISC_INTERFACE, which allos packets to get
+ * fragmented if they exeed the interface mtu
+ */
+#define IP_PMTUDISC_OMIT 5
+
+#define IP_MULTICAST_IF 32
+#define IP_MULTICAST_TTL 33
+#define IP_MULTICAST_LOOP 34
+#define IP_ADD_MEMBERSHIP 35
+#define IP_DROP_MEMBERSHIP 36
+#define IP_UNBLOCK_SOURCE 37
+#define IP_BLOCK_SOURCE 38
+#define IP_ADD_SOURCE_MEMBERSHIP 39
+#define IP_DROP_SOURCE_MEMBERSHIP 40
+#define IP_MSFILTER 41
+#define MCAST_JOIN_GROUP 42
+#define MCAST_BLOCK_SOURCE 43
+#define MCAST_UNBLOCK_SOURCE 44
+#define MCAST_LEAVE_GROUP 45
+#define MCAST_JOIN_SOURCE_GROUP 46
+#define MCAST_LEAVE_SOURCE_GROUP 47
+#define MCAST_MSFILTER 48
+#define IP_MULTICAST_ALL 49
+#define IP_UNICAST_IF 50
+
+#define MCAST_EXCLUDE 0
+#define MCAST_INCLUDE 1
+
+/* These need to appear somewhere around here */
+#define IP_DEFAULT_MULTICAST_TTL 1
+#define IP_DEFAULT_MULTICAST_LOOP 1
+
+/* Request struct for multicast socket ops */
+
+#if __UAPI_DEF_IP_MREQ
+struct ip_mreq {
+ struct in_addr imr_multiaddr; /* IP multicast address of group */
+ struct in_addr imr_interface; /* local IP address of interface */
+};
+
+struct ip_mreqn {
+ struct in_addr imr_multiaddr; /* IP multicast address of group */
+ struct in_addr imr_address; /* local IP address of interface */
+ int imr_ifindex; /* Interface index */
+};
+
+struct ip_mreq_source {
+ __be32 imr_multiaddr;
+ __be32 imr_interface;
+ __be32 imr_sourceaddr;
+};
+
+struct ip_msfilter {
+ __be32 imsf_multiaddr;
+ __be32 imsf_interface;
+ __u32 imsf_fmode;
+ __u32 imsf_numsrc;
+ __be32 imsf_slist[1];
+};
+
+#define IP_MSFILTER_SIZE(numsrc) \
+ (sizeof(struct ip_msfilter) - sizeof(__u32) \
+ + (numsrc) * sizeof(__u32))
+
+struct group_req {
+ __u32 gr_interface; /* interface index */
+ struct __kernel_sockaddr_storage gr_group; /* group address */
+};
+
+struct group_source_req {
+ __u32 gsr_interface; /* interface index */
+ struct __kernel_sockaddr_storage gsr_group; /* group address */
+ struct __kernel_sockaddr_storage gsr_source; /* source address */
+};
+
+struct group_filter {
+ __u32 gf_interface; /* interface index */
+ struct __kernel_sockaddr_storage gf_group; /* multicast address */
+ __u32 gf_fmode; /* filter mode */
+ __u32 gf_numsrc; /* number of sources */
+ struct __kernel_sockaddr_storage gf_slist[1]; /* interface index */
+};
+
+#define GROUP_FILTER_SIZE(numsrc) \
+ (sizeof(struct group_filter) - sizeof(struct __kernel_sockaddr_storage) \
+ + (numsrc) * sizeof(struct __kernel_sockaddr_storage))
+#endif
+
+#if __UAPI_DEF_IN_PKTINFO
+struct in_pktinfo {
+ int ipi_ifindex;
+ struct in_addr ipi_spec_dst;
+ struct in_addr ipi_addr;
+};
+#endif
+
+/* Structure describing an Internet (IP) socket address. */
+#if __UAPI_DEF_SOCKADDR_IN
+#define __SOCK_SIZE__ 16 /* sizeof(struct sockaddr) */
+struct sockaddr_in {
+ __kernel_sa_family_t sin_family; /* Address family */
+ __be16 sin_port; /* Port number */
+ struct in_addr sin_addr; /* Internet address */
+
+ /* Pad to size of `struct sockaddr'. */
+ unsigned char __pad[__SOCK_SIZE__ - sizeof(short int) -
+ sizeof(unsigned short int) - sizeof(struct in_addr)];
+};
+#define sin_zero __pad /* for BSD UNIX comp. -FvK */
+#endif
+
+#if __UAPI_DEF_IN_CLASS
+/*
+ * Definitions of the bits in an Internet address integer.
+ * On subnets, host and network parts are found according
+ * to the subnet mask, not these masks.
+ */
+#define IN_CLASSA(a) ((((long int) (a)) & 0x80000000) == 0)
+#define IN_CLASSA_NET 0xff000000
+#define IN_CLASSA_NSHIFT 24
+#define IN_CLASSA_HOST (0xffffffff & ~IN_CLASSA_NET)
+#define IN_CLASSA_MAX 128
+
+#define IN_CLASSB(a) ((((long int) (a)) & 0xc0000000) == 0x80000000)
+#define IN_CLASSB_NET 0xffff0000
+#define IN_CLASSB_NSHIFT 16
+#define IN_CLASSB_HOST (0xffffffff & ~IN_CLASSB_NET)
+#define IN_CLASSB_MAX 65536
+
+#define IN_CLASSC(a) ((((long int) (a)) & 0xe0000000) == 0xc0000000)
+#define IN_CLASSC_NET 0xffffff00
+#define IN_CLASSC_NSHIFT 8
+#define IN_CLASSC_HOST (0xffffffff & ~IN_CLASSC_NET)
+
+#define IN_CLASSD(a) ((((long int) (a)) & 0xf0000000) == 0xe0000000)
+#define IN_MULTICAST(a) IN_CLASSD(a)
+#define IN_MULTICAST_NET 0xF0000000
+
+#define IN_EXPERIMENTAL(a) ((((long int) (a)) & 0xf0000000) == 0xf0000000)
+#define IN_BADCLASS(a) IN_EXPERIMENTAL((a))
+
+/* Address to accept any incoming messages. */
+#define INADDR_ANY ((unsigned long int) 0x00000000)
+
+/* Address to send to all hosts. */
+#define INADDR_BROADCAST ((unsigned long int) 0xffffffff)
+
+/* Address indicating an error return. */
+#define INADDR_NONE ((unsigned long int) 0xffffffff)
+
+/* Network number for local host loopback. */
+#define IN_LOOPBACKNET 127
+
+/* Address to loopback in software to local host. */
+#define INADDR_LOOPBACK 0x7f000001 /* 127.0.0.1 */
+#define IN_LOOPBACK(a) ((((long int) (a)) & 0xff000000) == 0x7f000000)
+
+/* Defines for Multicast INADDR */
+#define INADDR_UNSPEC_GROUP 0xe0000000U /* 224.0.0.0 */
+#define INADDR_ALLHOSTS_GROUP 0xe0000001U /* 224.0.0.1 */
+#define INADDR_ALLRTRS_GROUP 0xe0000002U /* 224.0.0.2 */
+#define INADDR_MAX_LOCAL_GROUP 0xe00000ffU /* 224.0.0.255 */
+#endif
+
+/* <asm/byteorder.h> contains the htonl type stuff.. */
+#include <asm/byteorder.h>
+
+
+#endif /* _UAPI_LINUX_IN_H */
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index 1b09f31..0cbd1ef8 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -804,7 +804,7 @@
Second, some types of fence affect the way the memory subsystem
propagates stores. When a fence instruction is executed on CPU C:
- For each other CPU C', smb_wmb() forces all po-earlier stores
+ For each other CPU C', smp_wmb() forces all po-earlier stores
on C to propagate to C' before any po-later stores do.
For each other CPU C', any store which propagates to C before
diff --git a/tools/memory-model/Documentation/recipes.txt b/tools/memory-model/Documentation/recipes.txt
index ee4309a..af72700 100644
--- a/tools/memory-model/Documentation/recipes.txt
+++ b/tools/memory-model/Documentation/recipes.txt
@@ -126,7 +126,7 @@
locking will be seen as ordered by CPUs not holding that lock.
Consider this example:
- /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
+ /* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
void CPU0(void)
{
spin_lock(&mylock);
@@ -292,7 +292,7 @@
smp_wmb() and smp_rmb() APIs are still heavily used, so it is important
to understand their use cases. The general approach is shown below:
- /* See MP+wmbonceonce+rmbonceonce.litmus. */
+ /* See MP+fencewmbonceonce+fencermbonceonce.litmus. */
void CPU0(void)
{
WRITE_ONCE(x, 1);
@@ -322,9 +322,9 @@
And the xlog_valid_lsn() function in fs/xfs/xfs_log_priv.h contains
the corresponding read-side code fragment:
- cur_cycle = ACCESS_ONCE(log->l_curr_cycle);
+ cur_cycle = READ_ONCE(log->l_curr_cycle);
smp_rmb();
- cur_block = ACCESS_ONCE(log->l_curr_block);
+ cur_block = READ_ONCE(log->l_curr_block);
Alternatively, consider the following comment in function
perf_output_put_handle() in kernel/events/ring_buffer.c:
@@ -360,7 +360,7 @@
One way of avoiding the counter-intuitive outcome is through the use of a
control dependency paired with a full memory barrier:
- /* See LB+ctrlonceonce+mbonceonce.litmus. */
+ /* See LB+fencembonceonce+ctrlonceonce.litmus. */
void CPU0(void)
{
r0 = READ_ONCE(x);
@@ -476,7 +476,7 @@
while another CPU stores to the second variable and then loads from the
first. Preserving order requires nothing less than full barriers:
- /* See SB+mbonceonces.litmus. */
+ /* See SB+fencembonceonces.litmus. */
void CPU0(void)
{
WRITE_ONCE(x, 1);
diff --git a/tools/memory-model/README b/tools/memory-model/README
index 734f7fe..ee987ce2 100644
--- a/tools/memory-model/README
+++ b/tools/memory-model/README
@@ -35,13 +35,13 @@
The memory model is used, in conjunction with "herd7", to exhaustively
explore the state space of small litmus tests.
-For example, to run SB+mbonceonces.litmus against the memory model:
+For example, to run SB+fencembonceonces.litmus against the memory model:
- $ herd7 -conf linux-kernel.cfg litmus-tests/SB+mbonceonces.litmus
+ $ herd7 -conf linux-kernel.cfg litmus-tests/SB+fencembonceonces.litmus
Here is the corresponding output:
- Test SB+mbonceonces Allowed
+ Test SB+fencembonceonces Allowed
States 3
0:r0=0; 1:r0=1;
0:r0=1; 1:r0=0;
@@ -50,8 +50,8 @@
Witnesses
Positive: 0 Negative: 3
Condition exists (0:r0=0 /\ 1:r0=0)
- Observation SB+mbonceonces Never 0 3
- Time SB+mbonceonces 0.01
+ Observation SB+fencembonceonces Never 0 3
+ Time SB+fencembonceonces 0.01
Hash=d66d99523e2cac6b06e66f4c995ebb48
The "Positive: 0 Negative: 3" and the "Never 0 3" each indicate that
@@ -67,16 +67,16 @@
The "klitmus7" tool converts a litmus test into a Linux kernel module,
which may then be loaded and run.
-For example, to run SB+mbonceonces.litmus against hardware:
+For example, to run SB+fencembonceonces.litmus against hardware:
$ mkdir mymodules
- $ klitmus7 -o mymodules litmus-tests/SB+mbonceonces.litmus
+ $ klitmus7 -o mymodules litmus-tests/SB+fencembonceonces.litmus
$ cd mymodules ; make
$ sudo sh run.sh
The corresponding output includes:
- Test SB+mbonceonces Allowed
+ Test SB+fencembonceonces Allowed
Histogram (3 states)
644580 :>0:r0=1; 1:r0=0;
644328 :>0:r0=0; 1:r0=1;
@@ -86,8 +86,8 @@
Positive: 0, Negative: 2000000
Condition exists (0:r0=0 /\ 1:r0=0) is NOT validated
Hash=d66d99523e2cac6b06e66f4c995ebb48
- Observation SB+mbonceonces Never 0 2000000
- Time SB+mbonceonces 0.16
+ Observation SB+fencembonceonces Never 0 2000000
+ Time SB+fencembonceonces 0.16
The "Positive: 0 Negative: 2000000" and the "Never 0 2000000" indicate
that during two million trials, the state specified in this litmus
diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell
index 64f5740..b84fb2f 100644
--- a/tools/memory-model/linux-kernel.bell
+++ b/tools/memory-model/linux-kernel.bell
@@ -13,7 +13,7 @@
"Linux-kernel memory consistency model"
-enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) ||
+enum Accesses = 'once (*READ_ONCE,WRITE_ONCE*) ||
'release (*smp_store_release*) ||
'acquire (*smp_load_acquire*) ||
'noreturn (* R of non-return RMW *)
diff --git a/tools/memory-model/litmus-tests/IRIW+mbonceonces+OnceOnce.litmus b/tools/memory-model/litmus-tests/IRIW+fencembonceonces+OnceOnce.litmus
similarity index 95%
rename from tools/memory-model/litmus-tests/IRIW+mbonceonces+OnceOnce.litmus
rename to tools/memory-model/litmus-tests/IRIW+fencembonceonces+OnceOnce.litmus
index 98a3716..e729d27 100644
--- a/tools/memory-model/litmus-tests/IRIW+mbonceonces+OnceOnce.litmus
+++ b/tools/memory-model/litmus-tests/IRIW+fencembonceonces+OnceOnce.litmus
@@ -1,4 +1,4 @@
-C IRIW+mbonceonces+OnceOnce
+C IRIW+fencembonceonces+OnceOnce
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus b/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus
index 7a39a0a..0f749e4 100644
--- a/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus
+++ b/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus
@@ -1,4 +1,4 @@
-C ISA2+pooncelock+pooncelock+pombonce.litmus
+C ISA2+pooncelock+pooncelock+pombonce
(*
* Result: Sometimes
diff --git a/tools/memory-model/litmus-tests/LB+ctrlonceonce+mbonceonce.litmus b/tools/memory-model/litmus-tests/LB+fencembonceonce+ctrlonceonce.litmus
similarity index 94%
rename from tools/memory-model/litmus-tests/LB+ctrlonceonce+mbonceonce.litmus
rename to tools/memory-model/litmus-tests/LB+fencembonceonce+ctrlonceonce.litmus
index de67082..4727f5a 100644
--- a/tools/memory-model/litmus-tests/LB+ctrlonceonce+mbonceonce.litmus
+++ b/tools/memory-model/litmus-tests/LB+fencembonceonce+ctrlonceonce.litmus
@@ -1,4 +1,4 @@
-C LB+ctrlonceonce+mbonceonce
+C LB+fencembonceonce+ctrlonceonce
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/MP+wmbonceonce+rmbonceonce.litmus b/tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus
similarity index 91%
rename from tools/memory-model/litmus-tests/MP+wmbonceonce+rmbonceonce.litmus
rename to tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus
index c078f38..a273da9 100644
--- a/tools/memory-model/litmus-tests/MP+wmbonceonce+rmbonceonce.litmus
+++ b/tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus
@@ -1,4 +1,4 @@
-C MP+wmbonceonce+rmbonceonce
+C MP+fencewmbonceonce+fencermbonceonce
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/R+mbonceonces.litmus b/tools/memory-model/litmus-tests/R+fencembonceonces.litmus
similarity index 95%
rename from tools/memory-model/litmus-tests/R+mbonceonces.litmus
rename to tools/memory-model/litmus-tests/R+fencembonceonces.litmus
index a0e884a..222a0b8 100644
--- a/tools/memory-model/litmus-tests/R+mbonceonces.litmus
+++ b/tools/memory-model/litmus-tests/R+fencembonceonces.litmus
@@ -1,4 +1,4 @@
-C R+mbonceonces
+C R+fencembonceonces
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/README b/tools/memory-model/litmus-tests/README
index 17eb9a8..4581ec2 100644
--- a/tools/memory-model/litmus-tests/README
+++ b/tools/memory-model/litmus-tests/README
@@ -18,7 +18,7 @@
Test of write-write coherence, that is, whether or not two
successive writes to the same variable are ordered.
-IRIW+mbonceonces+OnceOnce.litmus
+IRIW+fencembonceonces+OnceOnce.litmus
Test of independent reads from independent writes with smp_mb()
between each pairs of reads. In other words, is smp_mb()
sufficient to cause two different reading processes to agree on
@@ -47,7 +47,7 @@
Can a release-acquire chain order a prior store against
a later load?
-LB+ctrlonceonce+mbonceonce.litmus
+LB+fencembonceonce+ctrlonceonce.litmus
Does a control dependency and an smp_mb() suffice for the
load-buffering litmus test, where each process reads from one
of two variables then writes to the other?
@@ -88,14 +88,14 @@
As below, but with the first access of the writer process
and the second access of reader process protected by a lock.
-MP+wmbonceonce+rmbonceonce.litmus
+MP+fencewmbonceonce+fencermbonceonce.litmus
Does a smp_wmb() (between the stores) and an smp_rmb() (between
the loads) suffice for the message-passing litmus test, where one
process writes data and then a flag, and the other process reads
the flag and then the data. (This is similar to the ISA2 tests,
but with two processes instead of three.)
-R+mbonceonces.litmus
+R+fencembonceonces.litmus
This is the fully ordered (via smp_mb()) version of one of
the classic counterintuitive litmus tests that illustrates the
effects of store propagation delays.
@@ -103,7 +103,7 @@
R+poonceonces.litmus
As above, but without the smp_mb() invocations.
-SB+mbonceonces.litmus
+SB+fencembonceonces.litmus
This is the fully ordered (again, via smp_mb() version of store
buffering, which forms the core of Dekker's mutual-exclusion
algorithm.
@@ -111,15 +111,24 @@
SB+poonceonces.litmus
As above, but without the smp_mb() invocations.
+SB+rfionceonce-poonceonces.litmus
+ This litmus test demonstrates that LKMM is not fully multicopy
+ atomic. (Neither is it other multicopy atomic.) This litmus test
+ also demonstrates the "locations" debugging aid, which designates
+ additional registers and locations to be printed out in the dump
+ of final states in the herd7 output. Without the "locations"
+ statement, only those registers and locations mentioned in the
+ "exists" clause will be printed.
+
S+poonceonces.litmus
As below, but without the smp_wmb() and acquire load.
-S+wmbonceonce+poacquireonce.litmus
+S+fencewmbonceonce+poacquireonce.litmus
Can a smp_wmb(), instead of a release, and an acquire order
a prior store against a subsequent store?
WRC+poonceonces+Once.litmus
-WRC+pooncerelease+rmbonceonce+Once.litmus
+WRC+pooncerelease+fencermbonceonce+Once.litmus
These two are members of an extension of the MP litmus-test
class in which the first write is moved to a separate process.
The second is forbidden because smp_store_release() is
@@ -134,7 +143,7 @@
As above, but with smp_mb__after_spinlock() immediately
following the spin_lock().
-Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus
+Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus
Is the ordering provided by a release-acquire chain sufficient
to make ordering apparent to accesses by a process that does
not participate in that release-acquire chain?
diff --git a/tools/memory-model/litmus-tests/S+wmbonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/S+fencewmbonceonce+poacquireonce.litmus
similarity index 90%
rename from tools/memory-model/litmus-tests/S+wmbonceonce+poacquireonce.litmus
rename to tools/memory-model/litmus-tests/S+fencewmbonceonce+poacquireonce.litmus
index c533502..1847982 100644
--- a/tools/memory-model/litmus-tests/S+wmbonceonce+poacquireonce.litmus
+++ b/tools/memory-model/litmus-tests/S+fencewmbonceonce+poacquireonce.litmus
@@ -1,4 +1,4 @@
-C S+wmbonceonce+poacquireonce
+C S+fencewmbonceonce+poacquireonce
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/SB+mbonceonces.litmus b/tools/memory-model/litmus-tests/SB+fencembonceonces.litmus
similarity index 95%
rename from tools/memory-model/litmus-tests/SB+mbonceonces.litmus
rename to tools/memory-model/litmus-tests/SB+fencembonceonces.litmus
index 74b874f..ed5fff1 100644
--- a/tools/memory-model/litmus-tests/SB+mbonceonces.litmus
+++ b/tools/memory-model/litmus-tests/SB+fencembonceonces.litmus
@@ -1,4 +1,4 @@
-C SB+mbonceonces
+C SB+fencembonceonces
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus b/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus
new file mode 100644
index 0000000..04a1660
--- /dev/null
+++ b/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus
@@ -0,0 +1,32 @@
+C SB+rfionceonce-poonceonces
+
+(*
+ * Result: Sometimes
+ *
+ * This litmus test demonstrates that LKMM is not fully multicopy atomic.
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r1;
+ int r2;
+
+ WRITE_ONCE(*x, 1);
+ r1 = READ_ONCE(*x);
+ r2 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y)
+{
+ int r3;
+ int r4;
+
+ WRITE_ONCE(*y, 1);
+ r3 = READ_ONCE(*y);
+ r4 = READ_ONCE(*x);
+}
+
+locations [0:r1; 1:r3; x; y] (* Debug aid: Print things not in "exists". *)
+exists (0:r2=0 /\ 1:r4=0)
diff --git a/tools/memory-model/litmus-tests/WRC+pooncerelease+rmbonceonce+Once.litmus b/tools/memory-model/litmus-tests/WRC+pooncerelease+fencermbonceonce+Once.litmus
similarity index 93%
rename from tools/memory-model/litmus-tests/WRC+pooncerelease+rmbonceonce+Once.litmus
rename to tools/memory-model/litmus-tests/WRC+pooncerelease+fencermbonceonce+Once.litmus
index ad3448b..e994725 100644
--- a/tools/memory-model/litmus-tests/WRC+pooncerelease+rmbonceonce+Once.litmus
+++ b/tools/memory-model/litmus-tests/WRC+pooncerelease+fencermbonceonce+Once.litmus
@@ -1,4 +1,4 @@
-C WRC+pooncerelease+rmbonceonce+Once
+C WRC+pooncerelease+fencermbonceonce+Once
(*
* Result: Never
diff --git a/tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus b/tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus
similarity index 94%
rename from tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus
rename to tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus
index a20fc3f..88e70b8 100644
--- a/tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus
+++ b/tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus
@@ -1,4 +1,4 @@
-C Z6.0+pooncerelease+poacquirerelease+mbonceonce
+C Z6.0+pooncerelease+poacquirerelease+fencembonceonce
(*
* Result: Sometimes
diff --git a/tools/memory-model/scripts/checkalllitmus.sh b/tools/memory-model/scripts/checkalllitmus.sh
old mode 100644
new mode 100755
index af0aa15..ca528f9
--- a/tools/memory-model/scripts/checkalllitmus.sh
+++ b/tools/memory-model/scripts/checkalllitmus.sh
@@ -9,7 +9,7 @@
# appended.
#
# Usage:
-# sh checkalllitmus.sh [ directory ]
+# checkalllitmus.sh [ directory ]
#
# The LINUX_HERD_OPTIONS environment variable may be used to specify
# arguments to herd, whose default is defined by the checklitmus.sh script.
diff --git a/tools/memory-model/scripts/checklitmus.sh b/tools/memory-model/scripts/checklitmus.sh
old mode 100644
new mode 100755
index e2e4774..bf12a75
--- a/tools/memory-model/scripts/checklitmus.sh
+++ b/tools/memory-model/scripts/checklitmus.sh
@@ -8,7 +8,7 @@
# with ".out" appended.
#
# Usage:
-# sh checklitmus.sh file.litmus
+# checklitmus.sh file.litmus
#
# The LINUX_HERD_OPTIONS environment variable may be used to specify
# arguments to herd, which default to "-conf linux-kernel.cfg". Thus,
diff --git a/tools/objtool/arch/x86/include/asm/orc_types.h b/tools/objtool/arch/x86/include/asm/orc_types.h
index 9c9dc57..46f516d 100644
--- a/tools/objtool/arch/x86/include/asm/orc_types.h
+++ b/tools/objtool/arch/x86/include/asm/orc_types.h
@@ -88,6 +88,7 @@
unsigned sp_reg:4;
unsigned bp_reg:4;
unsigned type:2;
+ unsigned end:1;
} __packed;
/*
@@ -101,6 +102,7 @@
s16 sp_offset;
u8 sp_reg;
u8 type;
+ u8 end;
};
#endif /* __ASSEMBLY__ */
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index f4a25bd..2928939 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1157,6 +1157,7 @@
cfa->offset = hint->sp_offset;
insn->state.type = hint->type;
+ insn->state.end = hint->end;
}
return 0;
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index c6b68fc..95700a2 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -31,7 +31,7 @@
int stack_size;
unsigned char type;
bool bp_scratch;
- bool drap;
+ bool drap, end;
int drap_reg, drap_offset;
struct cfi_reg vals[CFI_NUM_REGS];
};
diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c
index c334382..faa4442 100644
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -203,7 +203,8 @@
print_reg(orc[i].bp_reg, orc[i].bp_offset);
- printf(" type:%s\n", orc_type_name(orc[i].type));
+ printf(" type:%s end:%d\n",
+ orc_type_name(orc[i].type), orc[i].end);
}
elf_end(elf);
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
index 18384d9..3f98dcf 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -31,6 +31,8 @@
struct cfi_reg *cfa = &insn->state.cfa;
struct cfi_reg *bp = &insn->state.regs[CFI_BP];
+ orc->end = insn->state.end;
+
if (cfa->base == CFI_UNDEFINED) {
orc->sp_reg = ORC_REG_UNDEFINED;
continue;
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 11300db..236b9b97 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -18,6 +18,10 @@
OPTIONS
-------
+-d::
+--desc::
+Print extra event descriptions. (default)
+
--no-desc::
Don't print descriptions.
@@ -25,11 +29,13 @@
--long-desc::
Print longer event descriptions.
+--debug::
+Enable debugging output.
+
--details::
Print how named events are resolved internally into perf events, and also
any extra expressions computed by perf stat.
-
[[EVENT_MODIFIERS]]
EVENT MODIFIERS
---------------
@@ -234,7 +240,7 @@
perf record -e '{cycles,instructions}:S' ...
perf report --group
-Normally all events in a event group sample, but with :S only
+Normally all events in an event group sample, but with :S only
the first event (the leader) samples, and it only reads the values of the
other events in the group.
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 04168da..246dee0 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -94,7 +94,7 @@
"perf report" to view group events together.
--filter=<filter>::
- Event filter. This option should follow a event selector (-e) which
+ Event filter. This option should follow an event selector (-e) which
selects either tracepoint event(s) or a hardware trace PMU
(e.g. Intel PT or CoreSight).
@@ -153,7 +153,7 @@
--exclude-perf::
Don't record events issued by perf itself. This option should follow
- a event selector (-e) which selects tracepoint event(s). It adds a
+ an event selector (-e) which selects tracepoint event(s). It adds a
filter expression 'common_pid != $PERFPID' to filters. If other
'--filter' exists, the new filter expression will be combined with
them by '&&'.
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index f5a3b40..f6d1a03 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -54,6 +54,8 @@
ifeq ($(SRCARCH),arm64)
NO_PERF_REGS := 0
+ NO_SYSCALL_TABLE := 0
+ CFLAGS += -I$(OUTPUT)arch/arm64/include/generated
LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
endif
@@ -905,8 +907,8 @@
mandir = share/man
infodir = share/info
perfexecdir = libexec/perf-core
-perf_include_dir = lib/include/perf
-perf_examples_dir = lib/examples/perf
+perf_include_dir = lib/perf/include
+perf_examples_dir = lib/perf/examples
sharedir = $(prefix)/share
template_dir = share/perf-core/templates
STRACE_GROUPS_DIR = share/perf-core/strace/groups
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index ecc9fc9..b3d1b12 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -384,6 +384,8 @@
SHELL = $(SHELL_PATH)
+linux_uapi_dir := $(srctree)/tools/include/uapi/linux
+
beauty_outdir := $(OUTPUT)trace/beauty/generated
beauty_ioctl_outdir := $(beauty_outdir)/ioctl
drm_ioctl_array := $(beauty_ioctl_outdir)/drm_ioctl_array.c
@@ -431,6 +433,12 @@
$(kvm_ioctl_array): $(kvm_hdr_dir)/kvm.h $(kvm_ioctl_tbl)
$(Q)$(SHELL) '$(kvm_ioctl_tbl)' $(kvm_hdr_dir) > $@
+socket_ipproto_array := $(beauty_outdir)/socket_ipproto_array.c
+socket_ipproto_tbl := $(srctree)/tools/perf/trace/beauty/socket_ipproto.sh
+
+$(socket_ipproto_array): $(linux_uapi_dir)/in.h $(socket_ipproto_tbl)
+ $(Q)$(SHELL) '$(socket_ipproto_tbl)' $(linux_uapi_dir) > $@
+
vhost_virtio_ioctl_array := $(beauty_ioctl_outdir)/vhost_virtio_ioctl_array.c
vhost_virtio_hdr_dir := $(srctree)/tools/include/uapi/linux
vhost_virtio_ioctl_tbl := $(srctree)/tools/perf/trace/beauty/vhost_virtio_ioctl.sh
@@ -566,6 +574,7 @@
$(sndrv_ctl_ioctl_array) \
$(kcmp_type_array) \
$(kvm_ioctl_array) \
+ $(socket_ipproto_array) \
$(vhost_virtio_ioctl_array) \
$(madvise_behavior_array) \
$(perf_ioctl_array) \
@@ -860,6 +869,7 @@
$(OUTPUT)$(sndrv_pcm_ioctl_array) \
$(OUTPUT)$(kvm_ioctl_array) \
$(OUTPUT)$(kcmp_type_array) \
+ $(OUTPUT)$(socket_ipproto_array) \
$(OUTPUT)$(vhost_virtio_ioctl_array) \
$(OUTPUT)$(perf_ioctl_array) \
$(OUTPUT)$(prctl_option_array) \
diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
index 91de486..f013b11 100644
--- a/tools/perf/arch/arm64/Makefile
+++ b/tools/perf/arch/arm64/Makefile
@@ -4,3 +4,24 @@
endif
PERF_HAVE_JITDUMP := 1
PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
+
+#
+# Syscall table generation for perf
+#
+
+out := $(OUTPUT)arch/arm64/include/generated/asm
+header := $(out)/syscalls.c
+sysdef := $(srctree)/tools/include/uapi/asm-generic/unistd.h
+sysprf := $(srctree)/tools/perf/arch/arm64/entry/syscalls/
+systbl := $(sysprf)/mksyscalltbl
+
+# Create output directory if not already present
+_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+$(header): $(sysdef) $(systbl)
+ $(Q)$(SHELL) '$(systbl)' '$(CC)' '$(HOSTCC)' $(sysdef) > $@
+
+clean::
+ $(call QUIET_CLEAN, arm64) $(RM) $(header)
+
+archheaders: $(header)
diff --git a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
new file mode 100755
index 0000000..52e1973
--- /dev/null
+++ b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
@@ -0,0 +1,62 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Generate system call table for perf. Derived from
+# powerpc script.
+#
+# Copyright IBM Corp. 2017
+# Author(s): Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
+# Changed by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
+# Changed by: Kim Phillips <kim.phillips@arm.com>
+
+gcc=$1
+hostcc=$2
+input=$3
+
+if ! test -r $input; then
+ echo "Could not read input file" >&2
+ exit 1
+fi
+
+create_table_from_c()
+{
+ local sc nr last_sc
+
+ create_table_exe=`mktemp /tmp/create-table-XXXXXX`
+
+ {
+
+ cat <<-_EoHEADER
+ #include <stdio.h>
+ #define __ARCH_WANT_RENAMEAT
+ #include "$input"
+ int main(int argc, char *argv[])
+ {
+ _EoHEADER
+
+ while read sc nr; do
+ printf "%s\n" " printf(\"\\t[%d] = \\\"$sc\\\",\\n\", __NR_$sc);"
+ last_sc=$sc
+ done
+
+ printf "%s\n" " printf(\"#define SYSCALLTBL_ARM64_MAX_ID %d\\n\", __NR_$last_sc);"
+ printf "}\n"
+
+ } | $hostcc -o $create_table_exe -x c -
+
+ $create_table_exe
+
+ rm -f $create_table_exe
+}
+
+create_table()
+{
+ echo "static const char *syscalltbl_arm64[] = {"
+ create_table_from_c
+ echo "};"
+}
+
+$gcc -E -dM -x c $input \
+ |sed -ne 's/^#define __NR_//p' \
+ |sort -t' ' -k2 -nu \
+ |create_table
diff --git a/tools/perf/arch/powerpc/util/skip-callchain-idx.c b/tools/perf/arch/powerpc/util/skip-callchain-idx.c
index ef5d59a..7c6eeb4 100644
--- a/tools/perf/arch/powerpc/util/skip-callchain-idx.c
+++ b/tools/perf/arch/powerpc/util/skip-callchain-idx.c
@@ -58,9 +58,13 @@
}
/*
- * Check if return address is on the stack.
+ * Check if return address is on the stack. If return address
+ * is in a register (typically R0), it is yet to be saved on
+ * the stack.
*/
- if (nops != 0 || ops != NULL)
+ if ((nops != 0 || ops != NULL) &&
+ !(nops == 1 && ops[0].atom == DW_OP_regx &&
+ ops[0].number2 == 0 && ops[0].offset == 0))
return 0;
/*
@@ -246,7 +250,7 @@
if (!chain || chain->nr < 3)
return skip_slot;
- ip = chain->ips[2];
+ ip = chain->ips[1];
thread__find_symbol(thread, PERF_RECORD_MISC_USER, ip, &al);
diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c
index d233e2eb..aaabab5 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -102,7 +102,7 @@
int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid)
{
- if (strstr(cpuid, "IBM/S390")) {
+ if (strstr(cpuid, "IBM")) {
kvm->exit_reasons = sie_exit_reasons;
kvm->exit_reasons_isa = "SIE";
} else
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 6a8738f..f3aa9d0 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2193,7 +2193,7 @@
fprintf(out, "%s\n", bf);
fprintf(out, " -------------------------------------------------------------\n");
- hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, true);
+ hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
}
static void print_pareto(FILE *out)
@@ -2268,7 +2268,7 @@
fprintf(out, "=================================================\n");
fprintf(out, "#\n");
- hists__fprintf(&c2c.hists.hists, true, 0, 0, 0, stdout, false);
+ hists__fprintf(&c2c.hists.hists, true, 0, 0, 0, stdout, true);
fprintf(out, "\n");
fprintf(out, "=================================================\n");
@@ -2349,6 +2349,9 @@
" s Toggle full length of symbol and source line columns \n"
" q Return back to cacheline list \n";
+ if (!he)
+ return 0;
+
/* Display compact version first. */
c2c.symbol_full = false;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index d660cb7..39db2ee 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -696,7 +696,7 @@
hists__output_resort(hists, NULL);
hists__fprintf(hists, !quiet, 0, 0, 0, stdout,
- symbol_conf.use_callchain);
+ !symbol_conf.use_callchain);
}
static void data__fprintf(void)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c04dc7b..02f7a3c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -478,8 +478,8 @@
hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
- symbol_conf.use_callchain ||
- symbol_conf.show_branchflag_count);
+ !(symbol_conf.use_callchain ||
+ symbol_conf.show_branchflag_count));
fprintf(stdout, "\n\n");
}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 05be023..d097b5b4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -296,18 +296,6 @@
return perf_evsel__open_per_thread(evsel, evsel_list->threads);
}
-/*
- * Does the counter have nsecs as a unit?
- */
-static inline int nsec_counter(struct perf_evsel *evsel)
-{
- if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
- perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
- return 1;
-
- return 0;
-}
-
static int process_synthesized_event(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct perf_sample *sample __maybe_unused,
@@ -1058,34 +1046,6 @@
fprintf(os->fh, "%*s ", metric_only_len, unit);
}
-static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
-{
- FILE *output = stat_config.output;
- double msecs = avg / NSEC_PER_MSEC;
- const char *fmt_v, *fmt_n;
- char name[25];
-
- fmt_v = csv_output ? "%.6f%s" : "%18.6f%s";
- fmt_n = csv_output ? "%s" : "%-25s";
-
- aggr_printout(evsel, id, nr);
-
- scnprintf(name, sizeof(name), "%s%s",
- perf_evsel__name(evsel), csv_output ? "" : " (msec)");
-
- fprintf(output, fmt_v, msecs, csv_sep);
-
- if (csv_output)
- fprintf(output, "%s%s", evsel->unit, csv_sep);
- else
- fprintf(output, "%-*s%s", unit_width, evsel->unit, csv_sep);
-
- fprintf(output, fmt_n, name);
-
- if (evsel->cgrp)
- fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
-}
-
static int first_shadow_cpu(struct perf_evsel *evsel, int id)
{
int i;
@@ -1241,11 +1201,7 @@
return;
}
- if (metric_only)
- /* nothing */;
- else if (nsec_counter(counter))
- nsec_printout(id, nr, counter, uval);
- else
+ if (!metric_only)
abs_printout(id, nr, counter, uval);
out.print_metric = pm;
@@ -1331,7 +1287,7 @@
alias->scale != counter->scale ||
alias->cgrp != counter->cgrp ||
strcmp(alias->unit, counter->unit) ||
- nsec_counter(alias) != nsec_counter(counter))
+ perf_evsel__is_clock(alias) != perf_evsel__is_clock(counter))
break;
alias->merged_stat = true;
cb(alias, data, false);
@@ -2449,6 +2405,18 @@
return 0;
if (transaction_run) {
+ /* Handle -T as -M transaction. Once platform specific metrics
+ * support has been added to the json files, all archictures
+ * will use this approach. To determine transaction support
+ * on an architecture test for such a metric name.
+ */
+ if (metricgroup__has_metric("transaction")) {
+ struct option opt = { .value = &evsel_list };
+
+ return metricgroup__parse_groups(&opt, "transaction",
+ &metric_events);
+ }
+
if (pmu_have_event("cpu", "cycles-ct") &&
pmu_have_event("cpu", "el-start"))
err = parse_events(evsel_list, transaction_attrs,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ffdc276..d21d875 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -307,7 +307,7 @@
hists__output_recalc_col_len(hists, top->print_entries - printed);
putchar('\n');
hists__fprintf(hists, false, top->print_entries - printed, win_width,
- top->min_percent, stdout, symbol_conf.use_callchain);
+ top->min_percent, stdout, !symbol_conf.use_callchain);
}
static void prompt_integer(int *target, const char *msg)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 6a748ec..88561ee 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -291,7 +291,7 @@
{
int idx = val - sa->offset;
- if (idx < 0 || idx >= sa->nr_entries)
+ if (idx < 0 || idx >= sa->nr_entries || sa->entries[idx] == NULL)
return scnprintf(bf, size, intfmt, val);
return scnprintf(bf, size, "%s", sa->entries[idx]);
@@ -761,10 +761,12 @@
.arg = { [0] = STRARRAY(resource, rlimit_resources), }, },
{ .name = "socket",
.arg = { [0] = STRARRAY(family, socket_families),
- [1] = { .scnprintf = SCA_SK_TYPE, /* type */ }, }, },
+ [1] = { .scnprintf = SCA_SK_TYPE, /* type */ },
+ [2] = { .scnprintf = SCA_SK_PROTO, /* protocol */ }, }, },
{ .name = "socketpair",
.arg = { [0] = STRARRAY(family, socket_families),
- [1] = { .scnprintf = SCA_SK_TYPE, /* type */ }, }, },
+ [1] = { .scnprintf = SCA_SK_TYPE, /* type */ },
+ [2] = { .scnprintf = SCA_SK_PROTO, /* protocol */ }, }, },
{ .name = "stat", .alias = "newstat", },
{ .name = "statx",
.arg = { [0] = { .scnprintf = SCA_FDAT, /* fdat */ },
@@ -2990,6 +2992,7 @@
if (trace__validate_ev_qualifier(trace))
goto out;
+ trace->trace_syscalls = true;
}
err = 0;
@@ -3045,7 +3048,7 @@
},
.output = stderr,
.show_comm = true,
- .trace_syscalls = true,
+ .trace_syscalls = false,
.kernel_syscallchains = false,
.max_stack = UINT_MAX,
};
@@ -3191,13 +3194,7 @@
if (!trace.trace_syscalls && !trace.trace_pgfaults &&
trace.evlist->nr_entries == 0 /* Was --events used? */) {
- pr_err("Please specify something to trace.\n");
- return -1;
- }
-
- if (!trace.trace_syscalls && trace.ev_qualifier) {
- pr_err("The -e option can't be used with --no-syscalls.\n");
- goto out;
+ trace.trace_syscalls = true;
}
if (output_name != NULL) {
diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh
index 10f333e..de28466 100755
--- a/tools/perf/check-headers.sh
+++ b/tools/perf/check-headers.sh
@@ -7,6 +7,7 @@
include/uapi/linux/fcntl.h
include/uapi/linux/kcmp.h
include/uapi/linux/kvm.h
+include/uapi/linux/in.h
include/uapi/linux/perf_event.h
include/uapi/linux/prctl.h
include/uapi/linux/sched.h
@@ -35,6 +36,7 @@
arch/s390/include/uapi/asm/sie.h
arch/arm/include/uapi/asm/kvm.h
arch/arm64/include/uapi/asm/kvm.h
+arch/arm64/include/uapi/asm/unistd.h
arch/alpha/include/uapi/asm/errno.h
arch/mips/include/asm/errno.h
arch/mips/include/uapi/asm/errno.h
@@ -53,6 +55,7 @@
include/uapi/asm-generic/errno-base.h
include/uapi/asm-generic/ioctls.h
include/uapi/asm-generic/mman-common.h
+include/uapi/asm-generic/unistd.h
'
check_2 () {
diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h
index dd764ad..a63aa62 100644
--- a/tools/perf/include/bpf/bpf.h
+++ b/tools/perf/include/bpf/bpf.h
@@ -1,6 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
#ifndef _PERF_BPF_H
#define _PERF_BPF_H
+
+#include <uapi/linux/bpf.h>
+
#define SEC(NAME) __attribute__((section(NAME), used))
#define probe(function, vars) \
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index d215714..21bf7f5 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -25,7 +25,9 @@
return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
+#ifndef MAX_NR_CPUS
#define MAX_NR_CPUS 1024
+#endif
extern const char *input_name;
extern bool perf_host, perf_guest;
diff --git a/tools/perf/pmu-events/arch/arm64/cavium/thunderx2/core-imp-def.json b/tools/perf/pmu-events/arch/arm64/cavium/thunderx2/core-imp-def.json
index bc03c06..752e47e 100644
--- a/tools/perf/pmu-events/arch/arm64/cavium/thunderx2/core-imp-def.json
+++ b/tools/perf/pmu-events/arch/arm64/cavium/thunderx2/core-imp-def.json
@@ -12,6 +12,21 @@
"ArchStdEvent": "L1D_CACHE_REFILL_WR",
},
{
+ "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB_VICTIM",
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB_CLEAN",
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_INVAL",
+ },
+ {
"ArchStdEvent": "L1D_TLB_REFILL_RD",
},
{
@@ -24,9 +39,75 @@
"ArchStdEvent": "L1D_TLB_WR",
},
{
+ "ArchStdEvent": "L2D_TLB_REFILL_RD",
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_REFILL_WR",
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_RD",
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_WR",
+ },
+ {
"ArchStdEvent": "BUS_ACCESS_RD",
- },
- {
+ },
+ {
"ArchStdEvent": "BUS_ACCESS_WR",
- }
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_RD",
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_WR",
+ },
+ {
+ "ArchStdEvent": "UNALIGNED_LD_SPEC",
+ },
+ {
+ "ArchStdEvent": "UNALIGNED_ST_SPEC",
+ },
+ {
+ "ArchStdEvent": "UNALIGNED_LDST_SPEC",
+ },
+ {
+ "ArchStdEvent": "EXC_UNDEF",
+ },
+ {
+ "ArchStdEvent": "EXC_SVC",
+ },
+ {
+ "ArchStdEvent": "EXC_PABORT",
+ },
+ {
+ "ArchStdEvent": "EXC_DABORT",
+ },
+ {
+ "ArchStdEvent": "EXC_IRQ",
+ },
+ {
+ "ArchStdEvent": "EXC_FIQ",
+ },
+ {
+ "ArchStdEvent": "EXC_SMC",
+ },
+ {
+ "ArchStdEvent": "EXC_HVC",
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_PABORT",
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_DABORT",
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_OTHER",
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_IRQ",
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_FIQ",
+ }
]
diff --git a/tools/perf/pmu-events/arch/s390/cf_z10/basic.json b/tools/perf/pmu-events/arch/s390/cf_z10/basic.json
index 8bf1675..2dd8daf 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z10/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z10/basic.json
@@ -1,71 +1,83 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "0",
"EventName": "CPU_CYCLES",
"BriefDescription": "CPU Cycles",
"PublicDescription": "Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "1",
"EventName": "INSTRUCTIONS",
"BriefDescription": "Instructions",
"PublicDescription": "Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "2",
"EventName": "L1I_DIR_WRITES",
"BriefDescription": "L1I Directory Writes",
"PublicDescription": "Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "3",
"EventName": "L1I_PENALTY_CYCLES",
"BriefDescription": "L1I Penalty Cycles",
"PublicDescription": "Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "4",
"EventName": "L1D_DIR_WRITES",
"BriefDescription": "L1D Directory Writes",
"PublicDescription": "Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "5",
"EventName": "L1D_PENALTY_CYCLES",
"BriefDescription": "L1D Penalty Cycles",
"PublicDescription": "Level-1 D-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "32",
"EventName": "PROBLEM_STATE_CPU_CYCLES",
"BriefDescription": "Problem-State CPU Cycles",
"PublicDescription": "Problem-State Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "33",
"EventName": "PROBLEM_STATE_INSTRUCTIONS",
"BriefDescription": "Problem-State Instructions",
"PublicDescription": "Problem-State Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "34",
"EventName": "PROBLEM_STATE_L1I_DIR_WRITES",
"BriefDescription": "Problem-State L1I Directory Writes",
"PublicDescription": "Problem-State Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "35",
"EventName": "PROBLEM_STATE_L1I_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1I Penalty Cycles",
"PublicDescription": "Problem-State Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "36",
"EventName": "PROBLEM_STATE_L1D_DIR_WRITES",
"BriefDescription": "Problem-State L1D Directory Writes",
"PublicDescription": "Problem-State Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "37",
"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1D Penalty Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z10/crypto.json b/tools/perf/pmu-events/arch/s390/cf_z10/crypto.json
index 7e5b724..db286f1 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z10/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z10/crypto.json
@@ -1,95 +1,111 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "64",
"EventName": "PRNG_FUNCTIONS",
"BriefDescription": "PRNG Functions",
"PublicDescription": "Total number of the PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "65",
"EventName": "PRNG_CYCLES",
"BriefDescription": "PRNG Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "66",
"EventName": "PRNG_BLOCKED_FUNCTIONS",
"BriefDescription": "PRNG Blocked Functions",
"PublicDescription": "Total number of the PRNG functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "67",
"EventName": "PRNG_BLOCKED_CYCLES",
"BriefDescription": "PRNG Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the PRNG functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "68",
"EventName": "SHA_FUNCTIONS",
"BriefDescription": "SHA Functions",
"PublicDescription": "Total number of SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "69",
"EventName": "SHA_CYCLES",
"BriefDescription": "SHA Cycles",
"PublicDescription": "Total number of CPU cycles when the SHA coprocessor is busy performing the SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "70",
"EventName": "SHA_BLOCKED_FUNCTIONS",
"BriefDescription": "SHA Blocked Functions",
"PublicDescription": "Total number of the SHA functions that are issued by the CPU and are blocked because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "71",
"EventName": "SHA_BLOCKED_CYCLES",
"BriefDescription": "SHA Bloced Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the SHA functions issued by the CPU because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "72",
"EventName": "DEA_FUNCTIONS",
"BriefDescription": "DEA Functions",
"PublicDescription": "Total number of the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "73",
"EventName": "DEA_CYCLES",
"BriefDescription": "DEA Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "74",
"EventName": "DEA_BLOCKED_FUNCTIONS",
"BriefDescription": "DEA Blocked Functions",
"PublicDescription": "Total number of the DEA functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "75",
"EventName": "DEA_BLOCKED_CYCLES",
"BriefDescription": "DEA Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the DEA functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "76",
"EventName": "AES_FUNCTIONS",
"BriefDescription": "AES Functions",
"PublicDescription": "Total number of AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "77",
"EventName": "AES_CYCLES",
"BriefDescription": "AES Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "78",
"EventName": "AES_BLOCKED_FUNCTIONS",
"BriefDescription": "AES Blocked Functions",
"PublicDescription": "Total number of AES functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "79",
"EventName": "AES_BLOCKED_CYCLES",
"BriefDescription": "AES Blocked Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z10/extended.json b/tools/perf/pmu-events/arch/s390/cf_z10/extended.json
index 0feedb4..b6b7f29 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z10/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z10/extended.json
@@ -1,107 +1,125 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "128",
"EventName": "L1I_L2_SOURCED_WRITES",
"BriefDescription": "L1I L2 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from the Level-2 (L1.5) cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "129",
"EventName": "L1D_L2_SOURCED_WRITES",
"BriefDescription": "L1D L2 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the installed cache line was sourced from the Level-2 (L1.5) cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "130",
"EventName": "L1I_L3_LOCAL_WRITES",
"BriefDescription": "L1I L3 Local Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the installed cache line was sourced from the Level-3 cache that is on the same book as the Instruction cache (Local L2 cache)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "131",
"EventName": "L1D_L3_LOCAL_WRITES",
"BriefDescription": "L1D L3 Local Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the installtion cache line was source from the Level-3 cache that is on the same book as the Data cache (Local L2 cache)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "132",
"EventName": "L1I_L3_REMOTE_WRITES",
"BriefDescription": "L1I L3 Remote Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the installed cache line was sourced from a Level-3 cache that is not on the same book as the Instruction cache (Remote L2 cache)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "133",
"EventName": "L1D_L3_REMOTE_WRITES",
"BriefDescription": "L1D L3 Remote Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the installed cache line was sourced from a Level-3 cache that is not on the same book as the Data cache (Remote L2 cache)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "134",
"EventName": "L1D_LMEM_SOURCED_WRITES",
"BriefDescription": "L1D Local Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the installed cache line was sourced from memory that is attached to the same book as the Data cache (Local Memory)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "135",
"EventName": "L1I_LMEM_SOURCED_WRITES",
"BriefDescription": "L1I Local Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache where the installed cache line was sourced from memory that is attached to the s ame book as the Instruction cache (Local Memory)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "136",
"EventName": "L1D_RO_EXCL_WRITES",
"BriefDescription": "L1D Read-only Exclusive Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache where the line was originally in a Read-Only state in the cache but has been updated to be in the Exclusive state that allows stores to the cache line"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "137",
"EventName": "L1I_CACHELINE_INVALIDATES",
"BriefDescription": "L1I Cacheline Invalidates",
"PublicDescription": "A cache line in the Level-1 I-Cache has been invalidated by a store on the same CPU as the Level-1 I-Cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "138",
"EventName": "ITLB1_WRITES",
"BriefDescription": "ITLB1 Writes",
"PublicDescription": "A translation entry has been written into the Level-1 Instruction Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "139",
"EventName": "DTLB1_WRITES",
"BriefDescription": "DTLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "140",
"EventName": "TLB2_PTE_WRITES",
"BriefDescription": "TLB2 PTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Page Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "141",
"EventName": "TLB2_CRSTE_WRITES",
"BriefDescription": "TLB2 CRSTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Common Region Segment Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "142",
"EventName": "TLB2_CRSTE_HPAGE_WRITES",
"BriefDescription": "TLB2 CRSTE One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Common Region Segment Table Entry arrays for a one-megabyte large page translation"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "145",
"EventName": "ITLB1_MISSES",
"BriefDescription": "ITLB1 Misses",
"PublicDescription": "Level-1 Instruction TLB miss in progress. Incremented by one for every cycle an ITLB1 miss is in progress"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "146",
"EventName": "DTLB1_MISSES",
"BriefDescription": "DTLB1 Misses",
"PublicDescription": "Level-1 Data TLB miss in progress. Incremented by one for every cycle an DTLB1 miss is in progress"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "147",
"EventName": "L2C_STORES_SENT",
"BriefDescription": "L2C Stores Sent",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z13/basic.json b/tools/perf/pmu-events/arch/s390/cf_z13/basic.json
index 8bf1675..2dd8daf 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z13/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/basic.json
@@ -1,71 +1,83 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "0",
"EventName": "CPU_CYCLES",
"BriefDescription": "CPU Cycles",
"PublicDescription": "Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "1",
"EventName": "INSTRUCTIONS",
"BriefDescription": "Instructions",
"PublicDescription": "Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "2",
"EventName": "L1I_DIR_WRITES",
"BriefDescription": "L1I Directory Writes",
"PublicDescription": "Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "3",
"EventName": "L1I_PENALTY_CYCLES",
"BriefDescription": "L1I Penalty Cycles",
"PublicDescription": "Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "4",
"EventName": "L1D_DIR_WRITES",
"BriefDescription": "L1D Directory Writes",
"PublicDescription": "Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "5",
"EventName": "L1D_PENALTY_CYCLES",
"BriefDescription": "L1D Penalty Cycles",
"PublicDescription": "Level-1 D-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "32",
"EventName": "PROBLEM_STATE_CPU_CYCLES",
"BriefDescription": "Problem-State CPU Cycles",
"PublicDescription": "Problem-State Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "33",
"EventName": "PROBLEM_STATE_INSTRUCTIONS",
"BriefDescription": "Problem-State Instructions",
"PublicDescription": "Problem-State Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "34",
"EventName": "PROBLEM_STATE_L1I_DIR_WRITES",
"BriefDescription": "Problem-State L1I Directory Writes",
"PublicDescription": "Problem-State Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "35",
"EventName": "PROBLEM_STATE_L1I_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1I Penalty Cycles",
"PublicDescription": "Problem-State Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "36",
"EventName": "PROBLEM_STATE_L1D_DIR_WRITES",
"BriefDescription": "Problem-State L1D Directory Writes",
"PublicDescription": "Problem-State Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "37",
"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1D Penalty Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z13/crypto.json b/tools/perf/pmu-events/arch/s390/cf_z13/crypto.json
index 7e5b724..db286f1 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z13/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/crypto.json
@@ -1,95 +1,111 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "64",
"EventName": "PRNG_FUNCTIONS",
"BriefDescription": "PRNG Functions",
"PublicDescription": "Total number of the PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "65",
"EventName": "PRNG_CYCLES",
"BriefDescription": "PRNG Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "66",
"EventName": "PRNG_BLOCKED_FUNCTIONS",
"BriefDescription": "PRNG Blocked Functions",
"PublicDescription": "Total number of the PRNG functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "67",
"EventName": "PRNG_BLOCKED_CYCLES",
"BriefDescription": "PRNG Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the PRNG functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "68",
"EventName": "SHA_FUNCTIONS",
"BriefDescription": "SHA Functions",
"PublicDescription": "Total number of SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "69",
"EventName": "SHA_CYCLES",
"BriefDescription": "SHA Cycles",
"PublicDescription": "Total number of CPU cycles when the SHA coprocessor is busy performing the SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "70",
"EventName": "SHA_BLOCKED_FUNCTIONS",
"BriefDescription": "SHA Blocked Functions",
"PublicDescription": "Total number of the SHA functions that are issued by the CPU and are blocked because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "71",
"EventName": "SHA_BLOCKED_CYCLES",
"BriefDescription": "SHA Bloced Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the SHA functions issued by the CPU because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "72",
"EventName": "DEA_FUNCTIONS",
"BriefDescription": "DEA Functions",
"PublicDescription": "Total number of the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "73",
"EventName": "DEA_CYCLES",
"BriefDescription": "DEA Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "74",
"EventName": "DEA_BLOCKED_FUNCTIONS",
"BriefDescription": "DEA Blocked Functions",
"PublicDescription": "Total number of the DEA functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "75",
"EventName": "DEA_BLOCKED_CYCLES",
"BriefDescription": "DEA Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the DEA functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "76",
"EventName": "AES_FUNCTIONS",
"BriefDescription": "AES Functions",
"PublicDescription": "Total number of AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "77",
"EventName": "AES_CYCLES",
"BriefDescription": "AES Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "78",
"EventName": "AES_BLOCKED_FUNCTIONS",
"BriefDescription": "AES Blocked Functions",
"PublicDescription": "Total number of AES functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "79",
"EventName": "AES_BLOCKED_CYCLES",
"BriefDescription": "AES Blocked Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z13/extended.json b/tools/perf/pmu-events/arch/s390/cf_z13/extended.json
index 9a002b6..436ce33 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z13/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/extended.json
@@ -1,335 +1,391 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "128",
"EventName": "L1D_RO_EXCL_WRITES",
"BriefDescription": "L1D Read-only Exclusive Writes",
"PublicDescription": "A directory write to the Level-1 Data cache where the line was originally in a Read-Only state in the cache but has been updated to be in the Exclusive state that allows stores to the cache line."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "129",
"EventName": "DTLB1_WRITES",
"BriefDescription": "DTLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "130",
"EventName": "DTLB1_MISSES",
"BriefDescription": "DTLB1 Misses",
"PublicDescription": "Level-1 Data TLB miss in progress. Incremented by one for every cycle a DTLB1 miss is in progress."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "131",
"EventName": "DTLB1_HPAGE_WRITES",
"BriefDescription": "DTLB1 One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer for a one-megabyte page"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "132",
"EventName": "DTLB1_GPAGE_WRITES",
"BriefDescription": "DTLB1 Two-Gigabyte Page Writes",
"PublicDescription": "Counter:132 Name:DTLB1_GPAGE_WRITES A translation entry has been written to the Level-1 Data Translation Lookaside Buffer for a two-gigabyte page."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "133",
"EventName": "L1D_L2D_SOURCED_WRITES",
"BriefDescription": "L1D L2D Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from the Level-2 Data cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "134",
"EventName": "ITLB1_WRITES",
"BriefDescription": "ITLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Instruction Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "135",
"EventName": "ITLB1_MISSES",
"BriefDescription": "ITLB1 Misses",
"PublicDescription": "Level-1 Instruction TLB miss in progress. Incremented by one for every cycle an ITLB1 miss is in progress"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "136",
"EventName": "L1I_L2I_SOURCED_WRITES",
"BriefDescription": "L1I L2I Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from the Level-2 Instruction cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "137",
"EventName": "TLB2_PTE_WRITES",
"BriefDescription": "TLB2 PTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Page Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "138",
"EventName": "TLB2_CRSTE_HPAGE_WRITES",
"BriefDescription": "TLB2 CRSTE One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Combined Region Segment Table Entry arrays for a one-megabyte large page translation"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "139",
"EventName": "TLB2_CRSTE_WRITES",
"BriefDescription": "TLB2 CRSTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Combined Region Segment Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "140",
"EventName": "TX_C_TEND",
"BriefDescription": "Completed TEND instructions in constrained TX mode",
"PublicDescription": "A TEND instruction has completed in a constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "141",
"EventName": "TX_NC_TEND",
"BriefDescription": "Completed TEND instructions in non-constrained TX mode",
"PublicDescription": "A TEND instruction has completed in a non-constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "143",
"EventName": "L1C_TLB1_MISSES",
"BriefDescription": "L1C TLB1 Misses",
"PublicDescription": "Increments by one for any cycle where a Level-1 cache or Level-1 TLB miss is in progress."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "144",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Chip Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "145",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D On-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Chip Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "146",
"EventName": "L1D_ONNODE_L4_SOURCED_WRITES",
"BriefDescription": "L1D On-Node L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Node Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "147",
"EventName": "L1D_ONNODE_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D On-Node L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Node Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "148",
"EventName": "L1D_ONNODE_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Node L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Node Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "149",
"EventName": "L1D_ONDRAWER_L4_SOURCED_WRITES",
"BriefDescription": "L1D On-Drawer L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Drawer Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "150",
"EventName": "L1D_ONDRAWER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D On-Drawer L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Drawer Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "151",
"EventName": "L1D_ONDRAWER_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Drawer L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Drawer Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "152",
"EventName": "L1D_OFFDRAWER_SCOL_L4_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer Same-Column L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Same-Column Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "153",
"EventName": "L1D_OFFDRAWER_SCOL_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D Off-Drawer Same-Column L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Same-Column Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "154",
"EventName": "L1D_OFFDRAWER_SCOL_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer Same-Column L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Same-Column Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "155",
"EventName": "L1D_OFFDRAWER_FCOL_L4_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer Far-Column L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Far-Column Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "156",
"EventName": "L1D_OFFDRAWER_FCOL_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D Off-Drawer Far-Column L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Far-Column Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "157",
"EventName": "L1D_OFFDRAWER_FCOL_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer Far-Column L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Far-Column Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "158",
"EventName": "L1D_ONNODE_MEM_SOURCED_WRITES",
"BriefDescription": "L1D On-Node Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Node memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "159",
"EventName": "L1D_ONDRAWER_MEM_SOURCED_WRITES",
"BriefDescription": "L1D On-Drawer Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Drawer memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "160",
"EventName": "L1D_OFFDRAWER_MEM_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Drawer memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "161",
"EventName": "L1D_ONCHIP_MEM_SOURCED_WRITES",
"BriefDescription": "L1D On-Chip Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Chip memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "162",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Chip Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "163",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I On-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On Chip Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "164",
"EventName": "L1I_ONNODE_L4_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Node Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "165",
"EventName": "L1I_ONNODE_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I On-Node L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Node Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "166",
"EventName": "L1I_ONNODE_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Node L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Node Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "167",
"EventName": "L1I_ONDRAWER_L4_SOURCED_WRITES",
"BriefDescription": "L1I On-Drawer L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Drawer Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "168",
"EventName": "L1I_ONDRAWER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I On-Drawer L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Drawer Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "169",
"EventName": "L1I_ONDRAWER_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Drawer L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Drawer Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "170",
"EventName": "L1I_OFFDRAWER_SCOL_L4_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer Same-Column L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Same-Column Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "171",
"EventName": "L1I_OFFDRAWER_SCOL_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I Off-Drawer Same-Column L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Same-Column Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "172",
"EventName": "L1I_OFFDRAWER_SCOL_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer Same-Column L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Same-Column Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "173",
"EventName": "L1I_OFFDRAWER_FCOL_L4_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer Far-Column L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Far-Column Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "174",
"EventName": "L1I_OFFDRAWER_FCOL_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I Off-Drawer Far-Column L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Far-Column Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "175",
"EventName": "L1I_OFFDRAWER_FCOL_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer Far-Column L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Far-Column Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "176",
"EventName": "L1I_ONNODE_MEM_SOURCED_WRITES",
"BriefDescription": "L1I On-Node Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from On-Node memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "177",
"EventName": "L1I_ONDRAWER_MEM_SOURCED_WRITES",
"BriefDescription": "L1I On-Drawer Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from On-Drawer memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "178",
"EventName": "L1I_OFFDRAWER_MEM_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from On-Drawer memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "179",
"EventName": "L1I_ONCHIP_MEM_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from On-Chip memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "218",
"EventName": "TX_NC_TABORT",
"BriefDescription": "Aborted transactions in non-constrained TX mode",
"PublicDescription": "A transaction abort has occurred in a non-constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "219",
"EventName": "TX_C_TABORT_NO_SPECIAL",
"BriefDescription": "Aborted transactions in constrained TX mode not using special completion logic",
"PublicDescription": "A transaction abort has occurred in a constrained transactional-execution mode and the CPU is not using any special logic to allow the transaction to complete"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "220",
"EventName": "TX_C_TABORT_SPECIAL",
"BriefDescription": "Aborted transactions in constrained TX mode using special completion logic",
"PublicDescription": "A transaction abort has occurred in a constrained transactional-execution mode and the CPU is using special logic to allow the transaction to complete"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "448",
"EventName": "MT_DIAG_CYCLES_ONE_THR_ACTIVE",
"BriefDescription": "Cycle count with one thread active",
"PublicDescription": "Cycle count with one thread active"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "449",
"EventName": "MT_DIAG_CYCLES_TWO_THR_ACTIVE",
"BriefDescription": "Cycle count with two threads active",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z13/transaction.json b/tools/perf/pmu-events/arch/s390/cf_z13/transaction.json
new file mode 100644
index 0000000..1a0034f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/transaction.json
@@ -0,0 +1,7 @@
+[
+ {
+ "BriefDescription": "Transaction count",
+ "MetricName": "transaction",
+ "MetricExpr": "TX_C_TEND + TX_NC_TEND + TX_NC_TABORT + TX_C_TABORT_SPECIAL + TX_C_TABORT_NO_SPECIAL"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/s390/cf_z14/basic.json b/tools/perf/pmu-events/arch/s390/cf_z14/basic.json
index 8f653c9..17fb524 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z14/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/basic.json
@@ -1,47 +1,55 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "0",
"EventName": "CPU_CYCLES",
"BriefDescription": "CPU Cycles",
"PublicDescription": "Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "1",
"EventName": "INSTRUCTIONS",
"BriefDescription": "Instructions",
"PublicDescription": "Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "2",
"EventName": "L1I_DIR_WRITES",
"BriefDescription": "L1I Directory Writes",
"PublicDescription": "Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "3",
"EventName": "L1I_PENALTY_CYCLES",
"BriefDescription": "L1I Penalty Cycles",
"PublicDescription": "Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "4",
"EventName": "L1D_DIR_WRITES",
"BriefDescription": "L1D Directory Writes",
"PublicDescription": "Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "5",
"EventName": "L1D_PENALTY_CYCLES",
"BriefDescription": "L1D Penalty Cycles",
"PublicDescription": "Level-1 D-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "32",
"EventName": "PROBLEM_STATE_CPU_CYCLES",
"BriefDescription": "Problem-State CPU Cycles",
"PublicDescription": "Problem-State Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "33",
"EventName": "PROBLEM_STATE_INSTRUCTIONS",
"BriefDescription": "Problem-State Instructions",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z14/crypto.json b/tools/perf/pmu-events/arch/s390/cf_z14/crypto.json
index 7e5b724..db286f1 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z14/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/crypto.json
@@ -1,95 +1,111 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "64",
"EventName": "PRNG_FUNCTIONS",
"BriefDescription": "PRNG Functions",
"PublicDescription": "Total number of the PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "65",
"EventName": "PRNG_CYCLES",
"BriefDescription": "PRNG Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "66",
"EventName": "PRNG_BLOCKED_FUNCTIONS",
"BriefDescription": "PRNG Blocked Functions",
"PublicDescription": "Total number of the PRNG functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "67",
"EventName": "PRNG_BLOCKED_CYCLES",
"BriefDescription": "PRNG Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the PRNG functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "68",
"EventName": "SHA_FUNCTIONS",
"BriefDescription": "SHA Functions",
"PublicDescription": "Total number of SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "69",
"EventName": "SHA_CYCLES",
"BriefDescription": "SHA Cycles",
"PublicDescription": "Total number of CPU cycles when the SHA coprocessor is busy performing the SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "70",
"EventName": "SHA_BLOCKED_FUNCTIONS",
"BriefDescription": "SHA Blocked Functions",
"PublicDescription": "Total number of the SHA functions that are issued by the CPU and are blocked because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "71",
"EventName": "SHA_BLOCKED_CYCLES",
"BriefDescription": "SHA Bloced Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the SHA functions issued by the CPU because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "72",
"EventName": "DEA_FUNCTIONS",
"BriefDescription": "DEA Functions",
"PublicDescription": "Total number of the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "73",
"EventName": "DEA_CYCLES",
"BriefDescription": "DEA Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "74",
"EventName": "DEA_BLOCKED_FUNCTIONS",
"BriefDescription": "DEA Blocked Functions",
"PublicDescription": "Total number of the DEA functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "75",
"EventName": "DEA_BLOCKED_CYCLES",
"BriefDescription": "DEA Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the DEA functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "76",
"EventName": "AES_FUNCTIONS",
"BriefDescription": "AES Functions",
"PublicDescription": "Total number of AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "77",
"EventName": "AES_CYCLES",
"BriefDescription": "AES Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "78",
"EventName": "AES_BLOCKED_FUNCTIONS",
"BriefDescription": "AES Blocked Functions",
"PublicDescription": "Total number of AES functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "79",
"EventName": "AES_BLOCKED_CYCLES",
"BriefDescription": "AES Blocked Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z14/extended.json b/tools/perf/pmu-events/arch/s390/cf_z14/extended.json
index aa4dfb4..e7a3524 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z14/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/extended.json
@@ -1,317 +1,370 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "128",
"EventName": "L1D_RO_EXCL_WRITES",
"BriefDescription": "L1D Read-only Exclusive Writes",
"PublicDescription": "Counter:128 Name:L1D_RO_EXCL_WRITES A directory write to the Level-1 Data cache where the line was originally in a Read-Only state in the cache but has been updated to be in the Exclusive state that allows stores to the cache line"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "129",
"EventName": "DTLB2_WRITES",
"BriefDescription": "DTLB2 Writes",
"PublicDescription": "A translation has been written into The Translation Lookaside Buffer 2 (TLB2) and the request was made by the data cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "130",
"EventName": "DTLB2_MISSES",
"BriefDescription": "DTLB2 Misses",
"PublicDescription": "A TLB2 miss is in progress for a request made by the data cache. Incremented by one for every TLB2 miss in progress for the Level-1 Data cache on this cycle"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "131",
"EventName": "DTLB2_HPAGE_WRITES",
"BriefDescription": "DTLB2 One-Megabyte Page Writes",
"PublicDescription": "A translation entry was written into the Combined Region and Segment Table Entry array in the Level-2 TLB for a one-megabyte page or a Last Host Translation was done"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "132",
"EventName": "DTLB2_GPAGE_WRITES",
"BriefDescription": "DTLB2 Two-Gigabyte Page Writes",
"PublicDescription": "A translation entry for a two-gigabyte page was written into the Level-2 TLB"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "133",
"EventName": "L1D_L2D_SOURCED_WRITES",
"BriefDescription": "L1D L2D Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from the Level-2 Data cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "134",
"EventName": "ITLB2_WRITES",
"BriefDescription": "ITLB2 Writes",
"PublicDescription": "A translation entry has been written into the Translation Lookaside Buffer 2 (TLB2) and the request was made by the instruction cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "135",
"EventName": "ITLB2_MISSES",
"BriefDescription": "ITLB2 Misses",
"PublicDescription": "A TLB2 miss is in progress for a request made by the instruction cache. Incremented by one for every TLB2 miss in progress for the Level-1 Instruction cache in a cycle"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "136",
"EventName": "L1I_L2I_SOURCED_WRITES",
"BriefDescription": "L1I L2I Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from the Level-2 Instruction cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "137",
"EventName": "TLB2_PTE_WRITES",
"BriefDescription": "TLB2 PTE Writes",
"PublicDescription": "A translation entry was written into the Page Table Entry array in the Level-2 TLB"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "138",
"EventName": "TLB2_CRSTE_WRITES",
"BriefDescription": "TLB2 CRSTE Writes",
"PublicDescription": "Translation entries were written into the Combined Region and Segment Table Entry array and the Page Table Entry array in the Level-2 TLB"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "139",
"EventName": "TLB2_ENGINES_BUSY",
"BriefDescription": "TLB2 Engines Busy",
"PublicDescription": "The number of Level-2 TLB translation engines busy in a cycle"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "140",
"EventName": "TX_C_TEND",
"BriefDescription": "Completed TEND instructions in constrained TX mode",
"PublicDescription": "A TEND instruction has completed in a constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "141",
"EventName": "TX_NC_TEND",
"BriefDescription": "Completed TEND instructions in non-constrained TX mode",
"PublicDescription": "A TEND instruction has completed in a non-constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "143",
"EventName": "L1C_TLB2_MISSES",
"BriefDescription": "L1C TLB2 Misses",
"PublicDescription": "Increments by one for any cycle where a level-1 cache or level-2 TLB miss is in progress"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "144",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Chip Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "145",
"EventName": "L1D_ONCHIP_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1D On-Chip Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Chip memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "146",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D On-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Chip Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "147",
"EventName": "L1D_ONCLUSTER_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Cluster L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Cluster Level-3 cache withountervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "148",
"EventName": "L1D_ONCLUSTER_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1D On-Cluster Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Cluster memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "149",
"EventName": "L1D_ONCLUSTER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D On-Cluster L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On-Cluster Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "150",
"EventName": "L1D_OFFCLUSTER_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Cluster L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Cluster Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "151",
"EventName": "L1D_OFFCLUSTER_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1D Off-Cluster Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from Off-Cluster memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "152",
"EventName": "L1D_OFFCLUSTER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D Off-Cluster L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Cluster Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "153",
"EventName": "L1D_OFFDRAWER_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "154",
"EventName": "L1D_OFFDRAWER_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from Off-Drawer memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "155",
"EventName": "L1D_OFFDRAWER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D Off-Drawer L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off-Drawer Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "156",
"EventName": "L1D_ONDRAWER_L4_SOURCED_WRITES",
"BriefDescription": "L1D On-Drawer L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Drawer Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "157",
"EventName": "L1D_OFFDRAWER_L4_SOURCED_WRITES",
"BriefDescription": "L1D Off-Drawer L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from Off-Drawer Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "158",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES_RO",
"BriefDescription": "L1D On-Chip L3 Sourced Writes read-only",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from On-Chip L3 but a read-only invalidate was done to remove other copies of the cache line"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "162",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache ine was sourced from an On-Chip Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "163",
"EventName": "L1I_ONCHIP_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache ine was sourced from On-Chip memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "164",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I On-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache ine was sourced from an On-Chip Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "165",
"EventName": "L1I_ONCLUSTER_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Cluster L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Cluster Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "166",
"EventName": "L1I_ONCLUSTER_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1I On-Cluster Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On-Cluster memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "167",
"EventName": "L1I_ONCLUSTER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I On-Cluster L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from On-Cluster Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "168",
"EventName": "L1I_OFFCLUSTER_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Cluster L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Cluster Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "169",
"EventName": "L1I_OFFCLUSTER_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1I Off-Cluster Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from Off-Cluster memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "170",
"EventName": "L1I_OFFCLUSTER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I Off-Cluster L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Cluster Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "171",
"EventName": "L1I_OFFDRAWER_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "172",
"EventName": "L1I_OFFDRAWER_MEMORY_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from Off-Drawer memory"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "173",
"EventName": "L1I_OFFDRAWER_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I Off-Drawer L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off-Drawer Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "174",
"EventName": "L1I_ONDRAWER_L4_SOURCED_WRITES",
"BriefDescription": "L1I On-Drawer L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from On-Drawer Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "175",
"EventName": "L1I_OFFDRAWER_L4_SOURCED_WRITES",
"BriefDescription": "L1I Off-Drawer L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from Off-Drawer Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "224",
"EventName": "BCD_DFP_EXECUTION_SLOTS",
"BriefDescription": "BCD DFP Execution Slots",
"PublicDescription": "Count of floating point execution slots used for finished Binary Coded Decimal to Decimal Floating Point conversions. Instructions: CDZT, CXZT, CZDT, CZXT"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "225",
"EventName": "VX_BCD_EXECUTION_SLOTS",
"BriefDescription": "VX BCD Execution Slots",
"PublicDescription": "Count of floating point execution slots used for finished vector arithmetic Binary Coded Decimal instructions. Instructions: VAP, VSP, VMPVMSP, VDP, VSDP, VRP, VLIP, VSRP, VPSOPVCP, VTP, VPKZ, VUPKZ, VCVB, VCVBG, VCVDVCVDG"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "226",
"EventName": "DECIMAL_INSTRUCTIONS",
"BriefDescription": "Decimal Instructions",
"PublicDescription": "Decimal instructions dispatched. Instructions: CVB, CVD, AP, CP, DP, ED, EDMK, MP, SRP, SP, ZAP"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "232",
"EventName": "LAST_HOST_TRANSLATIONS",
"BriefDescription": "Last host translation done",
"PublicDescription": "Last Host Translation done"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "243",
"EventName": "TX_NC_TABORT",
"BriefDescription": "Aborted transactions in non-constrained TX mode",
"PublicDescription": "A transaction abort has occurred in a non-constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "244",
"EventName": "TX_C_TABORT_NO_SPECIAL",
"BriefDescription": "Aborted transactions in constrained TX mode not using special completion logic",
"PublicDescription": "A transaction abort has occurred in a constrained transactional-execution mode and the CPU is not using any special logic to allow the transaction to complete"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "245",
"EventName": "TX_C_TABORT_SPECIAL",
"BriefDescription": "Aborted transactions in constrained TX mode using special completion logic",
"PublicDescription": "A transaction abort has occurred in a constrained transactional-execution mode and the CPU is using special logic to allow the transaction to complete"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "448",
"EventName": "MT_DIAG_CYCLES_ONE_THR_ACTIVE",
"BriefDescription": "Cycle count with one thread active",
"PublicDescription": "Cycle count with one thread active"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "449",
"EventName": "MT_DIAG_CYCLES_TWO_THR_ACTIVE",
"BriefDescription": "Cycle count with two threads active",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z14/transaction.json b/tools/perf/pmu-events/arch/s390/cf_z14/transaction.json
new file mode 100644
index 0000000..1a0034f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/transaction.json
@@ -0,0 +1,7 @@
+[
+ {
+ "BriefDescription": "Transaction count",
+ "MetricName": "transaction",
+ "MetricExpr": "TX_C_TEND + TX_NC_TEND + TX_NC_TABORT + TX_C_TABORT_SPECIAL + TX_C_TABORT_NO_SPECIAL"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/s390/cf_z196/basic.json b/tools/perf/pmu-events/arch/s390/cf_z196/basic.json
index 8bf1675..2dd8daf 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z196/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z196/basic.json
@@ -1,71 +1,83 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "0",
"EventName": "CPU_CYCLES",
"BriefDescription": "CPU Cycles",
"PublicDescription": "Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "1",
"EventName": "INSTRUCTIONS",
"BriefDescription": "Instructions",
"PublicDescription": "Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "2",
"EventName": "L1I_DIR_WRITES",
"BriefDescription": "L1I Directory Writes",
"PublicDescription": "Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "3",
"EventName": "L1I_PENALTY_CYCLES",
"BriefDescription": "L1I Penalty Cycles",
"PublicDescription": "Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "4",
"EventName": "L1D_DIR_WRITES",
"BriefDescription": "L1D Directory Writes",
"PublicDescription": "Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "5",
"EventName": "L1D_PENALTY_CYCLES",
"BriefDescription": "L1D Penalty Cycles",
"PublicDescription": "Level-1 D-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "32",
"EventName": "PROBLEM_STATE_CPU_CYCLES",
"BriefDescription": "Problem-State CPU Cycles",
"PublicDescription": "Problem-State Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "33",
"EventName": "PROBLEM_STATE_INSTRUCTIONS",
"BriefDescription": "Problem-State Instructions",
"PublicDescription": "Problem-State Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "34",
"EventName": "PROBLEM_STATE_L1I_DIR_WRITES",
"BriefDescription": "Problem-State L1I Directory Writes",
"PublicDescription": "Problem-State Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "35",
"EventName": "PROBLEM_STATE_L1I_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1I Penalty Cycles",
"PublicDescription": "Problem-State Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "36",
"EventName": "PROBLEM_STATE_L1D_DIR_WRITES",
"BriefDescription": "Problem-State L1D Directory Writes",
"PublicDescription": "Problem-State Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "37",
"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1D Penalty Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z196/crypto.json b/tools/perf/pmu-events/arch/s390/cf_z196/crypto.json
index 7e5b724..db286f1 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z196/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z196/crypto.json
@@ -1,95 +1,111 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "64",
"EventName": "PRNG_FUNCTIONS",
"BriefDescription": "PRNG Functions",
"PublicDescription": "Total number of the PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "65",
"EventName": "PRNG_CYCLES",
"BriefDescription": "PRNG Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "66",
"EventName": "PRNG_BLOCKED_FUNCTIONS",
"BriefDescription": "PRNG Blocked Functions",
"PublicDescription": "Total number of the PRNG functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "67",
"EventName": "PRNG_BLOCKED_CYCLES",
"BriefDescription": "PRNG Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the PRNG functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "68",
"EventName": "SHA_FUNCTIONS",
"BriefDescription": "SHA Functions",
"PublicDescription": "Total number of SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "69",
"EventName": "SHA_CYCLES",
"BriefDescription": "SHA Cycles",
"PublicDescription": "Total number of CPU cycles when the SHA coprocessor is busy performing the SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "70",
"EventName": "SHA_BLOCKED_FUNCTIONS",
"BriefDescription": "SHA Blocked Functions",
"PublicDescription": "Total number of the SHA functions that are issued by the CPU and are blocked because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "71",
"EventName": "SHA_BLOCKED_CYCLES",
"BriefDescription": "SHA Bloced Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the SHA functions issued by the CPU because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "72",
"EventName": "DEA_FUNCTIONS",
"BriefDescription": "DEA Functions",
"PublicDescription": "Total number of the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "73",
"EventName": "DEA_CYCLES",
"BriefDescription": "DEA Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "74",
"EventName": "DEA_BLOCKED_FUNCTIONS",
"BriefDescription": "DEA Blocked Functions",
"PublicDescription": "Total number of the DEA functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "75",
"EventName": "DEA_BLOCKED_CYCLES",
"BriefDescription": "DEA Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the DEA functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "76",
"EventName": "AES_FUNCTIONS",
"BriefDescription": "AES Functions",
"PublicDescription": "Total number of AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "77",
"EventName": "AES_CYCLES",
"BriefDescription": "AES Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "78",
"EventName": "AES_BLOCKED_FUNCTIONS",
"BriefDescription": "AES Blocked Functions",
"PublicDescription": "Total number of AES functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "79",
"EventName": "AES_BLOCKED_CYCLES",
"BriefDescription": "AES Blocked Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_z196/extended.json b/tools/perf/pmu-events/arch/s390/cf_z196/extended.json
index b6d7fec..b7b42a8 100644
--- a/tools/perf/pmu-events/arch/s390/cf_z196/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z196/extended.json
@@ -1,143 +1,167 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "128",
"EventName": "L1D_L2_SOURCED_WRITES",
"BriefDescription": "L1D L2 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the returned cache line was sourced from the Level-2 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "129",
"EventName": "L1I_L2_SOURCED_WRITES",
"BriefDescription": "L1I L2 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from the Level-2 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "130",
"EventName": "DTLB1_MISSES",
"BriefDescription": "DTLB1 Misses",
"PublicDescription": "Level-1 Data TLB miss in progress. Incremented by one for every cycle a DTLB1 miss is in progress."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "131",
"EventName": "ITLB1_MISSES",
"BriefDescription": "ITLB1 Misses",
"PublicDescription": "Level-1 Instruction TLB miss in progress. Incremented by one for every cycle a ITLB1 miss is in progress."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "133",
"EventName": "L2C_STORES_SENT",
"BriefDescription": "L2C Stores Sent",
"PublicDescription": "Incremented by one for every store sent to Level-2 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "134",
"EventName": "L1D_OFFBOOK_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Book L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the returned cache line was sourced from an Off Book Level-3 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "135",
"EventName": "L1D_ONBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1D On-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the returned cache line was sourced from an On Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "136",
"EventName": "L1I_ONBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1I On-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from an On Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "137",
"EventName": "L1D_RO_EXCL_WRITES",
"BriefDescription": "L1D Read-only Exclusive Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache where the line was originally in a Read-Only state in the cache but has been updated to be in the Exclusive state that allows stores to the cache line"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "138",
"EventName": "L1D_OFFBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1D Off-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the returned cache line was sourced from an Off Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "139",
"EventName": "L1I_OFFBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1I Off-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from an Off Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "140",
"EventName": "DTLB1_HPAGE_WRITES",
"BriefDescription": "DTLB1 One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer for a one-megabyte page"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "141",
"EventName": "L1D_LMEM_SOURCED_WRITES",
"BriefDescription": "L1D Local Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache where the installed cache line was sourced from memory that is attached to the same book as the Data cache (Local Memory)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "142",
"EventName": "L1I_LMEM_SOURCED_WRITES",
"BriefDescription": "L1I Local Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache where the installed cache line was sourced from memory that is attached to the same book as the Instruction cache (Local Memory)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "143",
"EventName": "L1I_OFFBOOK_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Book L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from an Off Book Level-3 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "144",
"EventName": "DTLB1_WRITES",
"BriefDescription": "DTLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "145",
"EventName": "ITLB1_WRITES",
"BriefDescription": "ITLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Instruction Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "146",
"EventName": "TLB2_PTE_WRITES",
"BriefDescription": "TLB2 PTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Page Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "147",
"EventName": "TLB2_CRSTE_HPAGE_WRITES",
"BriefDescription": "TLB2 CRSTE One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Common Region Segment Table Entry arrays for a one-megabyte large page translation"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "148",
"EventName": "TLB2_CRSTE_WRITES",
"BriefDescription": "TLB2 CRSTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Common Region Segment Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "150",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the returned cache line was sourced from an On Chip Level-3 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "152",
"EventName": "L1D_OFFCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache directory where the returned cache line was sourced from an Off Chip/On Book Level-3 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "153",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from an On Chip Level-3 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "155",
"EventName": "L1I_OFFCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Chip L3 Sourced Writes",
diff --git a/tools/perf/pmu-events/arch/s390/cf_zec12/basic.json b/tools/perf/pmu-events/arch/s390/cf_zec12/basic.json
index 8bf1675..2dd8daf 100644
--- a/tools/perf/pmu-events/arch/s390/cf_zec12/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/basic.json
@@ -1,71 +1,83 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "0",
"EventName": "CPU_CYCLES",
"BriefDescription": "CPU Cycles",
"PublicDescription": "Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "1",
"EventName": "INSTRUCTIONS",
"BriefDescription": "Instructions",
"PublicDescription": "Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "2",
"EventName": "L1I_DIR_WRITES",
"BriefDescription": "L1I Directory Writes",
"PublicDescription": "Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "3",
"EventName": "L1I_PENALTY_CYCLES",
"BriefDescription": "L1I Penalty Cycles",
"PublicDescription": "Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "4",
"EventName": "L1D_DIR_WRITES",
"BriefDescription": "L1D Directory Writes",
"PublicDescription": "Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "5",
"EventName": "L1D_PENALTY_CYCLES",
"BriefDescription": "L1D Penalty Cycles",
"PublicDescription": "Level-1 D-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "32",
"EventName": "PROBLEM_STATE_CPU_CYCLES",
"BriefDescription": "Problem-State CPU Cycles",
"PublicDescription": "Problem-State Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "33",
"EventName": "PROBLEM_STATE_INSTRUCTIONS",
"BriefDescription": "Problem-State Instructions",
"PublicDescription": "Problem-State Instruction Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "34",
"EventName": "PROBLEM_STATE_L1I_DIR_WRITES",
"BriefDescription": "Problem-State L1I Directory Writes",
"PublicDescription": "Problem-State Level-1 I-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "35",
"EventName": "PROBLEM_STATE_L1I_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1I Penalty Cycles",
"PublicDescription": "Problem-State Level-1 I-Cache Penalty Cycle Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "36",
"EventName": "PROBLEM_STATE_L1D_DIR_WRITES",
"BriefDescription": "Problem-State L1D Directory Writes",
"PublicDescription": "Problem-State Level-1 D-Cache Directory Write Count"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "37",
"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
"BriefDescription": "Problem-State L1D Penalty Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_zec12/crypto.json b/tools/perf/pmu-events/arch/s390/cf_zec12/crypto.json
index 7e5b724..db286f1 100644
--- a/tools/perf/pmu-events/arch/s390/cf_zec12/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/crypto.json
@@ -1,95 +1,111 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "64",
"EventName": "PRNG_FUNCTIONS",
"BriefDescription": "PRNG Functions",
"PublicDescription": "Total number of the PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "65",
"EventName": "PRNG_CYCLES",
"BriefDescription": "PRNG Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing PRNG functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "66",
"EventName": "PRNG_BLOCKED_FUNCTIONS",
"BriefDescription": "PRNG Blocked Functions",
"PublicDescription": "Total number of the PRNG functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "67",
"EventName": "PRNG_BLOCKED_CYCLES",
"BriefDescription": "PRNG Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the PRNG functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "68",
"EventName": "SHA_FUNCTIONS",
"BriefDescription": "SHA Functions",
"PublicDescription": "Total number of SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "69",
"EventName": "SHA_CYCLES",
"BriefDescription": "SHA Cycles",
"PublicDescription": "Total number of CPU cycles when the SHA coprocessor is busy performing the SHA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "70",
"EventName": "SHA_BLOCKED_FUNCTIONS",
"BriefDescription": "SHA Blocked Functions",
"PublicDescription": "Total number of the SHA functions that are issued by the CPU and are blocked because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "71",
"EventName": "SHA_BLOCKED_CYCLES",
"BriefDescription": "SHA Bloced Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the SHA functions issued by the CPU because the SHA coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "72",
"EventName": "DEA_FUNCTIONS",
"BriefDescription": "DEA Functions",
"PublicDescription": "Total number of the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "73",
"EventName": "DEA_CYCLES",
"BriefDescription": "DEA Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the DEA functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "74",
"EventName": "DEA_BLOCKED_FUNCTIONS",
"BriefDescription": "DEA Blocked Functions",
"PublicDescription": "Total number of the DEA functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "75",
"EventName": "DEA_BLOCKED_CYCLES",
"BriefDescription": "DEA Blocked Cycles",
"PublicDescription": "Total number of CPU cycles blocked for the DEA functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "76",
"EventName": "AES_FUNCTIONS",
"BriefDescription": "AES Functions",
"PublicDescription": "Total number of AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "77",
"EventName": "AES_CYCLES",
"BriefDescription": "AES Cycles",
"PublicDescription": "Total number of CPU cycles when the DEA/AES coprocessor is busy performing the AES functions issued by the CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "78",
"EventName": "AES_BLOCKED_FUNCTIONS",
"BriefDescription": "AES Blocked Functions",
"PublicDescription": "Total number of AES functions that are issued by the CPU and are blocked because the DEA/AES coprocessor is busy performing a function issued by another CPU"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "79",
"EventName": "AES_BLOCKED_CYCLES",
"BriefDescription": "AES Blocked Cycles",
diff --git a/tools/perf/pmu-events/arch/s390/cf_zec12/extended.json b/tools/perf/pmu-events/arch/s390/cf_zec12/extended.json
index 8682126..1622510 100644
--- a/tools/perf/pmu-events/arch/s390/cf_zec12/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/extended.json
@@ -1,209 +1,244 @@
[
{
+ "Unit": "CPU-M-CF",
"EventCode": "128",
"EventName": "DTLB1_MISSES",
"BriefDescription": "DTLB1 Misses",
"PublicDescription": "Level-1 Data TLB miss in progress. Incremented by one for every cycle a DTLB1 miss is in progress."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "129",
"EventName": "ITLB1_MISSES",
"BriefDescription": "ITLB1 Misses",
"PublicDescription": "Level-1 Instruction TLB miss in progress. Incremented by one for every cycle a ITLB1 miss is in progress."
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "130",
"EventName": "L1D_L2I_SOURCED_WRITES",
"BriefDescription": "L1D L2I Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from the Level-2 Instruction cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "131",
"EventName": "L1I_L2I_SOURCED_WRITES",
"BriefDescription": "L1I L2I Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from the Level-2 Instruction cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "132",
"EventName": "L1D_L2D_SOURCED_WRITES",
"BriefDescription": "L1D L2D Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from the Level-2 Data cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "133",
"EventName": "DTLB1_WRITES",
"BriefDescription": "DTLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "135",
"EventName": "L1D_LMEM_SOURCED_WRITES",
"BriefDescription": "L1D Local Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache where the installed cache line was sourced from memory that is attached to the same book as the Data cache (Local Memory)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "137",
"EventName": "L1I_LMEM_SOURCED_WRITES",
"BriefDescription": "L1I Local Memory Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache where the installed cache line was sourced from memory that is attached to the same book as the Instruction cache (Local Memory)"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "138",
"EventName": "L1D_RO_EXCL_WRITES",
"BriefDescription": "L1D Read-only Exclusive Writes",
"PublicDescription": "A directory write to the Level-1 D-Cache where the line was originally in a Read-Only state in the cache but has been updated to be in the Exclusive state that allows stores to the cache line"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "139",
"EventName": "DTLB1_HPAGE_WRITES",
"BriefDescription": "DTLB1 One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Data Translation Lookaside Buffer for a one-megabyte page"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "140",
"EventName": "ITLB1_WRITES",
"BriefDescription": "ITLB1 Writes",
"PublicDescription": "A translation entry has been written to the Level-1 Instruction Translation Lookaside Buffer"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "141",
"EventName": "TLB2_PTE_WRITES",
"BriefDescription": "TLB2 PTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Page Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "142",
"EventName": "TLB2_CRSTE_HPAGE_WRITES",
"BriefDescription": "TLB2 CRSTE One-Megabyte Page Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Common Region Segment Table Entry arrays for a one-megabyte large page translation"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "143",
"EventName": "TLB2_CRSTE_WRITES",
"BriefDescription": "TLB2 CRSTE Writes",
"PublicDescription": "A translation entry has been written to the Level-2 TLB Common Region Segment Table Entry arrays"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "144",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1D On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On Chip Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "145",
"EventName": "L1D_OFFCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off Chip/On Book Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "146",
"EventName": "L1D_OFFBOOK_L3_SOURCED_WRITES",
"BriefDescription": "L1D Off-Book L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off Book Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "147",
"EventName": "L1D_ONBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1D On-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an On Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "148",
"EventName": "L1D_OFFBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1D Off-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "149",
"EventName": "TX_NC_TEND",
"BriefDescription": "Completed TEND instructions in non-constrained TX mode",
"PublicDescription": "A TEND instruction has completed in a nonconstrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "150",
"EventName": "L1D_ONCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D On-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from a On Chip Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "151",
"EventName": "L1D_OFFCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D Off-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off Chip/On Book Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "152",
"EventName": "L1D_OFFBOOK_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1D Off-Book L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Data cache directory where the returned cache line was sourced from an Off Book Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "153",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1I On-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On Chip Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "154",
"EventName": "L1I_OFFCHIP_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Chip L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off Chip/On Book Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "155",
"EventName": "L1I_OFFBOOK_L3_SOURCED_WRITES",
"BriefDescription": "L1I Off-Book L3 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off Book Level-3 cache without intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "156",
"EventName": "L1I_ONBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1I On-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "157",
"EventName": "L1I_OFFBOOK_L4_SOURCED_WRITES",
"BriefDescription": "L1I Off-Book L4 Sourced Writes",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off Book Level-4 cache"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "158",
"EventName": "TX_C_TEND",
"BriefDescription": "Completed TEND instructions in constrained TX mode",
"PublicDescription": "A TEND instruction has completed in a constrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "159",
"EventName": "L1I_ONCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I On-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an On Chip Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "160",
"EventName": "L1I_OFFCHIP_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I Off-Chip L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off Chip/On Book Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "161",
"EventName": "L1I_OFFBOOK_L3_SOURCED_WRITES_IV",
"BriefDescription": "L1I Off-Book L3 Sourced Writes with Intervention",
"PublicDescription": "A directory write to the Level-1 Instruction cache directory where the returned cache line was sourced from an Off Book Level-3 cache with intervention"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "177",
"EventName": "TX_NC_TABORT",
"BriefDescription": "Aborted transactions in non-constrained TX mode",
"PublicDescription": "A transaction abort has occurred in a nonconstrained transactional-execution mode"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "178",
"EventName": "TX_C_TABORT_NO_SPECIAL",
"BriefDescription": "Aborted transactions in constrained TX mode not using special completion logic",
"PublicDescription": "A transaction abort has occurred in a constrained transactional-execution mode and the CPU is not using any special logic to allow the transaction to complete"
},
{
+ "Unit": "CPU-M-CF",
"EventCode": "179",
"EventName": "TX_C_TABORT_SPECIAL",
"BriefDescription": "Aborted transactions in constrained TX mode using special completion logic",
diff --git a/tools/perf/pmu-events/arch/s390/cf_zec12/transaction.json b/tools/perf/pmu-events/arch/s390/cf_zec12/transaction.json
new file mode 100644
index 0000000..1a0034f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/transaction.json
@@ -0,0 +1,7 @@
+[
+ {
+ "BriefDescription": "Transaction count",
+ "MetricName": "transaction",
+ "MetricExpr": "TX_C_TEND + TX_NC_TEND + TX_NC_TABORT + TX_C_TABORT_SPECIAL + TX_C_TABORT_NO_SPECIAL"
+ }
+]
diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index db3a594..68c92bb 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -233,6 +233,8 @@
{ "QPI LL", "uncore_qpi" },
{ "SBO", "uncore_sbox" },
{ "iMPH-U", "uncore_arb" },
+ { "CPU-M-CF", "cpum_cf" },
+ { "CPU-M-SF", "cpum_sf" },
{}
};
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index dd850a2..d7a5e1b 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -385,7 +385,7 @@
if (!t->subtest.get_nr)
pr_debug("%s:", t->desc);
else
- pr_debug("%s subtest %d:", t->desc, subtest);
+ pr_debug("%s subtest %d:", t->desc, subtest + 1);
switch (err) {
case TEST_OK:
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 6121191..3b97ac0 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1322,6 +1322,14 @@
return 0;
}
+static int test__checkevent_complex_name(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel = perf_evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong complex name parsing", strcmp(evsel->name, "COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks") == 0);
+ return 0;
+}
+
static int count_tracepoints(void)
{
struct dirent *events_ent;
@@ -1658,6 +1666,11 @@
.check = test__intel_pt,
.id = 52,
},
+ {
+ .name = "cycles/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks'/Duk",
+ .check = test__checkevent_complex_name,
+ .id = 53
+ }
};
static struct evlist_test test__events_pmu[] = {
@@ -1676,6 +1689,11 @@
.check = test__checkevent_pmu_partial_time_callgraph,
.id = 2,
},
+ {
+ .name = "cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp",
+ .check = test__checkevent_complex_name,
+ .id = 3,
+ }
};
struct terms_test {
diff --git a/tools/perf/tests/shell/record+probe_libc_inet_pton.sh b/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
index 94e513e..3013ac8 100755
--- a/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
+++ b/tools/perf/tests/shell/record+probe_libc_inet_pton.sh
@@ -13,11 +13,24 @@
libc=$(grep -w libc /proc/self/maps | head -1 | sed -r 's/.*[[:space:]](\/.*)/\1/g')
nm -Dg $libc 2>/dev/null | fgrep -q inet_pton || exit 254
+event_pattern='probe_libc:inet_pton(\_[[:digit:]]+)?'
+
+add_libc_inet_pton_event() {
+
+ event_name=$(perf probe -f -x $libc -a inet_pton 2>&1 | tail -n +2 | head -n -5 | \
+ grep -P -o "$event_pattern(?=[[:space:]]\(on inet_pton in $libc\))")
+
+ if [ $? -ne 0 -o -z "$event_name" ] ; then
+ printf "FAIL: could not add event\n"
+ return 1
+ fi
+}
+
trace_libc_inet_pton_backtrace() {
expected=`mktemp -u /tmp/expected.XXX`
- echo "ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\)" > $expected
+ echo "ping[][0-9 \.:]+$event_name: \([[:xdigit:]]+\)" > $expected
echo ".*inet_pton\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$" >> $expected
case "$(uname -m)" in
s390x)
@@ -26,6 +39,12 @@
echo "(__GI_)?getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc|inlined\)$" >> $expected
echo "main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" >> $expected
;;
+ ppc64|ppc64le)
+ eventattr='max-stack=4'
+ echo "gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected
+ echo "getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected
+ echo ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" >> $expected
+ ;;
*)
eventattr='max-stack=3'
echo "getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\($libc\)$" >> $expected
@@ -35,7 +54,7 @@
perf_data=`mktemp -u /tmp/perf.data.XXX`
perf_script=`mktemp -u /tmp/perf.script.XXX`
- perf record -e probe_libc:inet_pton/$eventattr/ -o $perf_data ping -6 -c 1 ::1 > /dev/null 2>&1
+ perf record -e $event_name/$eventattr/ -o $perf_data ping -6 -c 1 ::1 > /dev/null 2>&1
perf script -i $perf_data > $perf_script
exec 3<$perf_script
@@ -46,7 +65,7 @@
echo "$line" | egrep -q "$pattern"
if [ $? -ne 0 ] ; then
printf "FAIL: expected backtrace entry \"%s\" got \"%s\"\n" "$pattern" "$line"
- exit 1
+ return 1
fi
done
@@ -56,13 +75,20 @@
# even if the perf script output does not match.
}
+delete_libc_inet_pton_event() {
+
+ if [ -n "$event_name" ] ; then
+ perf probe -q -d $event_name
+ fi
+}
+
# Check for IPv6 interface existence
ip a sh lo | fgrep -q inet6 || exit 2
skip_if_no_perf_probe && \
-perf probe -q $libc inet_pton && \
+add_libc_inet_pton_event && \
trace_libc_inet_pton_backtrace
err=$?
rm -f ${perf_data} ${perf_script} ${expected}
-perf probe -q -d probe_libc:inet_pton
+delete_libc_inet_pton_event
exit $err
diff --git a/tools/perf/trace/beauty/Build b/tools/perf/trace/beauty/Build
index 66330d4..f528ba3 100644
--- a/tools/perf/trace/beauty/Build
+++ b/tools/perf/trace/beauty/Build
@@ -7,4 +7,5 @@
libperf-y += kcmp.o
libperf-y += pkey_alloc.o
libperf-y += prctl.o
+libperf-y += socket.o
libperf-y += statx.o
diff --git a/tools/perf/trace/beauty/beauty.h b/tools/perf/trace/beauty/beauty.h
index 984a504..9615af5 100644
--- a/tools/perf/trace/beauty/beauty.h
+++ b/tools/perf/trace/beauty/beauty.h
@@ -106,6 +106,9 @@
size_t syscall_arg__scnprintf_prctl_arg3(char *bf, size_t size, struct syscall_arg *arg);
#define SCA_PRCTL_ARG3 syscall_arg__scnprintf_prctl_arg3
+size_t syscall_arg__scnprintf_socket_protocol(char *bf, size_t size, struct syscall_arg *arg);
+#define SCA_SK_PROTO syscall_arg__scnprintf_socket_protocol
+
size_t syscall_arg__scnprintf_statx_flags(char *bf, size_t size, struct syscall_arg *arg);
#define SCA_STATX_FLAGS syscall_arg__scnprintf_statx_flags
diff --git a/tools/perf/trace/beauty/drm_ioctl.sh b/tools/perf/trace/beauty/drm_ioctl.sh
index 2149d3a..9d38168 100755
--- a/tools/perf/trace/beauty/drm_ioctl.sh
+++ b/tools/perf/trace/beauty/drm_ioctl.sh
@@ -1,13 +1,14 @@
#!/bin/sh
-drm_header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/drm/
+
printf "#ifndef DRM_COMMAND_BASE\n"
-grep "#define DRM_COMMAND_BASE" $drm_header_dir/drm.h
+grep "#define DRM_COMMAND_BASE" $header_dir/drm.h
printf "#endif\n"
printf "static const char *drm_ioctl_cmds[] = {\n"
-grep "^#define DRM_IOCTL.*DRM_IO" $drm_header_dir/drm.h | \
+grep "^#define DRM_IOCTL.*DRM_IO" $header_dir/drm.h | \
sed -r 's/^#define +DRM_IOCTL_([A-Z0-9_]+)[ ]+DRM_IO[A-Z]* *\( *(0x[[:xdigit:]]+),*.*/ [\2] = "\1",/g'
-grep "^#define DRM_I915_[A-Z_0-9]\+[ ]\+0x" $drm_header_dir/i915_drm.h | \
+grep "^#define DRM_I915_[A-Z_0-9]\+[ ]\+0x" $header_dir/i915_drm.h | \
sed -r 's/^#define +DRM_I915_([A-Z0-9_]+)[ ]+(0x[[:xdigit:]]+)/\t[DRM_COMMAND_BASE + \2] = "I915_\1",/g'
printf "};\n"
diff --git a/tools/perf/trace/beauty/kcmp_type.sh b/tools/perf/trace/beauty/kcmp_type.sh
index 40d063b8..a3c304c 100755
--- a/tools/perf/trace/beauty/kcmp_type.sh
+++ b/tools/perf/trace/beauty/kcmp_type.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/linux/
printf "static const char *kcmp_types[] = {\n"
regex='^[[:space:]]+(KCMP_(\w+)),'
diff --git a/tools/perf/trace/beauty/kvm_ioctl.sh b/tools/perf/trace/beauty/kvm_ioctl.sh
index bd28817..c4699fd 100755
--- a/tools/perf/trace/beauty/kvm_ioctl.sh
+++ b/tools/perf/trace/beauty/kvm_ioctl.sh
@@ -1,10 +1,10 @@
#!/bin/sh
-kvm_header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/linux/
printf "static const char *kvm_ioctl_cmds[] = {\n"
regex='^#[[:space:]]*define[[:space:]]+KVM_(\w+)[[:space:]]+_IO[RW]*\([[:space:]]*KVMIO[[:space:]]*,[[:space:]]*(0x[[:xdigit:]]+).*'
-egrep $regex ${kvm_header_dir}/kvm.h | \
+egrep $regex ${header_dir}/kvm.h | \
sed -r "s/$regex/\2 \1/g" | \
egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \
sort | xargs printf "\t[%s] = \"%s\",\n"
diff --git a/tools/perf/trace/beauty/madvise_behavior.sh b/tools/perf/trace/beauty/madvise_behavior.sh
index 60ef864..431639e 100755
--- a/tools/perf/trace/beauty/madvise_behavior.sh
+++ b/tools/perf/trace/beauty/madvise_behavior.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/asm-generic/
printf "static const char *madvise_advices[] = {\n"
regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+MADV_([[:alnum:]_]+)[[:space:]]+([[:digit:]]+)[[:space:]]*.*'
diff --git a/tools/perf/trace/beauty/perf_ioctl.sh b/tools/perf/trace/beauty/perf_ioctl.sh
index faea423..6492c74 100755
--- a/tools/perf/trace/beauty/perf_ioctl.sh
+++ b/tools/perf/trace/beauty/perf_ioctl.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/linux/
printf "static const char *perf_ioctl_cmds[] = {\n"
regex='^#[[:space:]]*define[[:space:]]+PERF_EVENT_IOC_(\w+)[[:space:]]+_IO[RW]*[[:space:]]*\([[:space:]]*.\$.[[:space:]]*,[[:space:]]*([[:digit:]]+).*'
diff --git a/tools/perf/trace/beauty/pkey_alloc_access_rights.sh b/tools/perf/trace/beauty/pkey_alloc_access_rights.sh
index 62e51a0..e0a51ae 100755
--- a/tools/perf/trace/beauty/pkey_alloc_access_rights.sh
+++ b/tools/perf/trace/beauty/pkey_alloc_access_rights.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/asm-generic/
printf "static const char *pkey_alloc_access_rights[] = {\n"
regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+PKEY_([[:alnum:]_]+)[[:space:]]+(0x[[:xdigit:]]+)[[:space:]]*'
diff --git a/tools/perf/trace/beauty/sndrv_ctl_ioctl.sh b/tools/perf/trace/beauty/sndrv_ctl_ioctl.sh
index aad5ab1..eb511bb 100755
--- a/tools/perf/trace/beauty/sndrv_ctl_ioctl.sh
+++ b/tools/perf/trace/beauty/sndrv_ctl_ioctl.sh
@@ -1,8 +1,8 @@
#!/bin/sh
-sound_header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/sound/
printf "static const char *sndrv_ctl_ioctl_cmds[] = {\n"
-grep "^#define[\t ]\+SNDRV_CTL_IOCTL_" $sound_header_dir/asound.h | \
+grep "^#define[\t ]\+SNDRV_CTL_IOCTL_" $header_dir/asound.h | \
sed -r 's/^#define +SNDRV_CTL_IOCTL_([A-Z0-9_]+)[\t ]+_IO[RW]*\( *.U., *(0x[[:xdigit:]]+),?.*/\t[\2] = \"\1\",/g'
printf "};\n"
diff --git a/tools/perf/trace/beauty/sndrv_pcm_ioctl.sh b/tools/perf/trace/beauty/sndrv_pcm_ioctl.sh
index b7e9ef6..6818392 100755
--- a/tools/perf/trace/beauty/sndrv_pcm_ioctl.sh
+++ b/tools/perf/trace/beauty/sndrv_pcm_ioctl.sh
@@ -1,8 +1,8 @@
#!/bin/sh
-sound_header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/sound/
printf "static const char *sndrv_pcm_ioctl_cmds[] = {\n"
-grep "^#define[\t ]\+SNDRV_PCM_IOCTL_" $sound_header_dir/asound.h | \
+grep "^#define[\t ]\+SNDRV_PCM_IOCTL_" $header_dir/asound.h | \
sed -r 's/^#define +SNDRV_PCM_IOCTL_([A-Z0-9_]+)[\t ]+_IO[RW]*\( *.A., *(0x[[:xdigit:]]+),?.*/\t[\2] = \"\1\",/g'
printf "};\n"
diff --git a/tools/perf/trace/beauty/socket.c b/tools/perf/trace/beauty/socket.c
new file mode 100644
index 0000000..65227269
--- /dev/null
+++ b/tools/perf/trace/beauty/socket.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace/beauty/socket.c
+ *
+ * Copyright (C) 2018, Red Hat Inc, Arnaldo Carvalho de Melo <acme@redhat.com>
+ */
+
+#include "trace/beauty/beauty.h"
+#include <sys/types.h>
+#include <sys/socket.h>
+
+static size_t socket__scnprintf_ipproto(int protocol, char *bf, size_t size)
+{
+#include "trace/beauty/generated/socket_ipproto_array.c"
+ static DEFINE_STRARRAY(socket_ipproto);
+
+ return strarray__scnprintf(&strarray__socket_ipproto, bf, size, "%d", protocol);
+}
+
+size_t syscall_arg__scnprintf_socket_protocol(char *bf, size_t size, struct syscall_arg *arg)
+{
+ int domain = syscall_arg__val(arg, 0);
+
+ if (domain == AF_INET || domain == AF_INET6)
+ return socket__scnprintf_ipproto(arg->val, bf, size);
+
+ return syscall_arg__scnprintf_int(bf, size, arg);
+}
diff --git a/tools/perf/trace/beauty/socket_ipproto.sh b/tools/perf/trace/beauty/socket_ipproto.sh
new file mode 100755
index 0000000..a3cc246
--- /dev/null
+++ b/tools/perf/trace/beauty/socket_ipproto.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/linux/
+
+printf "static const char *socket_ipproto[] = {\n"
+regex='^[[:space:]]+IPPROTO_(\w+)[[:space:]]+=[[:space:]]+([[:digit:]]+),.*'
+
+egrep $regex ${header_dir}/in.h | \
+ sed -r "s/$regex/\2 \1/g" | \
+ sort | xargs printf "\t[%s] = \"%s\",\n"
+printf "};\n"
diff --git a/tools/perf/trace/beauty/vhost_virtio_ioctl.sh b/tools/perf/trace/beauty/vhost_virtio_ioctl.sh
index 76f1de6..0f6a519 100755
--- a/tools/perf/trace/beauty/vhost_virtio_ioctl.sh
+++ b/tools/perf/trace/beauty/vhost_virtio_ioctl.sh
@@ -1,17 +1,17 @@
#!/bin/sh
-vhost_virtio_header_dir=$1
+[ $# -eq 1 ] && header_dir=$1 || header_dir=tools/include/uapi/linux/
printf "static const char *vhost_virtio_ioctl_cmds[] = {\n"
regex='^#[[:space:]]*define[[:space:]]+VHOST_(\w+)[[:space:]]+_IOW?\([[:space:]]*VHOST_VIRTIO[[:space:]]*,[[:space:]]*(0x[[:xdigit:]]+).*'
-egrep $regex ${vhost_virtio_header_dir}/vhost.h | \
+egrep $regex ${header_dir}/vhost.h | \
sed -r "s/$regex/\2 \1/g" | \
sort | xargs printf "\t[%s] = \"%s\",\n"
printf "};\n"
printf "static const char *vhost_virtio_ioctl_read_cmds[] = {\n"
regex='^#[[:space:]]*define[[:space:]]+VHOST_(\w+)[[:space:]]+_IOW?R\([[:space:]]*VHOST_VIRTIO[[:space:]]*,[[:space:]]*(0x[[:xdigit:]]+).*'
-egrep $regex ${vhost_virtio_header_dir}/vhost.h | \
+egrep $regex ${header_dir}/vhost.h | \
sed -r "s/$regex/\2 \1/g" | \
sort | xargs printf "\t[%s] = \"%s\",\n"
printf "};\n"
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 69b7a28..74c4ae1 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -529,7 +529,7 @@
static int hist_entry__fprintf(struct hist_entry *he, size_t size,
char *bf, size_t bfsz, FILE *fp,
- bool use_callchain)
+ bool ignore_callchains)
{
int ret;
int callchain_ret = 0;
@@ -550,7 +550,7 @@
ret = fprintf(fp, "%s\n", bf);
- if (hist_entry__has_callchains(he) && use_callchain)
+ if (hist_entry__has_callchains(he) && !ignore_callchains)
callchain_ret = hist_entry_callchain__fprintf(he, total_period,
0, fp);
@@ -755,7 +755,7 @@
size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
int max_cols, float min_pcnt, FILE *fp,
- bool use_callchain)
+ bool ignore_callchains)
{
struct rb_node *nd;
size_t ret = 0;
@@ -799,7 +799,7 @@
if (percent < min_pcnt)
continue;
- ret += hist_entry__fprintf(h, max_cols, line, linesz, fp, use_callchain);
+ ret += hist_entry__fprintf(h, max_cols, line, linesz, fp, ignore_callchains);
if (max_rows && ++nr_rows >= max_rows)
break;
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index cee6587..3d02ae3 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -747,7 +747,9 @@
err = bpf_object__load(obj);
if (err) {
- pr_debug("bpf: load objects failed\n");
+ char bf[128];
+ libbpf_strerror(err, bf, sizeof(bf));
+ pr_debug("bpf: load objects failed: err=%d: (%s)\n", err, bf);
return err;
}
return 0;
diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index 7798a2c..31279a7 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -20,9 +20,10 @@
static struct comm_str *comm_str__get(struct comm_str *cs)
{
- if (cs)
- refcount_inc(&cs->refcnt);
- return cs;
+ if (cs && refcount_inc_not_zero(&cs->refcnt))
+ return cs;
+
+ return NULL;
}
static void comm_str__put(struct comm_str *cs)
@@ -67,9 +68,14 @@
parent = *p;
iter = rb_entry(parent, struct comm_str, rb_node);
+ /*
+ * If we race with comm_str__put, iter->refcnt is 0
+ * and it will be removed within comm_str__put call
+ * shortly, ignore it in this search.
+ */
cmp = strcmp(str, iter->str);
- if (!cmp)
- return comm_str__get(iter);
+ if (!cmp && comm_str__get(iter))
+ return iter;
if (cmp < 0)
p = &(*p)->rb_left;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 4d5fc37..938def6 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -31,6 +31,8 @@
#endif
#endif
+#define CS_ETM_INVAL_ADDR 0xdeadbeefdeadbeefUL
+
struct cs_etm_decoder {
void *data;
void (*packet_printer)(const char *msg);
@@ -261,8 +263,8 @@
decoder->tail = 0;
decoder->packet_count = 0;
for (i = 0; i < MAX_BUFFER; i++) {
- decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
- decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+ decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
+ decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[i].last_instr_taken_branch = false;
decoder->packet_buffer[i].exc = false;
decoder->packet_buffer[i].exc_ret = false;
@@ -295,8 +297,8 @@
decoder->packet_buffer[et].exc = false;
decoder->packet_buffer[et].exc_ret = false;
decoder->packet_buffer[et].cpu = *((int *)inode->priv);
- decoder->packet_buffer[et].start_addr = 0xdeadbeefdeadbeefUL;
- decoder->packet_buffer[et].end_addr = 0xdeadbeefdeadbeefUL;
+ decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
+ decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
if (decoder->packet_count == MAX_BUFFER - 1)
return OCSD_RESP_WAIT;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 743f5f4..612b575 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -23,6 +23,7 @@
};
enum cs_etm_sample_type {
+ CS_ETM_EMPTY = 0,
CS_ETM_RANGE = 1 << 0,
CS_ETM_TRACE_ON = 1 << 1,
};
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..2ae6402 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -494,6 +494,10 @@
static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
{
+ /* Returns 0 for the CS_ETM_TRACE_ON packet */
+ if (packet->sample_type == CS_ETM_TRACE_ON)
+ return 0;
+
/*
* The packet records the execution range with an exclusive end address
*
@@ -505,6 +509,15 @@
return packet->end_addr - A64_INSTR_SIZE;
}
+static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
+{
+ /* Returns 0 for the CS_ETM_TRACE_ON packet */
+ if (packet->sample_type == CS_ETM_TRACE_ON)
+ return 0;
+
+ return packet->start_addr;
+}
+
static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
{
/*
@@ -546,7 +559,7 @@
be = &bs->entries[etmq->last_branch_pos];
be->from = cs_etm__last_executed_instr(etmq->prev_packet);
- be->to = etmq->packet->start_addr;
+ be->to = cs_etm__first_executed_instr(etmq->packet);
/* No support for mispredict */
be->flags.mispred = 0;
be->flags.predicted = 1;
@@ -701,7 +714,7 @@
sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
sample.pid = etmq->pid;
sample.tid = etmq->tid;
- sample.addr = etmq->packet->start_addr;
+ sample.addr = cs_etm__first_executed_instr(etmq->packet);
sample.id = etmq->etm->branches_id;
sample.stream_id = etmq->etm->branches_id;
sample.period = 1;
@@ -897,13 +910,23 @@
etmq->period_instructions = instrs_over;
}
- if (etm->sample_branches &&
- etmq->prev_packet &&
- etmq->prev_packet->sample_type == CS_ETM_RANGE &&
- etmq->prev_packet->last_instr_taken_branch) {
- ret = cs_etm__synth_branch_sample(etmq);
- if (ret)
- return ret;
+ if (etm->sample_branches && etmq->prev_packet) {
+ bool generate_sample = false;
+
+ /* Generate sample for tracing on packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+ generate_sample = true;
+
+ /* Generate sample for branch taken packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->prev_packet->last_instr_taken_branch)
+ generate_sample = true;
+
+ if (generate_sample) {
+ ret = cs_etm__synth_branch_sample(etmq);
+ if (ret)
+ return ret;
+ }
}
if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -922,10 +945,17 @@
static int cs_etm__flush(struct cs_etm_queue *etmq)
{
int err = 0;
+ struct cs_etm_auxtrace *etm = etmq->etm;
struct cs_etm_packet *tmp;
+ if (!etmq->prev_packet)
+ return 0;
+
+ /* Handle start tracing packet */
+ if (etmq->prev_packet->sample_type == CS_ETM_EMPTY)
+ goto swap_packet;
+
if (etmq->etm->synth_opts.last_branch &&
- etmq->prev_packet &&
etmq->prev_packet->sample_type == CS_ETM_RANGE) {
/*
* Generate a last branch event for the branches left in the
@@ -939,8 +969,22 @@
err = cs_etm__synth_instruction_sample(
etmq, addr,
etmq->period_instructions);
+ if (err)
+ return err;
+
etmq->period_instructions = 0;
+ }
+
+ if (etm->sample_branches &&
+ etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ err = cs_etm__synth_branch_sample(etmq);
+ if (err)
+ return err;
+ }
+
+swap_packet:
+ if (etmq->etm->synth_opts.last_branch) {
/*
* Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
* the next incoming packet.
@@ -1020,6 +1064,13 @@
*/
cs_etm__flush(etmq);
break;
+ case CS_ETM_EMPTY:
+ /*
+ * Should not receive empty packet,
+ * report error.
+ */
+ pr_err("CS ETM Trace: empty packet\n");
+ return -EINVAL;
default:
break;
}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 94fce4f..ddf84b9 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -260,6 +260,17 @@
evsel->attr.sample_period = 1;
}
+ if (perf_evsel__is_clock(evsel)) {
+ /*
+ * The evsel->unit points to static alias->unit
+ * so it's ok to use static string in here.
+ */
+ static const char *unit = "msec";
+
+ evsel->unit = unit;
+ evsel->scale = 1e-6;
+ }
+
return evsel;
}
@@ -848,6 +859,12 @@
}
}
+static bool is_dummy_event(struct perf_evsel *evsel)
+{
+ return (evsel->attr.type == PERF_TYPE_SOFTWARE) &&
+ (evsel->attr.config == PERF_COUNT_SW_DUMMY);
+}
+
/*
* The enable_on_exec/disabled value strategy:
*
@@ -1086,6 +1103,14 @@
else
perf_evsel__reset_sample_bit(evsel, PERIOD);
}
+
+ /*
+ * For initial_delay, a dummy event is added implicitly.
+ * The software event will trigger -EOPNOTSUPP error out,
+ * if BRANCH_STACK bit is set.
+ */
+ if (opts->initial_delay && is_dummy_event(evsel))
+ perf_evsel__reset_sample_bit(evsel, BRANCH_STACK);
}
static int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index d277930..973c031 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -402,10 +402,13 @@
static inline bool perf_evsel__is_bpf_output(struct perf_evsel *evsel)
{
- struct perf_event_attr *attr = &evsel->attr;
+ return perf_evsel__match(evsel, SOFTWARE, SW_BPF_OUTPUT);
+}
- return (attr->config == PERF_COUNT_SW_BPF_OUTPUT) &&
- (attr->type == PERF_TYPE_SOFTWARE);
+static inline bool perf_evsel__is_clock(struct perf_evsel *evsel)
+{
+ return perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
+ perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK);
}
struct perf_attr_details {
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 653ff65..5af58aa 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2587,7 +2587,7 @@
FEAT_OPR(NUMA_TOPOLOGY, numa_topology, true),
FEAT_OPN(BRANCH_STACK, branch_stack, false),
FEAT_OPR(PMU_MAPPINGS, pmu_mappings, false),
- FEAT_OPN(GROUP_DESC, group_desc, false),
+ FEAT_OPR(GROUP_DESC, group_desc, false),
FEAT_OPN(AUXTRACE, auxtrace, false),
FEAT_OPN(STAT, stat, false),
FEAT_OPN(CACHE, cache, true),
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 73049f7..3badd7f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -181,7 +181,7 @@
size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
int max_cols, float min_pcnt, FILE *fp,
- bool use_callchain);
+ bool ignore_callchains);
size_t perf_evlist__fprintf_nr_events(struct perf_evlist *evlist, FILE *fp);
void hists__filter_by_dso(struct hists *hists);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e7b4a8b..b300a39 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -408,6 +408,55 @@
}
/*
+ * Front-end cache - TID lookups come in blocks,
+ * so most of the time we dont have to look up
+ * the full rbtree:
+ */
+static struct thread*
+__threads__get_last_match(struct threads *threads, struct machine *machine,
+ int pid, int tid)
+{
+ struct thread *th;
+
+ th = threads->last_match;
+ if (th != NULL) {
+ if (th->tid == tid) {
+ machine__update_thread_pid(machine, th, pid);
+ return thread__get(th);
+ }
+
+ threads->last_match = NULL;
+ }
+
+ return NULL;
+}
+
+static struct thread*
+threads__get_last_match(struct threads *threads, struct machine *machine,
+ int pid, int tid)
+{
+ struct thread *th = NULL;
+
+ if (perf_singlethreaded)
+ th = __threads__get_last_match(threads, machine, pid, tid);
+
+ return th;
+}
+
+static void
+__threads__set_last_match(struct threads *threads, struct thread *th)
+{
+ threads->last_match = th;
+}
+
+static void
+threads__set_last_match(struct threads *threads, struct thread *th)
+{
+ if (perf_singlethreaded)
+ __threads__set_last_match(threads, th);
+}
+
+/*
* Caller must eventually drop thread->refcnt returned with a successful
* lookup/new thread inserted.
*/
@@ -420,27 +469,16 @@
struct rb_node *parent = NULL;
struct thread *th;
- /*
- * Front-end cache - TID lookups come in blocks,
- * so most of the time we dont have to look up
- * the full rbtree:
- */
- th = threads->last_match;
- if (th != NULL) {
- if (th->tid == tid) {
- machine__update_thread_pid(machine, th, pid);
- return thread__get(th);
- }
-
- threads->last_match = NULL;
- }
+ th = threads__get_last_match(threads, machine, pid, tid);
+ if (th)
+ return th;
while (*p != NULL) {
parent = *p;
th = rb_entry(parent, struct thread, rb_node);
if (th->tid == tid) {
- threads->last_match = th;
+ threads__set_last_match(threads, th);
machine__update_thread_pid(machine, th, pid);
return thread__get(th);
}
@@ -477,7 +515,7 @@
* It is now in the rbtree, get a ref
*/
thread__get(th);
- threads->last_match = th;
+ threads__set_last_match(threads, th);
++threads->nr;
}
@@ -1635,7 +1673,7 @@
struct threads *threads = machine__threads(machine, th->tid);
if (threads->last_match == th)
- threads->last_match = NULL;
+ threads__set_last_match(threads, NULL);
BUG_ON(refcount_read(&th->refcnt) == 0);
if (lock)
@@ -2272,6 +2310,7 @@
{
struct callchain_cursor *cursor = arg;
const char *srcline = NULL;
+ u64 addr;
if (symbol_conf.hide_unresolved && entry->sym == NULL)
return 0;
@@ -2279,7 +2318,13 @@
if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
return 0;
- srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
+ /*
+ * Convert entry->ip from a virtual address to an offset in
+ * its corresponding binary.
+ */
+ addr = map__map_ip(entry->map, entry->ip);
+
+ srcline = callchain_srcline(entry->map, entry->sym, addr);
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
false, NULL, 0, 0, 0, srcline);
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 1ddc3d1..a28f9b5 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -326,8 +326,8 @@
if (raw)
s = (char *)pe->metric_name;
else {
- if (asprintf(&s, "%s\n\t[%s]",
- pe->metric_name, pe->desc) < 0)
+ if (asprintf(&s, "%s\n%*s%s]",
+ pe->metric_name, 8, "[", pe->desc) < 0)
return;
}
@@ -490,3 +490,25 @@
metricgroup__free_egroups(&group_list);
return ret;
}
+
+bool metricgroup__has_metric(const char *metric)
+{
+ struct pmu_events_map *map = perf_pmu__find_map(NULL);
+ struct pmu_event *pe;
+ int i;
+
+ if (!map)
+ return false;
+
+ for (i = 0; ; i++) {
+ pe = &map->table[i];
+
+ if (!pe->name && !pe->metric_group && !pe->metric_name)
+ break;
+ if (!pe->metric_expr)
+ continue;
+ if (match_metric(pe->metric_name, metric))
+ return true;
+ }
+ return false;
+}
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 06854e1..8a155db 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -28,4 +28,5 @@
struct rblist *metric_events);
void metricgroup__print(bool metrics, bool groups, char *filter, bool raw);
+bool metricgroup__has_metric(const char *metric);
#endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 3ba6a17..afd6852 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -652,12 +652,6 @@
if (stat(path, &st) == 0)
return 1;
- /* Look for cpu sysfs (specific to s390) */
- scnprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s",
- sysfs, name);
- if (stat(path, &st) == 0 && !strncmp(name, "cpum_", 5))
- return 1;
-
return 0;
}
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 594d14a..99990f5 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -913,11 +913,10 @@
ratio = total / avg;
print_metric(ctxp, NULL, "%8.0f", "cycles / elision", ratio);
- } else if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK) ||
- perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK)) {
+ } else if (perf_evsel__is_clock(evsel)) {
if ((ratio = avg_stats(&walltime_nsecs_stats)) != 0)
print_metric(ctxp, NULL, "%8.3f", "CPUs utilized",
- avg / ratio);
+ avg / (ratio * evsel->scale));
else
print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
diff --git a/tools/perf/util/syscalltbl.c b/tools/perf/util/syscalltbl.c
index 0ee7f56..3393d7e 100644
--- a/tools/perf/util/syscalltbl.c
+++ b/tools/perf/util/syscalltbl.c
@@ -38,6 +38,10 @@
#include <asm/syscalls_32.c>
const int syscalltbl_native_max_id = SYSCALLTBL_POWERPC_32_MAX_ID;
static const char **syscalltbl_native = syscalltbl_powerpc_32;
+#elif defined(__aarch64__)
+#include <asm/syscalls.c>
+const int syscalltbl_native_max_id = SYSCALLTBL_ARM64_MAX_ID;
+static const char **syscalltbl_native = syscalltbl_arm64;
#endif
struct syscall {
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 538db4e..6f318b1 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -77,7 +77,7 @@
if (__report_module(&al, ip, ui))
return -1;
- e->ip = al.addr;
+ e->ip = ip;
e->map = al.map;
e->sym = al.sym;
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index 6a11bc7..79f521a 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -575,7 +575,7 @@
struct addr_location al;
e.sym = thread__find_symbol(thread, PERF_RECORD_MISC_USER, ip, &al);
- e.ip = al.addr;
+ e.ip = ip;
e.map = al.map;
pr_debug("unwind: %s:ip = 0x%" PRIx64 " (0x%" PRIx64 ")\n",
diff --git a/tools/testing/selftests/rcutorture/bin/configinit.sh b/tools/testing/selftests/rcutorture/bin/configinit.sh
index c15f270..65541c2 100755
--- a/tools/testing/selftests/rcutorture/bin/configinit.sh
+++ b/tools/testing/selftests/rcutorture/bin/configinit.sh
@@ -1,6 +1,6 @@
#!/bin/bash
#
-# Usage: configinit.sh config-spec-file [ build output dir ]
+# Usage: configinit.sh config-spec-file build-output-dir results-dir
#
# Create a .config file from the spec file. Run from the kernel source tree.
# Exits with 0 if all went well, with 1 if all went well but the config
@@ -40,20 +40,18 @@
c=$1
buildloc=$2
+resdir=$3
builddir=
-if test -n $buildloc
+if echo $buildloc | grep -q '^O='
then
- if echo $buildloc | grep -q '^O='
+ builddir=`echo $buildloc | sed -e 's/^O=//'`
+ if test ! -d $builddir
then
- builddir=`echo $buildloc | sed -e 's/^O=//'`
- if test ! -d $builddir
- then
- mkdir $builddir
- fi
- else
- echo Bad build directory: \"$buildloc\"
- exit 2
+ mkdir $builddir
fi
+else
+ echo Bad build directory: \"$buildloc\"
+ exit 2
fi
sed -e 's/^\(CONFIG[0-9A-Z_]*\)=.*$/grep -v "^# \1" |/' < $c > $T/u.sh
@@ -61,12 +59,12 @@
grep '^grep' < $T/u.sh > $T/upd.sh
echo "cat - $c" >> $T/upd.sh
make mrproper
-make $buildloc distclean > $builddir/Make.distclean 2>&1
-make $buildloc $TORTURE_DEFCONFIG > $builddir/Make.defconfig.out 2>&1
+make $buildloc distclean > $resdir/Make.distclean 2>&1
+make $buildloc $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1
mv $builddir/.config $builddir/.config.sav
sh $T/upd.sh < $builddir/.config.sav > $builddir/.config
cp $builddir/.config $builddir/.config.new
-yes '' | make $buildloc oldconfig > $builddir/Make.oldconfig.out 2> $builddir/Make.oldconfig.err
+yes '' | make $buildloc oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err
# verify new config matches specification.
configcheck.sh $builddir/.config $c
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-build.sh b/tools/testing/selftests/rcutorture/bin/kvm-build.sh
index 34d1267..9115fcd 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-build.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-build.sh
@@ -2,7 +2,7 @@
#
# Build a kvm-ready Linux kernel from the tree in the current directory.
#
-# Usage: kvm-build.sh config-template build-dir
+# Usage: kvm-build.sh config-template build-dir resdir
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -29,6 +29,7 @@
exit 1
fi
builddir=${2}
+resdir=${3}
T=${TMPDIR-/tmp}/test-linux.sh.$$
trap 'rm -rf $T' 0
@@ -41,19 +42,19 @@
CONFIG_VIRTIO_CONSOLE=y
___EOF___
-configinit.sh $T/config O=$builddir
+configinit.sh $T/config O=$builddir $resdir
retval=$?
if test $retval -gt 1
then
exit 2
fi
ncpus=`cpus2use.sh`
-make O=$builddir -j$ncpus $TORTURE_KMAKE_ARG > $builddir/Make.out 2>&1
+make O=$builddir -j$ncpus $TORTURE_KMAKE_ARG > $resdir/Make.out 2>&1
retval=$?
-if test $retval -ne 0 || grep "rcu[^/]*": < $builddir/Make.out | egrep -q "Stop|Error|error:|warning:" || egrep -q "Stop|Error|error:" < $builddir/Make.out
+if test $retval -ne 0 || grep "rcu[^/]*": < $resdir/Make.out | egrep -q "Stop|Error|error:|warning:" || egrep -q "Stop|Error|error:" < $resdir/Make.out
then
echo Kernel build error
- egrep "Stop|Error|error:|warning:" < $builddir/Make.out
+ egrep "Stop|Error|error:|warning:" < $resdir/Make.out
echo Run aborted.
exit 3
fi
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
index 477ecb1..0fa8a61 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
@@ -70,4 +70,5 @@
else
print_warning $nclosecalls "Reader Batch close calls in" $(($dur/60)) minute run: $i
fi
+ echo $nclosecalls "Reader Batch close calls in" $(($dur/60)) minute run: $i > $i/console.log.rcu.diags
fi
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
index c27e978..c9bab57 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
@@ -39,6 +39,7 @@
head -1 $resdir/log
fi
TORTURE_SUITE="`cat $i/../TORTURE_SUITE`"
+ rm -f $i/console.log.*.diags
kvm-recheck-${TORTURE_SUITE}.sh $i
if test -f "$i/console.log"
then
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index c5b0f94..f7247ee 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -98,14 +98,15 @@
ln -s $base_resdir/.config $resdir # for kvm-recheck.sh
# Arch-independent indicator
touch $resdir/builtkernel
-elif kvm-build.sh $T/Kc2 $builddir
+elif kvm-build.sh $T/Kc2 $builddir $resdir
then
# Had to build a kernel for this test.
QEMU="`identify_qemu $builddir/vmlinux`"
BOOT_IMAGE="`identify_boot_image $QEMU`"
- cp $builddir/Make*.out $resdir
cp $builddir/vmlinux $resdir
cp $builddir/.config $resdir
+ cp $builddir/Module.symvers $resdir > /dev/null || :
+ cp $builddir/System.map $resdir > /dev/null || :
if test -n "$BOOT_IMAGE"
then
cp $builddir/$BOOT_IMAGE $resdir
diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh
index 56610db..5a7a62d 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm.sh
@@ -347,7 +347,7 @@
print "needqemurun="
jn=1
for (j = first; j < pastlast; j++) {
- builddir=KVM "/b" jn
+ builddir=KVM "/b1"
cpusr[jn] = cpus[j];
if (cfrep[cf[j]] == "") {
cfr[jn] = cf[j];
diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh
index 1729343..84933f6 100755
--- a/tools/testing/selftests/rcutorture/bin/parse-console.sh
+++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh
@@ -163,6 +163,13 @@
print_warning Summary: $summary
cat $T.diags >> $file.diags
fi
+for i in $file.*.diags
+do
+ if test -f "$i"
+ then
+ cat $i >> $file.diags
+ fi
+done
if ! test -s $file.diags
then
rm -f $file.diags
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
index 5d2cc0b..5c3213c 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
@@ -1,5 +1,5 @@
-rcutorture.onoff_interval=1 rcutorture.onoff_holdoff=30
-rcutree.gp_preinit_delay=3
+rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
+rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T.boot
deleted file mode 100644
index 883149b..0000000
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T.boot
+++ /dev/null
@@ -1 +0,0 @@
-rcutree.rcu_fanout_exact=1
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh b/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh
index 24ec910..7bab824 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh
+++ b/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh
@@ -39,7 +39,7 @@
if ! bootparam_hotplug_cpu "$1" && configfrag_hotplug_cpu "$2"
then
echo CPU-hotplug kernel, adding rcutorture onoff. 1>&2
- echo rcutorture.onoff_interval=3 rcutorture.onoff_holdoff=30
+ echo rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30
fi
}
diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
index 6152523..91cf50b 100644
--- a/tools/testing/selftests/rseq/param_test.c
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -90,6 +90,30 @@
#error "Unsupported architecture"
#endif
+#elif defined(__s390__)
+
+#define RSEQ_INJECT_INPUT \
+ , [loop_cnt_1]"m"(loop_cnt[1]) \
+ , [loop_cnt_2]"m"(loop_cnt[2]) \
+ , [loop_cnt_3]"m"(loop_cnt[3]) \
+ , [loop_cnt_4]"m"(loop_cnt[4]) \
+ , [loop_cnt_5]"m"(loop_cnt[5]) \
+ , [loop_cnt_6]"m"(loop_cnt[6])
+
+#define INJECT_ASM_REG "r12"
+
+#define RSEQ_INJECT_CLOBBER \
+ , INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+ "l %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+ "ltr %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG "\n\t" \
+ "je 333f\n\t" \
+ "222:\n\t" \
+ "ahi %%" INJECT_ASM_REG ", -1\n\t" \
+ "jnz 222b\n\t" \
+ "333:\n\t"
+
#elif defined(__ARMEL__)
#define RSEQ_INJECT_INPUT \
diff --git a/tools/testing/selftests/rseq/rseq-s390.h b/tools/testing/selftests/rseq/rseq-s390.h
new file mode 100644
index 0000000..1069e85
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-s390.h
@@ -0,0 +1,513 @@
+/* SPDX-License-Identifier: LGPL-2.1 OR MIT */
+
+#define RSEQ_SIG 0x53053053
+
+#define rseq_smp_mb() __asm__ __volatile__ ("bcr 15,0" ::: "memory")
+#define rseq_smp_rmb() rseq_smp_mb()
+#define rseq_smp_wmb() rseq_smp_mb()
+
+#define rseq_smp_load_acquire(p) \
+__extension__ ({ \
+ __typeof(*p) ____p1 = RSEQ_READ_ONCE(*p); \
+ rseq_barrier(); \
+ ____p1; \
+})
+
+#define rseq_smp_acquire__after_ctrl_dep() rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v) \
+do { \
+ rseq_barrier(); \
+ RSEQ_WRITE_ONCE(*p, v); \
+} while (0)
+
+#ifdef RSEQ_SKIP_FASTPATH
+#include "rseq-skip.h"
+#else /* !RSEQ_SKIP_FASTPATH */
+
+#ifdef __s390x__
+
+#define LONG_L "lg"
+#define LONG_S "stg"
+#define LONG_LT_R "ltgr"
+#define LONG_CMP "cg"
+#define LONG_CMP_R "cgr"
+#define LONG_ADDI "aghi"
+#define LONG_ADD_R "agr"
+
+#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \
+ start_ip, post_commit_offset, abort_ip) \
+ ".pushsection __rseq_table, \"aw\"\n\t" \
+ ".balign 32\n\t" \
+ __rseq_str(label) ":\n\t" \
+ ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+ ".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+ ".popsection\n\t"
+
+#elif __s390__
+
+#define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, \
+ start_ip, post_commit_offset, abort_ip) \
+ ".pushsection __rseq_table, \"aw\"\n\t" \
+ ".balign 32\n\t" \
+ __rseq_str(label) ":\n\t" \
+ ".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+ ".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
+ ".popsection\n\t"
+
+#define LONG_L "l"
+#define LONG_S "st"
+#define LONG_LT_R "ltr"
+#define LONG_CMP "c"
+#define LONG_CMP_R "cr"
+#define LONG_ADDI "ahi"
+#define LONG_ADD_R "ar"
+
+#endif
+
+#define RSEQ_ASM_DEFINE_TABLE(label, start_ip, post_commit_ip, abort_ip) \
+ __RSEQ_ASM_DEFINE_TABLE(label, 0x0, 0x0, start_ip, \
+ (post_commit_ip - start_ip), abort_ip)
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs) \
+ RSEQ_INJECT_ASM(1) \
+ "larl %%r0, " __rseq_str(cs_label) "\n\t" \
+ LONG_S " %%r0, %[" __rseq_str(rseq_cs) "]\n\t" \
+ __rseq_str(label) ":\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) \
+ RSEQ_INJECT_ASM(2) \
+ "c %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+ "jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \
+ ".pushsection __rseq_failure, \"ax\"\n\t" \
+ ".long " __rseq_str(RSEQ_SIG) "\n\t" \
+ __rseq_str(label) ":\n\t" \
+ teardown \
+ "j %l[" __rseq_str(abort_label) "]\n\t" \
+ ".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \
+ ".pushsection __rseq_failure, \"ax\"\n\t" \
+ __rseq_str(label) ":\n\t" \
+ teardown \
+ "j %l[" __rseq_str(cmpfail_label) "]\n\t" \
+ ".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, int cpu)
+{
+ RSEQ_INJECT_C(9)
+
+ __asm__ __volatile__ goto (
+ RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+ /* Start rseq by storing table entry pointer into rseq_cs. */
+ RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+ RSEQ_INJECT_ASM(3)
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz %l[cmpfail]\n\t"
+ RSEQ_INJECT_ASM(4)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1])
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz %l[error2]\n\t"
+#endif
+ /* final store */
+ LONG_S " %[newv], %[v]\n\t"
+ "2:\n\t"
+ RSEQ_INJECT_ASM(5)
+ RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+ : /* gcc asm goto does not allow outputs */
+ : [cpu_id] "r" (cpu),
+ [current_cpu_id] "m" (__rseq_abi.cpu_id),
+ [rseq_cs] "m" (__rseq_abi.rseq_cs),
+ [v] "m" (*v),
+ [expect] "r" (expect),
+ [newv] "r" (newv)
+ RSEQ_INJECT_INPUT
+ : "memory", "cc", "r0"
+ RSEQ_INJECT_CLOBBER
+ : abort, cmpfail
+#ifdef RSEQ_COMPARE_TWICE
+ , error1, error2
+#endif
+ );
+ return 0;
+abort:
+ RSEQ_INJECT_FAILED
+ return -1;
+cmpfail:
+ return 1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+ rseq_bug("cpu_id comparison failed");
+error2:
+ rseq_bug("expected value comparison failed");
+#endif
+}
+
+/*
+ * Compare @v against @expectnot. When it does _not_ match, load @v
+ * into @load, and store the content of *@v + voffp into @v.
+ */
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+ off_t voffp, intptr_t *load, int cpu)
+{
+ RSEQ_INJECT_C(9)
+
+ __asm__ __volatile__ goto (
+ RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+ /* Start rseq by storing table entry pointer into rseq_cs. */
+ RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+ RSEQ_INJECT_ASM(3)
+ LONG_L " %%r1, %[v]\n\t"
+ LONG_CMP_R " %%r1, %[expectnot]\n\t"
+ "je %l[cmpfail]\n\t"
+ RSEQ_INJECT_ASM(4)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1])
+ LONG_L " %%r1, %[v]\n\t"
+ LONG_CMP_R " %%r1, %[expectnot]\n\t"
+ "je %l[error2]\n\t"
+#endif
+ LONG_S " %%r1, %[load]\n\t"
+ LONG_ADD_R " %%r1, %[voffp]\n\t"
+ LONG_L " %%r1, 0(%%r1)\n\t"
+ /* final store */
+ LONG_S " %%r1, %[v]\n\t"
+ "2:\n\t"
+ RSEQ_INJECT_ASM(5)
+ RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+ : /* gcc asm goto does not allow outputs */
+ : [cpu_id] "r" (cpu),
+ [current_cpu_id] "m" (__rseq_abi.cpu_id),
+ [rseq_cs] "m" (__rseq_abi.rseq_cs),
+ /* final store input */
+ [v] "m" (*v),
+ [expectnot] "r" (expectnot),
+ [voffp] "r" (voffp),
+ [load] "m" (*load)
+ RSEQ_INJECT_INPUT
+ : "memory", "cc", "r0", "r1"
+ RSEQ_INJECT_CLOBBER
+ : abort, cmpfail
+#ifdef RSEQ_COMPARE_TWICE
+ , error1, error2
+#endif
+ );
+ return 0;
+abort:
+ RSEQ_INJECT_FAILED
+ return -1;
+cmpfail:
+ return 1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+ rseq_bug("cpu_id comparison failed");
+error2:
+ rseq_bug("expected value comparison failed");
+#endif
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+ RSEQ_INJECT_C(9)
+
+ __asm__ __volatile__ goto (
+ RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+ /* Start rseq by storing table entry pointer into rseq_cs. */
+ RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+ RSEQ_INJECT_ASM(3)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1])
+#endif
+ LONG_L " %%r0, %[v]\n\t"
+ LONG_ADD_R " %%r0, %[count]\n\t"
+ /* final store */
+ LONG_S " %%r0, %[v]\n\t"
+ "2:\n\t"
+ RSEQ_INJECT_ASM(4)
+ RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+ : /* gcc asm goto does not allow outputs */
+ : [cpu_id] "r" (cpu),
+ [current_cpu_id] "m" (__rseq_abi.cpu_id),
+ [rseq_cs] "m" (__rseq_abi.rseq_cs),
+ /* final store input */
+ [v] "m" (*v),
+ [count] "r" (count)
+ RSEQ_INJECT_INPUT
+ : "memory", "cc", "r0"
+ RSEQ_INJECT_CLOBBER
+ : abort
+#ifdef RSEQ_COMPARE_TWICE
+ , error1
+#endif
+ );
+ return 0;
+abort:
+ RSEQ_INJECT_FAILED
+ return -1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+ rseq_bug("cpu_id comparison failed");
+#endif
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+ intptr_t *v2, intptr_t newv2,
+ intptr_t newv, int cpu)
+{
+ RSEQ_INJECT_C(9)
+
+ __asm__ __volatile__ goto (
+ RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+ /* Start rseq by storing table entry pointer into rseq_cs. */
+ RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+ RSEQ_INJECT_ASM(3)
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz %l[cmpfail]\n\t"
+ RSEQ_INJECT_ASM(4)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1])
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz %l[error2]\n\t"
+#endif
+ /* try store */
+ LONG_S " %[newv2], %[v2]\n\t"
+ RSEQ_INJECT_ASM(5)
+ /* final store */
+ LONG_S " %[newv], %[v]\n\t"
+ "2:\n\t"
+ RSEQ_INJECT_ASM(6)
+ RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+ : /* gcc asm goto does not allow outputs */
+ : [cpu_id] "r" (cpu),
+ [current_cpu_id] "m" (__rseq_abi.cpu_id),
+ [rseq_cs] "m" (__rseq_abi.rseq_cs),
+ /* try store input */
+ [v2] "m" (*v2),
+ [newv2] "r" (newv2),
+ /* final store input */
+ [v] "m" (*v),
+ [expect] "r" (expect),
+ [newv] "r" (newv)
+ RSEQ_INJECT_INPUT
+ : "memory", "cc", "r0"
+ RSEQ_INJECT_CLOBBER
+ : abort, cmpfail
+#ifdef RSEQ_COMPARE_TWICE
+ , error1, error2
+#endif
+ );
+ return 0;
+abort:
+ RSEQ_INJECT_FAILED
+ return -1;
+cmpfail:
+ return 1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+ rseq_bug("cpu_id comparison failed");
+error2:
+ rseq_bug("expected value comparison failed");
+#endif
+}
+
+/* s390 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+ intptr_t *v2, intptr_t newv2,
+ intptr_t newv, int cpu)
+{
+ return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2, newv, cpu);
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+ intptr_t *v2, intptr_t expect2,
+ intptr_t newv, int cpu)
+{
+ RSEQ_INJECT_C(9)
+
+ __asm__ __volatile__ goto (
+ RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+ /* Start rseq by storing table entry pointer into rseq_cs. */
+ RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+ RSEQ_INJECT_ASM(3)
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz %l[cmpfail]\n\t"
+ RSEQ_INJECT_ASM(4)
+ LONG_CMP " %[expect2], %[v2]\n\t"
+ "jnz %l[cmpfail]\n\t"
+ RSEQ_INJECT_ASM(5)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, %l[error1])
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz %l[error2]\n\t"
+ LONG_CMP " %[expect2], %[v2]\n\t"
+ "jnz %l[error3]\n\t"
+#endif
+ /* final store */
+ LONG_S " %[newv], %[v]\n\t"
+ "2:\n\t"
+ RSEQ_INJECT_ASM(6)
+ RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+ : /* gcc asm goto does not allow outputs */
+ : [cpu_id] "r" (cpu),
+ [current_cpu_id] "m" (__rseq_abi.cpu_id),
+ [rseq_cs] "m" (__rseq_abi.rseq_cs),
+ /* cmp2 input */
+ [v2] "m" (*v2),
+ [expect2] "r" (expect2),
+ /* final store input */
+ [v] "m" (*v),
+ [expect] "r" (expect),
+ [newv] "r" (newv)
+ RSEQ_INJECT_INPUT
+ : "memory", "cc", "r0"
+ RSEQ_INJECT_CLOBBER
+ : abort, cmpfail
+#ifdef RSEQ_COMPARE_TWICE
+ , error1, error2, error3
+#endif
+ );
+ return 0;
+abort:
+ RSEQ_INJECT_FAILED
+ return -1;
+cmpfail:
+ return 1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+ rseq_bug("cpu_id comparison failed");
+error2:
+ rseq_bug("1st expected value comparison failed");
+error3:
+ rseq_bug("2nd expected value comparison failed");
+#endif
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+ void *dst, void *src, size_t len,
+ intptr_t newv, int cpu)
+{
+ uint64_t rseq_scratch[3];
+
+ RSEQ_INJECT_C(9)
+
+ __asm__ __volatile__ goto (
+ RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+ LONG_S " %[src], %[rseq_scratch0]\n\t"
+ LONG_S " %[dst], %[rseq_scratch1]\n\t"
+ LONG_S " %[len], %[rseq_scratch2]\n\t"
+ /* Start rseq by storing table entry pointer into rseq_cs. */
+ RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+ RSEQ_INJECT_ASM(3)
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz 5f\n\t"
+ RSEQ_INJECT_ASM(4)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 6f)
+ LONG_CMP " %[expect], %[v]\n\t"
+ "jnz 7f\n\t"
+#endif
+ /* try memcpy */
+ LONG_LT_R " %[len], %[len]\n\t"
+ "jz 333f\n\t"
+ "222:\n\t"
+ "ic %%r0,0(%[src])\n\t"
+ "stc %%r0,0(%[dst])\n\t"
+ LONG_ADDI " %[src], 1\n\t"
+ LONG_ADDI " %[dst], 1\n\t"
+ LONG_ADDI " %[len], -1\n\t"
+ "jnz 222b\n\t"
+ "333:\n\t"
+ RSEQ_INJECT_ASM(5)
+ /* final store */
+ LONG_S " %[newv], %[v]\n\t"
+ "2:\n\t"
+ RSEQ_INJECT_ASM(6)
+ /* teardown */
+ LONG_L " %[len], %[rseq_scratch2]\n\t"
+ LONG_L " %[dst], %[rseq_scratch1]\n\t"
+ LONG_L " %[src], %[rseq_scratch0]\n\t"
+ RSEQ_ASM_DEFINE_ABORT(4,
+ LONG_L " %[len], %[rseq_scratch2]\n\t"
+ LONG_L " %[dst], %[rseq_scratch1]\n\t"
+ LONG_L " %[src], %[rseq_scratch0]\n\t",
+ abort)
+ RSEQ_ASM_DEFINE_CMPFAIL(5,
+ LONG_L " %[len], %[rseq_scratch2]\n\t"
+ LONG_L " %[dst], %[rseq_scratch1]\n\t"
+ LONG_L " %[src], %[rseq_scratch0]\n\t",
+ cmpfail)
+#ifdef RSEQ_COMPARE_TWICE
+ RSEQ_ASM_DEFINE_CMPFAIL(6,
+ LONG_L " %[len], %[rseq_scratch2]\n\t"
+ LONG_L " %[dst], %[rseq_scratch1]\n\t"
+ LONG_L " %[src], %[rseq_scratch0]\n\t",
+ error1)
+ RSEQ_ASM_DEFINE_CMPFAIL(7,
+ LONG_L " %[len], %[rseq_scratch2]\n\t"
+ LONG_L " %[dst], %[rseq_scratch1]\n\t"
+ LONG_L " %[src], %[rseq_scratch0]\n\t",
+ error2)
+#endif
+ : /* gcc asm goto does not allow outputs */
+ : [cpu_id] "r" (cpu),
+ [current_cpu_id] "m" (__rseq_abi.cpu_id),
+ [rseq_cs] "m" (__rseq_abi.rseq_cs),
+ /* final store input */
+ [v] "m" (*v),
+ [expect] "r" (expect),
+ [newv] "r" (newv),
+ /* try memcpy input */
+ [dst] "r" (dst),
+ [src] "r" (src),
+ [len] "r" (len),
+ [rseq_scratch0] "m" (rseq_scratch[0]),
+ [rseq_scratch1] "m" (rseq_scratch[1]),
+ [rseq_scratch2] "m" (rseq_scratch[2])
+ RSEQ_INJECT_INPUT
+ : "memory", "cc", "r0"
+ RSEQ_INJECT_CLOBBER
+ : abort, cmpfail
+#ifdef RSEQ_COMPARE_TWICE
+ , error1, error2
+#endif
+ );
+ return 0;
+abort:
+ RSEQ_INJECT_FAILED
+ return -1;
+cmpfail:
+ return 1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+ rseq_bug("cpu_id comparison failed");
+error2:
+ rseq_bug("expected value comparison failed");
+#endif
+}
+
+/* s390 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+ void *dst, void *src, size_t len,
+ intptr_t newv, int cpu)
+{
+ return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src, len,
+ newv, cpu);
+}
+#endif /* !RSEQ_SKIP_FASTPATH */
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index 86ce224..ca332ef 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -75,6 +75,8 @@
#include <rseq-ppc.h>
#elif defined(__mips__)
#include <rseq-mips.h>
+#elif defined(__s390__)
+#include <rseq-s390.h>
#else
#error unsupported target
#endif
diff --git a/tools/testing/selftests/timers/raw_skew.c b/tools/testing/selftests/timers/raw_skew.c
index ca6cd14..dcf73c5 100644
--- a/tools/testing/selftests/timers/raw_skew.c
+++ b/tools/testing/selftests/timers/raw_skew.c
@@ -134,6 +134,11 @@
printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000)));
if (llabs(eppm - ppm) > 1000) {
+ if (tx1.offset || tx2.offset ||
+ tx1.freq != tx2.freq || tx1.tick != tx2.tick) {
+ printf(" [SKIP]\n");
+ return ksft_exit_skip("The clock was adjusted externally. Shutdown NTPd or other time sync daemons\n");
+ }
printf(" [FAILED]\n");
return ksft_exit_fail();
}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 04e554c..108250e 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -604,7 +604,7 @@
kvm_for_each_vcpu(i, vcpu, kvm) {
vcpu->arch.pause = false;
- swake_up(kvm_arch_vcpu_wq(vcpu));
+ swake_up_one(kvm_arch_vcpu_wq(vcpu));
}
}
@@ -612,7 +612,7 @@
{
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
- swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+ swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
(!vcpu->arch.pause)));
if (vcpu->arch.power_off || vcpu->arch.pause) {
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index c95ab4c..9b73d3a 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -155,7 +155,7 @@
smp_mb(); /* Make sure the above is visible */
wq = kvm_arch_vcpu_wq(vcpu);
- swake_up(wq);
+ swake_up_one(wq);
return PSCI_RET_SUCCESS;
}
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 57bcb27..23c2519 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -107,7 +107,7 @@
trace_kvm_async_pf_completed(addr, gva);
if (swq_has_sleeper(&vcpu->wq))
- swake_up(&vcpu->wq);
+ swake_up_one(&vcpu->wq);
mmput(mm);
kvm_put_kvm(vcpu->kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b47507f..3d233eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2172,7 +2172,7 @@
kvm_arch_vcpu_blocking(vcpu);
for (;;) {
- prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
if (kvm_vcpu_check_block(vcpu) < 0)
break;
@@ -2214,7 +2214,7 @@
wqp = kvm_arch_vcpu_wq(vcpu);
if (swq_has_sleeper(wqp)) {
- swake_up(wqp);
+ swake_up_one(wqp);
++vcpu->stat.halt_wakeup;
return true;
}