APEI Error INJection
~~~~~~~~~~~~~~~~~~~~

EINJ provides a hardware error injection mechanism. It is very useful
for debugging and testing other APEI and RAS features.

To use EINJ, make sure the following options are enabled in your kernel
configuration:

CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ

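One way to check these options is to search a plain-text copy of the
kernel configuration. The sketch below is only an illustration: the
function name einj_missing_config is hypothetical, and the location of
the configuration file is an assumption that varies between systems.

```shell
# einj_missing_config FILE
# Print every required option that FILE does not set to y or m.
# Prints nothing when all three options are enabled.
# (Hypothetical helper, not part of the kernel tree.)
einj_missing_config() {
    for einj_opt in CONFIG_DEBUG_FS CONFIG_ACPI_APEI CONFIG_ACPI_APEI_EINJ; do
        grep -q "^${einj_opt}=[ym]\$" "$1" || echo "$einj_opt"
    done
}

# Example (the path is an assumption; many distributions install the
# configuration under /boot, others do not):
#   einj_missing_config "/boot/config-$(uname -r)"
```

Anchoring the pattern at both ends keeps CONFIG_ACPI_APEI from being
satisfied by a CONFIG_ACPI_APEI_EINJ line.
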
The user interface of EINJ is in the debug file system (debugfs), under
the directory apei/einj. The following files are provided.

- available_error_type
  Reading this file returns the error injection capability of the
  platform, that is, which error types are supported. The error types
  are defined as follows; the left field is the error type value and
  the right field is the error description.

    0x00000001    Processor Correctable
    0x00000002    Processor Uncorrectable non-fatal
    0x00000004    Processor Uncorrectable fatal
    0x00000008    Memory Correctable
    0x00000010    Memory Uncorrectable non-fatal
    0x00000020    Memory Uncorrectable fatal
    0x00000040    PCI Express Correctable
    0x00000080    PCI Express Uncorrectable non-fatal
    0x00000100    PCI Express Uncorrectable fatal
    0x00000200    Platform Correctable
    0x00000400    Platform Uncorrectable non-fatal
    0x00000800    Platform Uncorrectable fatal

  The file contents use the format above, but list only the error
  types that are available.

- error_type
  This file is used to set the error type value. Error type values are
  defined in the "available_error_type" description above.

- error_inject
  Write any integer to this file to trigger the error injection. Set
  all necessary error parameters before doing so.

- param1
  This file is used to set the first error parameter value. The effect
  of this parameter depends on the error_type specified.

- param2
  This file is used to set the second error parameter value. The effect
  of this parameter depends on the error_type specified.

- notrigger
  The EINJ mechanism is a two-step process: first inject the error, then
  perform some actions to trigger it. Setting "notrigger" to 1 skips the
  trigger phase, which *may* allow the user to cause the error in some other
  context by a simple access to the CPU, memory location, or device that is
  the target of the error injection. Whether this actually works depends
  on what operations the BIOS actually includes in the trigger phase.

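Because available_error_type lists only the supported types, a script
can check it before writing error_type. The following is a minimal
sketch, not an official tool: the function name einj_type_supported is
hypothetical, and it assumes that debugfs is mounted at
/sys/kernel/debug and that every line of the file starts with a
hexadecimal value.

```shell
# einj_type_supported TYPE [FILE]
# Succeed (exit status 0) if error type TYPE is listed in FILE, which
# defaults to the assumed debugfs location of available_error_type.
# Values are compared numerically, so 0x8 matches 0x00000008.
# (Hypothetical helper name.)
einj_type_supported() {
    einj_file=${2:-/sys/kernel/debug/apei/einj/available_error_type}
    while read -r einj_value einj_desc; do
        [ -n "$einj_value" ] || continue    # skip empty lines
        [ $(( $einj_value )) -eq $(( $1 )) ] && return 0
    done < "$einj_file"
    return 1
}

# Example:
#   einj_type_supported 0x8 && echo "correctable memory errors supported"
```

The optional FILE argument exists only so that the check can be tried
against a saved copy of the file on a machine without EINJ support.
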
BIOS versions based on the ACPI 4.0 specification have limited options
to control where the errors are injected. Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or
boot command line einj.param_extension=1). This allows the address
and mask for memory injections to be specified by the param1 and
param2 files in apei/einj.

BIOS versions using the ACPI 5.0 specification have more control over
the target of the injection. For processor related errors (types 0x1,
0x2 and 0x4) the APICID of the target should be provided using the
param1 file in apei/einj. For memory errors (types 0x8, 0x10 and 0x20)
the address is set using param1 with a mask in param2 (0x0 is equivalent
to all ones). For PCI Express errors (types 0x40, 0x80 and 0x100) the
segment, bus, device and function are specified using param1:

    31     24 23    16 15    11 10       8 7        0
   +-------------------------------------------------+
   | segment |   bus  | device | function | reserved |
   +-------------------------------------------------+

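The param1 layout for PCI Express errors can be computed with shell
arithmetic. This is a small sketch under the bit positions shown above;
the function name einj_pcie_param1 is hypothetical.

```shell
# einj_pcie_param1 SEGMENT BUS DEVICE FUNCTION
# Print the param1 value for a PCI Express target. Each field is masked
# to its width and shifted into place; the reserved bits 7:0 stay zero.
# (Hypothetical helper name.)
einj_pcie_param1() {
    einj_seg=$(( ($1 & 0xff) << 24 ))   # bits 31:24
    einj_bus=$(( ($2 & 0xff) << 16 ))   # bits 23:16
    einj_dev=$(( ($3 & 0x1f) << 11 ))   # bits 15:11
    einj_fn=$(( ($4 & 0x07) << 8 ))     # bits 10:8
    printf '0x%08x\n' $(( einj_seg | einj_bus | einj_dev | einj_fn ))
}

# Example: segment 0, bus 0x3a, device 0x1f, function 2
#   einj_pcie_param1 0 0x3a 0x1f 2      # prints 0x003afa00
```

The printed value can be written to the param1 file as is.
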
An ACPI 5.0 BIOS may also allow vendor-specific errors to be injected.
In this case a file named vendor will contain identifying information
from the BIOS that hopefully will allow an application wishing to use
the vendor-specific extension to tell that it is running on a BIOS
that supports it. All vendor extensions have the 0x80000000 bit set in
error_type. A file vendor_flags controls the interpretation of param1
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors'
creativity in using this feature expands beyond our expectations).

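The vendor bit and the vendor_flags values can be tested with shell
arithmetic. The sketch below is illustrative only: both function names
are hypothetical, and treating vendor_flags as a bit mask in which
several values may be combined is an assumption, so check your BIOS
vendor documentation.

```shell
# einj_is_vendor_type TYPE
# Succeed if TYPE has the vendor extension bit (0x80000000) set.
# (Hypothetical helper name.)
einj_is_vendor_type() {
    [ $(( $1 & 0x80000000 )) -ne 0 ]
}

# einj_vendor_flags_names FLAGS
# Print the targets named by a vendor_flags value, one per line.
# Assumption: the values 1, 2 and 4 are independent bits.
# (Hypothetical helper name.)
einj_vendor_flags_names() {
    [ $(( $1 & 1 )) -ne 0 ] && echo PROCESSOR
    [ $(( $1 & 2 )) -ne 0 ] && echo MEMORY
    [ $(( $1 & 4 )) -ne 0 ] && echo PCI
    return 0
}

# Example:
#   einj_is_vendor_type 0x80000001 && echo "vendor-specific error type"
```
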
Example:
# cd /sys/kernel/debug/apei/einj
# cat available_error_type            # See which errors can be injected
0x00000002    Processor Uncorrectable non-fatal
0x00000008    Memory Correctable
0x00000010    Memory Uncorrectable non-fatal
# echo 0x12345000 > param1            # Set memory address for injection
# echo 0xfffffffffffff000 > param2    # Mask - anywhere in this page
# echo 0x8 > error_type               # Choose correctable memory error
# echo 1 > error_inject               # Inject now

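The same sequence can be wrapped in a shell function so that the
injection is triggered only if every parameter write succeeds. This is
a sketch rather than a supported tool: the function name
einj_inject_memory and its optional directory argument are
hypothetical conveniences, not part of the EINJ interface.

```shell
# einj_inject_memory ADDRESS MASK TYPE [DIR]
# Write the parameters, then trigger the injection, stopping at the
# first write that fails. DIR defaults to the debugfs directory used
# in the example; pointing it at a scratch directory gives a dry run.
# (Hypothetical helper; the DIR argument is an illustration only.)
einj_inject_memory() {
    einj_dir=${4:-/sys/kernel/debug/apei/einj}
    echo "$1" > "$einj_dir/param1" &&
    echo "$2" > "$einj_dir/param2" &&
    echo "$3" > "$einj_dir/error_type" &&
    echo 1 > "$einj_dir/error_inject"
}

# Same injection as the example (run as root on real hardware):
#   einj_inject_memory 0x12345000 0xfffffffffffff000 0x8
```
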
For more information about EINJ, please refer to the ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.