| 1 | Introduction |
| 2 | ============ |
| 3 | |
| 4 | The goal of this debug feature is to provide a reliable, responsive, |
| 5 | accurate and secure debug capability to developers interested in |
| 6 | debugging MSM subsystem processor images without the use of a hardware |
| 7 | debugger. |
| 8 | |
| 9 | The Debug Agent along with the Remote Debug Driver implements a shared |
| 10 | memory based transport mechanism that allows a debugger (e.g. GDB) |
| 11 | running on a host PC to communicate with a remote stub running on |
| 12 | peripheral subsystems such as the ADSP, MODEM etc. |
| 13 | |
| 14 | The diagram below shows the end-to-end components involved in |
| 15 | remote debugging: |
| 16 | |
| 17 | |
| 18 | :               : |
| 19 | :   HOST (PC)   :  MSM |
| 20 | :  ,--------,   :   ,-------, |
| 21 | :  |        |   :   | Debug |                         ,--------, |
| 22 | :  |Debugger|<--:-->| Agent |                         | Remote | |
| 23 | :  |        |   :   |  App  |                  +----->| Debug  | |
| 24 | :  `--------`   :   |-------|    ,--------,    |      | Stub   | |
| 25 | :               :   | Remote|    |        |<---+      `--------` |
| 26 | :               :   | Debug |<-->|--------| |
| 27 | :               :   | Driver|    |        |<---+      ,--------, |
| 28 | :               :   `-------`    `--------`    |      | Remote | |
| 29 | :               :      LA          Shared      +----->| Debug  | |
| 30 | :               :                  Memory             | Stub   | |
| 31 | :               :                                     `--------` |
| 32 | :               :                          Peripheral Subsystems |
| 33 | :               :                            (ADSP, MODEM, ...) |
| 34 | |
| 35 | |
| 36 | Debugger:       Debugger application running on the host PC that |
| 37 |                 communicates with the remote stub. |
| 38 |                 Examples: GDB, LLDB |
| 39 | |
| 40 | Debug Agent:    Software that runs on the Linux Android platform |
| 41 |                 that provides connectivity from the MSM to the |
| 42 |                 host PC. This involves two portions: the Remote Debug |
| 43 |                 Driver (described below) and a user mode Debug Agent |
| 44 |                 application. The application discovers processes running |
| 45 |                 on the subsystems and creates TCP/IP sockets for the host |
| 46 |                 to connect to. It also creates an info (or meta) port |
| 47 |                 that users can connect to in order to discover the |
| 48 |                 various processes and their corresponding debug ports. |
| 49 | |
| 50 | Remote Debug    A character based driver that the Debug |
| 51 | Driver:         Agent uses to transport the payload received from the |
| 52 |                 host to the debug stub running on the subsystem |
| 53 |                 processor over shared memory and vice versa. |
| 54 | |
| 55 | Shared Memory:  Shared memory from the SMEM pool that is accessible |
| 56 |                 from the Applications Processor (AP) and the |
| 57 |                 subsystem processors. |
| 58 | |
| 59 | Remote Debug    Privileged code that runs in the kernels of the |
| 60 | Stub:           subsystem processors that receives debug commands |
| 61 |                 from the debugger running on the host and |
| 62 |                 acts on these commands. These commands include reading |
| 63 |                 and writing to registers and memory belonging to the |
| 64 |                 subsystem's address space, setting breakpoints, |
| 65 |                 single stepping etc. |
| 66 | |
| 67 | Hardware description |
| 68 | ==================== |
| 69 | |
| 70 | The Remote Debug Driver interfaces with the Remote Debug stubs |
| 71 | running on the subsystem processors and does not drive or |
| 72 | manage any hardware resources. |
| 73 | |
| 74 | Software description |
| 75 | ==================== |
| 76 | |
| 77 | The debugger and the remote stubs use the Remote Serial Protocol (RSP) |
| 78 | to communicate with each other. This protocol is widely used by both |
| 79 | software and hardware debuggers. RSP is an ASCII based protocol that |
| 80 | is used when it is not possible to run a GDB server on the target under |
| 81 | debug. |
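
To illustrate the ASCII framing, an RSP packet has the form
$<payload>#<checksum>, where the checksum is the modulo-256 sum of the
payload bytes written as two hex digits. The helper below is a minimal
sketch of that framing only. The name rsp_frame is hypothetical and is not
part of the Debug Agent, and escaping of special characters in the payload
is not handled.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: wrap `payload` as "$payload#xx" in `out`.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if `out` is too small.
 */
int rsp_frame(const char *payload, char *out, size_t out_size)
{
    size_t len = strlen(payload);
    unsigned char sum = 0;

    /* '$' + payload + '#' + two hex digits + terminating NUL */
    if (out_size < len + 5)
        return -1;

    /* Checksum is the sum of all payload bytes, modulo 256. */
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + (unsigned char)payload[i]);

    return snprintf(out, out_size, "$%s#%02x", payload, (unsigned)sum);
}
```

For example, framing the payload "g" (read registers) this way yields
"$g#67", since the ASCII code of 'g' is 0x67.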
| 82 | |
| 83 | The Debug Agent application along with the Remote Debug Driver |
| 84 | is responsible for establishing a bi-directional connection from |
| 85 | the debugger application running on the host to the remote debug |
| 86 | stub running on a subsystem. The Debug Agent establishes connectivity |
| 87 | to the host PC via TCP/IP sockets. |
| 88 | |
| 89 | This feature uses ADB port forwarding to establish connectivity |
| 90 | between the debugger running on the host and the target under debug. |
| 91 | |
| 92 | Please note the Debug Agent does not expose HLOS memory to the |
| 93 | remote subsystem processors. |
| 94 | |
| 95 | Design |
| 96 | ====== |
| 97 | |
| 98 | Here is the overall flow: |
| 99 | |
| 100 | 1) When the Debug Agent application starts up, it opens up a shared memory |
| 101 | based transport channel to the various subsystem processor images. |
| 102 | |
| 103 | 2) The Debug Agent application sends messages across to the remote stubs |
| 104 | to discover the various processes that are running on the subsystem and |
| 105 | creates debug sockets for each of them. |
| 106 | |
| 107 | 3) Whenever a process running on a subsystem exits, the Debug Agent |
| 108 | is notified by the stub so that the debug port and other resources |
| 109 | can be reclaimed. |
| 110 | |
| 111 | 4) The Debug Agent uses the services of the Remote Debug Driver to |
| 112 | transport payload from the host debugger to the remote stub and vice versa. |
| 113 | |
| 114 | 5) Communication between the Remote Debug Driver and the Remote Debug stub |
| 115 | running on the subsystem processor is done over shared memory (see figure). |
| 116 | SMEM services are used to allocate the shared memory that will |
| 117 | be readable and writeable by the AP and the subsystem image under debug. |
| 118 | |
| 119 | A separate SMEM allocation takes place for each subsystem processor |
| 120 | involved in remote debugging. The remote stub running on each of the |
| 121 | subsystems allocates a SMEM buffer using a unique identifier so that both |
| 122 | the AP and subsystem get the same physical block of memory. It should be |
| 123 | noted that subsystem images can be restarted at any time. |
| 124 | However, when a subsystem comes back up, its stub uses the same unique |
| 125 | SMEM identifier to allocate the SMEM block. This does not result in a |
| 126 | new allocation; rather, the same block of memory from the first bootup |
| 127 | instance is provided back to the stub running on the subsystem. |
| 128 | |
| 129 | An 8KB chunk of shared memory is allocated and used for communication |
| 130 | per subsystem. For multi-process capable subsystems, a 16KB chunk of shared |
| 131 | memory is allocated to allow for simultaneous debugging of more than one |
| 132 | process running on a single subsystem. |
| 133 | |
| 134 | The shared memory is used as a circular ring buffer in each direction. |
| 135 | Thus we have a bi-directional shared memory channel between the AP |
| 136 | and a subsystem. We call this SMQ. Each memory channel contains a header, |
| 137 | data and a control mechanism that is used to synchronize read and write |
| 138 | of data between the AP and the remote subsystem. |
| 139 | |
| 140 | Overall SMQ memory view: |
| 141 | : |
| 142 | :    +------------------------------------------------+ |
| 143 | :    | SMEM buffer                                    | |
| 144 | :    |-----------------------+------------------------| |
| 145 | :    |Producer: LA           | Producer: Remote       | |
| 146 | :    |Consumer: Remote       |           subsystem    | |
| 147 | :    |          subsystem    | Consumer: LA           | |
| 148 | :    |                       |                        | |
| 149 | :    |               Producer|                Consumer| |
| 150 | :    +-----------------------+------------------------+ |
| 151 | :    |                       | |
| 152 | :    |                       | |
| 153 | :    |                       +--------------------------------------+ |
| 154 | :    |                                                              | |
| 155 | :    |                                                              | |
| 156 | :    v                                                              v |
| 157 | :    +--------------------------------------------------------------+ |
| 158 | :    |  Header   |      Data       |            Control             | |
| 159 | :    +-----------+---+---+---+-----+----+--+--+-----+---+--+--+-----+ |
| 160 | :    |           | b | b | b |     | S  |n |n |     | S |n |n |     | |
| 161 | :    | Producer  | l | l | l |     | M  |o |o |     | M |o |o |     | |
| 162 | :    |    Ver    | o | o | o |     | Q  |d |d |     | Q |d |d |     | |
| 163 | :    |-----------| c | c | c | ... |    |e |e | ... |   |e |e | ... | |
| 164 | :    |           | k | k | k |     | O  |  |  |     | I |  |  |     | |
| 165 | :    | Consumer  |   |   |   |     | u  |0 |1 |     | n |0 |1 |     | |
| 166 | :    |    Ver    | 0 | 1 | 2 |     | t  |  |  |     |   |  |  |     | |
| 167 | :    +-----------+---+---+---+-----+----+--+--+-----+---+--+--+-----+ |
| 168 | :                                          |               | |
| 169 | :                                          +               | |
| 170 | :                                                          | |
| 171 | :                                 +------------------------+ |
| 172 | :                                 | |
| 173 | :                                 v |
| 174 | :                        +----+----+----+----+ |
| 175 | :                        | SMQ Nodes         | |
| 176 | :                        |----|----|----|----| |
| 177 | :                 Node # |  0 |  1 |  2 | ...| |
| 178 | :                        |----|----|----|----| |
| 179 | : Starting Block Index # |  0 |  3 |  8 | ...| |
| 180 | :                        |----|----|----|----| |
| 181 | :            # of blocks |  3 |  5 |  1 | ...| |
| 182 | :                        +----+----+----+----+ |
| 183 | : |
| 184 | |
| 185 | Header: Contains version numbers for software compatibility to ensure |
| 186 | that both producers and consumers on the AP and subsystems know how to |
| 187 | read from and write to the queue. |
| 188 | Both the producer and consumer versions are 1. |
| 189 | : +---------+-------------------+ |
| 190 | : | Size    | Field             | |
| 191 | : +---------+-------------------+ |
| 192 | : | 1 byte  | Producer Version  | |
| 193 | : +---------+-------------------+ |
| 194 | : | 1 byte  | Consumer Version  | |
| 195 | : +---------+-------------------+ |
| 196 | |
| 197 | |
| 198 | Data: The data portion contains multiple blocks [0..N] of a fixed size. |
| 199 | The block size SM_BLOCKSIZE is fixed to 128 bytes for header version #1. |
| 200 | Payload sent from the debug agent app is split (if necessary) and placed |
| 201 | in these blocks. The first data block is placed at the next 8-byte aligned |
| 202 | address after the header. |
| 203 | |
| 204 | The number of blocks for a given SMEM allocation is derived as follows: |
| 205 | Number of Blocks = ((Total Size - Alignment - Size of Header |
| 206 | - Size of SMQIn - Size of SMQOut)/(SM_BLOCKSIZE)) |
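
The formula above can be sketched in C as follows. This is an illustration
under stated assumptions, not the driver's code: the header and SMQIn/SMQOut
sizes are taken from the field tables in this document, the alignment
allowance of 8 bytes is an assumption based on the 8-byte alignment of the
first data block, and all names are hypothetical.

```c
#include <stddef.h>

#define SM_BLOCKSIZE   128u /* fixed for header version 1 */
#define SMQ_HDR_SIZE     2u /* producer + consumer version, 1 byte each */
#define SMQ_OUT_SIZE    16u /* four 4-byte fields */
#define SMQ_IN_SIZE     16u /* four 4-byte fields */
#define SMQ_ALIGNMENT    8u /* assumed allowance for 8-byte alignment */

/* Number of data blocks that fit in one channel of `total_size` bytes. */
size_t smq_num_blocks(size_t total_size)
{
    size_t overhead = SMQ_ALIGNMENT + SMQ_HDR_SIZE +
                      SMQ_IN_SIZE + SMQ_OUT_SIZE;

    if (total_size <= overhead)
        return 0;

    return (total_size - overhead) / SM_BLOCKSIZE;
}
```

Under these assumptions a 4096-byte channel holds 31 blocks, because
(4096 - 42) / 128 rounds down to 31.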
| 207 | |
| 208 | The producer maintains a private block map of each of these blocks to |
| 209 | determine which of these blocks in the queue are in use and which are free. |
| 210 | |
| 211 | Control: |
| 212 | The control portion contains a list of nodes [0..N] where N is the number |
| 213 | of available data blocks. Each node identifies the data |
| 214 | block indexes that contain a particular debug message to be transferred, |
| 215 | and the number of blocks it took to hold the contents of the message. |
| 216 | |
| 217 | Each node has the following structure: |
| 218 | : +---------+----------------------+ |
| 219 | : | Size    | Field                | |
| 220 | : +---------+----------------------+ |
| 221 | : | 2 bytes | Starting Block Index | |
| 222 | : +---------+----------------------+ |
| 223 | : | 2 bytes | Number of Blocks     | |
| 224 | : +---------+----------------------+ |
| 225 | |
| 226 | The producer and the consumer update different parts of the control channel |
| 227 | (SMQOut / SMQIn) respectively. Each of these control data structures contains |
| 228 | information about the last node that was written / read, and the actual nodes |
| 229 | that were written/read. |
| 230 | |
| 231 | SMQOut Structure (R/W by producer, R by consumer): |
| 232 | : +---------+-------------------+ |
| 233 | : | Size    | Field             | |
| 234 | : +---------+-------------------+ |
| 235 | : | 4 bytes | Magic Init Number | |
| 236 | : +---------+-------------------+ |
| 237 | : | 4 bytes | Reset             | |
| 238 | : +---------+-------------------+ |
| 239 | : | 4 bytes | Last Sent Index   | |
| 240 | : +---------+-------------------+ |
| 241 | : | 4 bytes | Index Free Read   | |
| 242 | : +---------+-------------------+ |
| 243 | |
| 244 | SMQIn Structure (R/W by consumer, R by producer): |
| 245 | : +---------+-------------------+ |
| 246 | : | Size    | Field             | |
| 247 | : +---------+-------------------+ |
| 248 | : | 4 bytes | Magic Init Number | |
| 249 | : +---------+-------------------+ |
| 250 | : | 4 bytes | Reset ACK         | |
| 251 | : +---------+-------------------+ |
| 252 | : | 4 bytes | Last Read Index   | |
| 253 | : +---------+-------------------+ |
| 254 | : | 4 bytes | Index Free Write  | |
| 255 | : +---------+-------------------+ |
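
The header, node and control layouts described above could be expressed as
C structures along the following lines. The structure and member names are
hypothetical and chosen for illustration; only the field order and sizes
come from the tables in this document.

```c
#include <stdint.h>

/* Header: version numbers, 1 byte each. */
struct smq_hdr {
    uint8_t producer_version;
    uint8_t consumer_version;
};

/* Control node: locates one message in the data blocks. */
struct smq_node {
    uint16_t start_block;  /* Starting Block Index */
    uint16_t num_blocks;   /* Number of Blocks */
};

/* SMQOut: read/write by the producer, read-only for the consumer. */
struct smq_out {
    uint32_t magic;        /* Magic Init Number */
    uint32_t reset;        /* Reset */
    uint32_t last_sent;    /* Last Sent Index */
    uint32_t free_read;    /* Index Free Read */
};

/* SMQIn: read/write by the consumer, read-only for the producer. */
struct smq_in {
    uint32_t magic;        /* Magic Init Number */
    uint32_t reset_ack;    /* Reset ACK */
    uint32_t last_read;    /* Last Read Index */
    uint32_t free_write;   /* Index Free Write */
};

/* Compile-time checks that the layouts match the documented sizes. */
_Static_assert(sizeof(struct smq_hdr) == 2, "header is 2 bytes");
_Static_assert(sizeof(struct smq_node) == 4, "node is 4 bytes");
_Static_assert(sizeof(struct smq_out) == 16, "SMQOut is 16 bytes");
_Static_assert(sizeof(struct smq_in) == 16, "SMQIn is 16 bytes");
```

Fixed-width integer types are used because both the AP and the subsystem
must agree on the exact layout of the shared memory.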
| 256 | |
| 257 | Magic Init Number: |
| 258 | Both SMQ Out and SMQ In initialize this field with a predefined magic |
| 259 | number so as to make sure that both the consumer and producer blocks |
| 260 | have fully initialized and have valid data in the shared memory control area. |
| 261 | Producer Magic #: 0xFF00FF01 |
| 262 | Consumer Magic #: 0xFF00FF02 |
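
A check of the two magic numbers might look like the sketch below. The
constant and function names are hypothetical; only the two magic values come
from this document.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMQ_MAGIC_PRODUCER 0xFF00FF01u
#define SMQ_MAGIC_CONSUMER 0xFF00FF02u

/*
 * Returns true only when both control areas carry their expected magic
 * number, i.e. both sides have finished initializing the channel.
 */
bool smq_channel_ready(uint32_t out_magic, uint32_t in_magic)
{
    return out_magic == SMQ_MAGIC_PRODUCER &&
           in_magic == SMQ_MAGIC_CONSUMER;
}
```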
| 263 | |
| 264 | SMQ Out's Last Sent Index and Index Free Read: |
| 265 | Only a producer can write to these indexes and they are updated whenever |
| 266 | there is new payload to be inserted into the SMQ in order to be sent to a |
| 267 | consumer. |
| 268 | |
| 269 | The number of blocks required for the SMQ allocation is determined as: |
| 270 | (payload size + SM_BLOCKSIZE - 1) / SM_BLOCKSIZE |
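
The expression above is an integer ceiling division. As a small sketch
(the function name is hypothetical):

```c
#include <stddef.h>

#define SM_BLOCKSIZE 128u /* fixed for header version 1 */

/* Blocks needed to hold `payload_size` bytes, rounded up. */
size_t smq_blocks_for_payload(size_t payload_size)
{
    return (payload_size + SM_BLOCKSIZE - 1) / SM_BLOCKSIZE;
}
```

A typical 25-byte RSP message therefore occupies one block, while a
700-byte message occupies six.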
| 271 | |
| 272 | The private block map is searched for a large enough contiguous set of blocks |
| 273 | and the user data is copied into the data blocks. |
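
One way to search a block map for a free run is a next-fit scan that starts
from the position of the previous allocation. The sketch below is an
illustration only and is not taken from the driver: the map representation
(one byte per block, nonzero meaning in use), the function name, and the
choice not to let a run wrap past the end of the data area are all
assumptions.

```c
#include <stdint.h>

#define SMQ_NO_FIT (-1)

/*
 * Search for `need` contiguous free blocks, starting at block `start`
 * and wrapping around once. Returns the first block index of the run,
 * or SMQ_NO_FIT if no run is large enough.
 */
int smq_find_free_run(const uint8_t *in_use, int num_blocks,
                      int start, int need)
{
    if (need <= 0 || need > num_blocks ||
        start < 0 || start >= num_blocks)
        return SMQ_NO_FIT;

    for (int tried = 0; tried < num_blocks; tried++) {
        int first = (start + tried) % num_blocks;
        int run = 0;

        /* A run must fit before the end of the data area. */
        if (first + need > num_blocks)
            continue;

        while (run < need && !in_use[first + run])
            run++;

        if (run == need)
            return first;
    }

    return SMQ_NO_FIT;
}
```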
| 274 | |
| 275 | The starting index of the free block(s) is updated in the SMQOut's Last Sent |
| 276 | Index. This update keeps track of which index was last written to and the |
| 277 | producer uses it to determine where the next allocation could be done. |
| 278 | |
| 279 | On every allocation, a producer updates the Index Free Read from its |
| 280 | collaborating consumer's Index Free Write field (if they are unequal). |
| 281 | This index value indicates that the consumer has read all blocks associated |
| 282 | with an allocation on the SMQ and that the producer can reuse these blocks for |
| 283 | subsequent allocations since this is a circular queue. |
| 284 | |
| 285 | At cold boot and restart, these indexes are initialized to zero and all |
| 286 | blocks are marked as available for allocation. |
| 287 | |
| 288 | SMQ In's Last Read Index and Index Free Write: |
| 289 | These indexes are written to only by a consumer and are updated whenever |
| 290 | there is new payload to be read from the SMQ. The Last Read Index keeps |
| 291 | track of which index was last read by the consumer and using this, it |
| 292 | determines where the next read should be done. |
| 293 | After completing a read, Last Read Index is incremented to the |
| 294 | next block index. A consumer updates Index Free Write to the starting |
| 295 | index of an allocation whenever it has completed processing the blocks. |
| 296 | This is an optimization that can be used to prevent an additional copy |
| 297 | of data from the queue into a client's data buffer and the data in the queue |
| 298 | itself can be used. |
| 299 | Once Index Free Write is updated, the collaborating producer (on the next |
| 300 | data allocation) reads the updated Index Free Write value and it then |
| 301 | updates its corresponding SMQ Out's Index Free Read and marks the blocks |
| 302 | associated with that index as available for allocation. At cold boot and |
| 303 | restart, these indexes are initialized to zero. |
| 304 | |
| 305 | SMQ Out Reset# and SMQ In Reset ACK #: |
| 306 | Since subsystems can restart at any time, the data blocks and control channel |
| 307 | can be in an inconsistent state when a producer or consumer comes up. |
| 308 | We use Reset and Reset ACK to manage this. At cold boot, the producer |
| 309 | initializes the Reset# to a known number (e.g. 1). On every subsequent reset |
| 310 | that the producer undergoes, the Reset# is simply incremented by 1. All the |
| 311 | producer indexes are reset. |
| 312 | When the producer notifies the consumer of data availability, the consumer |
| 313 | reads the producer's Reset# and copies it into its SMQ In Reset ACK# |
| 314 | field when they differ. When that occurs, the consumer resets its |
| 315 | indexes to 0. |
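
The consumer side of this handshake can be sketched as follows. The
structure and function names are hypothetical, and the structure holds only
the consumer fields relevant to the reset check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Consumer-owned control fields used by the reset handshake. */
struct smq_in_ctl {
    uint32_t reset_ack;   /* Reset ACK */
    uint32_t last_read;   /* Last Read Index */
    uint32_t free_write;  /* Index Free Write */
};

/*
 * Compare the producer's Reset# with the consumer's Reset ACK#.
 * If they differ, acknowledge the new value and reset the consumer
 * indexes to 0. Returns true when a producer reset was detected.
 */
bool smq_consumer_check_reset(uint32_t producer_reset,
                              struct smq_in_ctl *in)
{
    if (in->reset_ack == producer_reset)
        return false;

    in->reset_ack = producer_reset;
    in->last_read = 0;
    in->free_write = 0;
    return true;
}
```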
| 316 | |
| 317 | 6) Asynchronous notifications between a producer and consumer are |
| 318 | done using the SMP2P service which is interrupt based. |
| 319 | |
| 320 | Power Management |
| 321 | ================ |
| 322 | |
| 323 | None |
| 324 | |
| 325 | SMP/multi-core |
| 326 | ============== |
| 327 | |
| 328 | The driver uses completions to wake up the Debug Agent client threads. |
| 329 | |
| 330 | Security |
| 331 | ======== |
| 332 | |
| 333 | From the perspective of the subsystem, the AP is untrusted. The remote |
| 334 | stubs consult the secure debug fuses to determine whether or not the |
| 335 | remote debugging will be enabled at the subsystem. |
| 336 | |
| 337 | If the hardware debug fuses indicate that debugging is disabled, the |
| 338 | remote stubs will not be functional on the subsystem. Writes to the |
| 339 | queue will only be done if the driver sees that the remote stub has been |
| 340 | initialized on the subsystem. |
| 341 | |
| 342 | Therefore even if any untrusted software running on the AP requests |
| 343 | the services of the Remote Debug Driver and injects RSP messages |
| 344 | into the shared memory buffer, these RSP messages will be discarded and |
| 345 | an appropriate error code will be sent up to the invoking application. |
| 346 | |
| 347 | Performance |
| 348 | =========== |
| 349 | |
| 350 | During operation, the Remote Debug Driver copies RSP messages |
| 351 | asynchronously sent from the host debugger to the remote stub and vice |
| 352 | versa. The debug messages are ASCII based and relatively short |
| 353 | (<25 bytes) but may occasionally grow to a maximum of 700 bytes |
| 354 | depending on the command the user requested. Thus we do not |
| 355 | anticipate any major performance impact. Moreover, in a typical |
| 356 | functional debug scenario performance should not be a concern. |
| 357 | |
| 358 | Interface |
| 359 | ========= |
| 360 | |
| 361 | The Remote Debug Driver is a character based device that manages |
| 362 | a piece of shared memory that is used as a bi-directional |
| 363 | single producer/consumer circular queue using a next fit allocator. |
| 364 | Every subsystem has its own shared memory buffer that is managed |
| 365 | like a separate device. |
| 366 | |
| 367 | The driver distinguishes each subsystem processor's buffer by |
| 368 | registering a node with a different minor number. |
| 369 | |
| 370 | For each subsystem that is supported, the driver exposes a user space |
| 371 | interface through the following node: |
| 372 | - /dev/rdbg-<subsystem> |
| 373 | Ex. /dev/rdbg-adsp (for the ADSP subsystem) |
| 374 | |
| 375 | The standard open(), close(), read() and write() API set is |
| 376 | implemented. |
| 377 | |
| 378 | The open() syscall will fail if a subsystem is not present or supported |
| 379 | by the driver or a shared memory buffer cannot be allocated for the |
| 380 | AP - subsystem communication. It will also fail if the subsystem has |
| 381 | not initialized the queue on its side. Here are the error codes returned |
| 382 | in case a call to open() fails: |
| 383 | ENODEV - memory was not yet allocated for the device |
| 384 | EEXIST - device is already opened |
| 385 | ENOMEM - SMEM allocation failed |
| 386 | ECOMM - Subsystem queue is not yet set up |
| 387 | ENOMEM - Failure to initialize SMQ |
| 388 | |
| 389 | read() is a blocking call that will return with the number of bytes written |
| 390 | by the subsystem whenever the subsystem sends it some payload. Here are the |
| 391 | error codes returned in case a call to read() fails: |
| 392 | EINVAL - Invalid input |
| 393 | ENODEV - Device has not been opened yet |
| 394 | ERESTARTSYS - call to wait_for_completion_interruptible is interrupted |
| 395 | ENODATA - call to smq_receive failed |
| 396 | |
| 397 | write() attempts to send user mode payload out to the subsystem. It can fail |
| 398 | if the SMQ is full. The number of bytes written is returned to the user. |
| 399 | Here are the error codes returned in case a call to write() fails: |
| 400 | EINVAL - Invalid input |
| 401 | ECOMM - SMQ send failed |
| 402 | |
| 403 | In the close() syscall, the control information state of the SMQ is |
| 404 | initialized to zero thereby preventing any further communication between |
| 405 | the AP and the subsystem. Here is the error code returned in case |
| 406 | a call to close() fails: |
| 407 | ENODEV - device wasn't opened/initialized |
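
From user space, one request/response exchange through a device node could
look like the sketch below. It uses only the standard open(), write(),
read() and close() calls named above. The function name rdbg_exchange is
hypothetical, and opening the node with O_RDWR is an assumption; the real
Debug Agent application may be structured differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `req` to the subsystem behind `dev_path` (for example
 * "/dev/rdbg-adsp") and block until a response arrives.
 * Returns the number of response bytes, or a negative errno value.
 */
ssize_t rdbg_exchange(const char *dev_path,
                      const void *req, size_t req_len,
                      void *resp, size_t resp_len)
{
    ssize_t n;
    int fd = open(dev_path, O_RDWR);

    if (fd < 0)
        return -errno;      /* e.g. ENODEV, EEXIST, ENOMEM, ECOMM */

    n = write(fd, req, req_len);
    if (n < 0) {
        n = -errno;         /* e.g. ECOMM when the SMQ send fails */
    } else {
        n = read(fd, resp, resp_len);   /* blocks until payload arrives */
        if (n < 0)
            n = -errno;
    }

    close(fd);
    return n;
}
```

Because only one open of a device is allowed at a time (EEXIST otherwise),
a long-running agent would more likely keep the descriptor open across
exchanges instead of reopening it for each message.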
| 408 | |
| 409 | The Remote Debug driver uses SMP2P for bi-directional AP to subsystem |
| 410 | notification. Notifications are sent to indicate that there are new |
| 411 | debug messages available for processing. Each subsystem that is |
| 412 | supported will need to add a device tree entry per the usage |
| 413 | specification of the SMP2P driver. |
| 414 | |
| 415 | In case the remote stub becomes non-operational or the security configuration |
| 416 | on the subsystem does not permit debugging, any messages put in the SMQ will |
| 417 | not be responded to. It is the responsibility of the Debug Agent app and the |
| 418 | host debugger application such as GDB to time out and notify the user that |
| 419 | remote debugging is unavailable. |
| 420 | |
| 421 | Driver parameters |
| 422 | ================= |
| 423 | |
| 424 | None |
| 425 | |
| 426 | Config options |
| 427 | ============== |
| 428 | |
| 429 | The driver is configured with a device tree entry to map an SMP2P entry |
| 430 | to the device. The SMP2P entry name used is "rdbg". Please see |
| 431 | kernel/Documentation/arm/msm/msm_smp2p.txt for information about the |
| 432 | device tree entry required to configure SMP2P. |
| 433 | |
| 434 | The driver uses the SMEM allocation type SMEM_LC_DEBUGGER to allocate memory |
| 435 | for the queue that is used to share data with the subsystems. |
| 436 | |
| 437 | Dependencies |
| 438 | ============ |
| 439 | |
| 440 | The Debug Agent driver requires services of SMEM to |
| 441 | allocate shared memory buffers. |
| 442 | |
| 443 | SMP2P is used as a bi-directional notification |
| 444 | mechanism between the AP and a subsystem processor. |
| 445 | |
| 446 | User space utilities |
| 447 | ==================== |
| 448 | |
| 449 | This driver is meant to be used in conjunction with the user mode |
| 450 | Remote Debug Agent application. |
| 451 | |
| 452 | Other |
| 453 | ===== |
| 454 | |
| 455 | None |
| 456 | |
| 457 | Known issues |
| 458 | ============ |
| 459 | For targets with an external subsystem, we cannot use |
| 460 | shared memory for communication and would have to use the prevailing |
| 461 | transport mechanisms that exist between the AP and the external subsystem. |
| 462 | |
| 463 | This driver cannot be leveraged for such targets. |
| 464 | |
| 465 | To do |
| 466 | ===== |
| 467 | |
| 468 | None |