Blame - server/cros/dynamic_suite.py - platform/external/autotest

2012-02-15 14:21:02 -0800

[diff] [blame]

1

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

2

# Use of this source code is governed by a BSD-style license that can be

3

# found in the LICENSE file.

4

5

import common

Chris Masone

11aae45

2012-05-21 16:08:39 -0700

[diff] [blame^]

6

import compiler, datetime, hashlib, logging, os, random, re, time, traceback

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

7

from autotest_lib.client.common_lib import base_job, control_data, global_config

8

from autotest_lib.client.common_lib import error, utils

Chris Masone

2012-02-22 13:16:11 -0800

[diff] [blame]

9

from autotest_lib.client.common_lib.cros import dev_server

Chris Masone

2012-02-15 14:21:02 -0800

[diff] [blame]

10

from autotest_lib.server.cros import control_file_getter, frontend_wrappers

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

11

from autotest_lib.server import frontend

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

12

from autotest_lib.frontend.afe.json_rpc import proxy

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

13

Chris Masone

6cfb712

2012-05-02 11:36:28 -0700

[diff] [blame]

14

"""CrOS dynamic test suite generation and execution module.

15

16

This module implements runtime-generated test suites for CrOS.

17

Design doc: http://goto.google.com/suitesv2

18

19

Individual tests can declare themselves as a part of one or more

20

suites, and the code here enables control files to be written

21

that can refer to these "dynamic suites" by name. We also provide

22

support for reimaging devices with a given build and running a

23

dynamic suite across all reimaged devices.

24

25

The public API for defining a suite includes one method: reimage_and_run().

26

A suite control file can be written by importing this module and making

27

an appropriate call to this single method. In normal usage, this control

28

file will be run in a 'hostless' server-side autotest job, scheduling

29

sub-jobs to do the needed reimaging and test running.

30

31

Example control file:

32

33

import common

34

from autotest_lib.server.cros import dynamic_suite

35

36

dynamic_suite.reimage_and_run(

37

build=build, board=board, name='bvt', job=job, pool=pool,

38

check_hosts=check_hosts, add_experimental=True, num=4,

39

skip_reimage=dynamic_suite.skip_reimage(globals()))

40

41

This will -- at runtime -- find all control files that contain "bvt"

42

in their "SUITE=" clause, schedule jobs to reimage 4 devices in the

43

specified pool of the specified board with the specified build and,

44

upon completion of those jobs, schedule and wait for jobs that run all

45

the tests it discovered across those 4 machines.

46

47

Suites can be run by using the atest command-line tool:

48

atest suite create -b <board> -i <build/name> <suite>

49

e.g.

50

atest suite create -b x86-mario -i x86-mario/R20-2203.0.0 bvt

51

52

-------------------------------------------------------------------------

53

Implementation details

54

55

In addition to the create_suite_job() RPC defined in the autotest frontend,

56

there are two main classes defined here: Suite and Reimager.

57

58

A Suite instance represents a single test suite, defined by some predicate

59

run over all known control files. The simplest example is creating a Suite

60

by 'name'.

61

62

The Reimager class provides support for reimaging a heterogeneous set

63

of devices with an appropriate build, in preparation for a test run.

64

One could use a single Reimager, followed by the instantiation and use

65

of multiple Suite objects.

66

67

create_suite_job() takes the parameters needed to define a suite run (board,

68

build to test, machine pool, and which suite to run), ensures important

69

preconditions are met, finds the appropriate suite control file, and then

70

schedules the hostless job that will do the rest of the work.

71

72

reimage_and_run() works by creating a Reimager, using it to perform the

73

requested installs, and then instantiating a Suite and running it on the

74

machines that were just reimaged. We'll go through this process in stages.

75

76

- create_suite_job()

77

The primary role of create_suite_job() is to ensure that the required

78

artifacts for the build to be tested are staged on the dev server. This

79

includes payloads required to autoupdate machines to the desired build, as

80

well as the autotest control files appropriate for that build. Then, the

81

RPC pulls the control file for the suite to be run from the dev server and

82

uses it to create the suite job with the autotest frontend.

83

84

                +----------------+
                | Google Storage |                              Client
                +----------------+                                 |
                     |   ^                                         | create_suite_job()
       payloads/     |   |                                         |
       control files |   | request                                 |
                     V   |                                         V
                +-------------+   download request    +--------------------------+
                |             |<----------------------|                          |
                | Dev Server  |                       | Autotest Frontend (AFE)  |
                |             |---------------------->|                          |
                +-------------+  suite control file   +--------------------------+
                                                                   |
                                                                   V
                                                         Suite Job (hostless)

- The Reimaging process

101

In short, the Reimager schedules and waits for a number of autoupdate 'test'

102

jobs that perform image installation and make sure the device comes back up.

103

It labels the machines that it reimages with the newly-installed CrOS version,

104

so that later steps in the process can refer to the machines by version and board,

105

instead of having to keep track of hostnames or some such.

106

107

The number of machines to use is called the 'sharding_factor', and the default

108

is defined in the [CROS] section of global_config.ini. This can be overridden

109

by passing a 'num=N' parameter to reimage_and_run() as shown in the example

above.

Step by step:

1) Schedule autoupdate 'tests' across N devices of the appropriate board.

114

- Technically, one job that has N tests across N hosts.

115

- This 'test' is in server/site_tests/autoupdate/

116

- The control file is modified at runtime to inject the name of the build

117

to install, and the URL to get said build from.

118

- This is the _TOT_ version of the autoupdate test; it must be able to run

119

successfully on all currently supported branches at all times.

120

2) Wait for this job to get kicked off and run to completion.

121

3) Label successfully reimaged devices with a 'cros-version' label

122

- This is actually done by the autoupdate 'test' control file.

123

4) Add a host attribute ('job_repo_url') to each reimaged host indicating

124

the URL where packages should be downloaded for subsequent tests

125

- This is actually done by the autoupdate 'test' control file

126

- This information is consumed in server/site_autotest.py

127

- job_repo_url points to some location on the dev server, where build

128

artifacts are staged -- including autotest packages.

129

5) Return success or failure.

130

131

+------------+ +--------------------------+

132

| | | |

133

| Dev Server | | Autotest Frontend (AFE) |

134

| | | [Suite Job] |

135

+------------+ +--------------------------+

136

137

V V autoupdate test | | |

138

+--------+ +--------+ <-----+----------------+ | |

139

| Host 1 |<------| Host 2 |-------+ | |

140

+--------+ +--------+ label | |

141

VersLabel VersLabel <-----------------------+ |

142

job_repo_url job_repo_url <-----------------------------+

143

host-attribute

144

145

To sum up, after re-imaging, we can assume the following about the reimaged devices:

146

147

- These devices are labeled appropriately

148

- They have a host attribute called 'job_repo_url' dictating where autotest

149

packages can be downloaded for test runs.

- Running Suites

A Suite instance uses the labels created by the Reimager to schedule test jobs

154

across all the hosts that were just reimaged. It then waits for all these jobs.

155

156

Step by step:

157

1) At instantiation time, find all appropriate control files for this suite

158

that were included in the build to be tested. To do this, we consult the

159

Dev Server, where all these control files are staged.

160

161

+------------+    control files?     +--------------------------+
|            |<----------------------|                          |
| Dev Server |                       | Autotest Frontend (AFE)  |
|            |---------------------->|       [Suite Job]        |
+------------+    control files!     +--------------------------+

166

167

2) Now that the Suite instance exists, it schedules jobs for every control

168

file it deemed appropriate, to be run on the hosts that were labeled

169

by the Reimager. We stuff keyvals into these jobs, indicating what

170

build they were testing and which suite they were for.

171

172

+--------------------------+ Job for VersLabel +--------+

173

| |------------------------>| Host 1 | VersLabel

174

| Autotest Frontend (AFE) | +--------+ +--------+

175

| [Suite Job] |----------->| Host 2 |

176

+--------------------------+ Job for +--------+

177

| ^ VersLabel VersLabel

| |

+----------------+

One job per test

{'build': build/name,

182

'suite': suite_name}

183

184

3) Now that all jobs are scheduled, they'll be doled out as labeled hosts

185

finish their assigned work and become available again.

186

4) As we clean up each job, we check to see if any crashes occurred. If they

187

did, we look at the 'build' keyval in the job to see which build's debug

188

symbols we'll need to symbolicate the crash dump we just found.

189

5) Using this info, we tell the Dev Server to stage the required debug symbols.

190

Once that's done, we ask the dev server to use those symbols to symbolicate

191

the crash dump in question.

        +----------------+
        | Google Storage |
        +----------------+
               |   ^
      symbols! |   | symbols?
               V   |
          +------------+  stage symbols for build  +--------------------------+
          |            |<--------------------------|                          |
          |            |                           |                          |
          | Dev Server |    dump to symbolicate    | Autotest Frontend (AFE)  |
          |            |<--------------------------|       [Suite Job]        |
          |            |-------------------------->|                          |
          +------------+     symbolicated dump     +--------------------------+

206

207

6) As jobs finish, we record their success or failure in the status of the suite

208

job. We also record a 'job keyval' in the suite job for each test, noting

209

the job ID and job owner. This can be used to refer to test logs later.

210

7) Once all jobs are complete, status is recorded for the suite job, and the

211

job_repo_url host attribute is removed from all hosts used by the suite.

"""

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

215

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

216

# Job keyvals for finding debug symbols when processing crash dumps.

217

JOB_BUILD_KEY = 'build'

218

JOB_SUITE_KEY = 'suite'

219

220

# Job attribute and label names

221

JOB_REPO_URL = 'job_repo_url'

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

222

VERSION_PREFIX = 'cros-version:'

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

223

EXPERIMENTAL_PREFIX = 'experimental_'

224

REIMAGE_JOB_NAME = 'try_new_image'

225

226

# Timings

227

ARTIFACT_FINISHED_TIME = 'artifact_finished_time'

228

DOWNLOAD_STARTED_TIME = 'download_started_time'

229

PAYLOAD_FINISHED_TIME = 'payload_finished_time'

230

TIME_FMT = '%Y-%m-%d %H:%M:%S'

231

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

232

CONFIG = global_config.global_config

233

234

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

235

# Relevant CrosDynamicSuiteExceptions are defined in client/common_lib/error.py.

Chris Masone

502b71e

2012-04-10 10:41:35 -0700

[diff] [blame]

236

237

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

238

def reimage_and_run(**dargs):

239

"""

240

Backward-compatible API for dynamic_suite.

241

242

Will re-image a number of devices (of the specified board) with the

243

provided build, and then run the indicated test suite on them.

244

Guaranteed to be compatible with any build from stable to dev.

245

246

Currently required args:

247

@param build: the build to install e.g.

248

x86-alex-release/R18-1655.0.0-a1-b1584.

249

@param board: which kind of devices to reimage.

250

@param name: a value of the SUITE control file variable to search for.

251

@param job: an instance of client.common_lib.base_job representing the

252

currently running suite job.

253

254

Currently supported optional args:

255

@param pool: specify the pool of machines to use for scheduling purposes.

256

Default: None

257

@param num: how many devices to reimage.

258

Default in global_config

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

259

@param check_hosts: require appropriate hosts to be available now.

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

260

@param skip_reimage: skip reimaging, used for testing purposes.

261

Default: False

262

@param add_experimental: schedule experimental tests as well, or not.

263

Default: True

Chris Sosa

6b288c8

2012-03-29 15:31:06 -0700

[diff] [blame]

264

@raises AsynchronousBuildFailure: if there was an issue finishing staging

265

from the devserver.

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

266

"""

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

267

(build, board, name, job, pool, num, check_hosts, skip_reimage,

268

add_experimental) = _vet_reimage_and_run_args(**dargs)

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

269

board = 'board:%s' % board

270

if pool:

271

pool = 'pool:%s' % pool

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

272

reimager = Reimager(job.autodir, pool=pool, results_dir=job.resultdir)

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

273

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

274

if skip_reimage or reimager.attempt(build, board, job.record, check_hosts,

275

num=num):

Chris Sosa

6b288c8

2012-03-29 15:31:06 -0700

[diff] [blame]

276

277

# Ensure that the image's artifacts have completed downloading.

Chris Masone

f70650c

2012-05-16 08:52:12 -0700

[diff] [blame]

278

try:

279

ds = dev_server.DevServer.create()

280

ds.finish_download(build)

281

except dev_server.DevServerException as e:

282

raise error.AsynchronousBuildFailure(e)

283

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

284

timestamp = datetime.datetime.now().strftime(TIME_FMT)

Chris Masone

a8066a9

2012-05-01 16:52:31 -0700

[diff] [blame]

285

utils.write_keyval(job.resultdir,

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

286

{ARTIFACT_FINISHED_TIME: timestamp})

Chris Sosa

6b288c8

2012-03-29 15:31:06 -0700

[diff] [blame]

287

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

288

suite = Suite.create_from_name(name, build, pool=pool,

289

results_dir=job.resultdir)

Chris Masone

2012-04-30 13:10:58 -0700

[diff] [blame]

290

suite.run_and_wait(job.record_entry, add_experimental=add_experimental)

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

291

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

292

reimager.clear_reimaged_host_state(build)
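The artifact_finished_time keyval written by reimage_and_run() is a timestamp formatted with TIME_FMT. The following is a minimal sketch of how that format renders and parses back. The fixed datetime value is only an illustrative input, not something the module produces.

```python
import datetime

# Same format string as the module-level TIME_FMT constant.
TIME_FMT = '%Y-%m-%d %H:%M:%S'

# A fixed, illustrative moment; the real code uses datetime.datetime.now().
moment = datetime.datetime(2012, 5, 15, 13, 34, 21)

# Render the way reimage_and_run() does before writing the keyval.
stamp = moment.strftime(TIME_FMT)

# A consumer of the keyval can recover the datetime with the same format.
parsed = datetime.datetime.strptime(stamp, TIME_FMT)

print(stamp)
```

Because the format has one-second resolution, any sub-second part of now() is dropped when the keyval is written.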

293

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

294

295

def _vet_reimage_and_run_args(build=None, board=None, name=None, job=None,

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

296

pool=None, num=None, check_hosts=True,

297

skip_reimage=False, add_experimental=True,

298

**dargs):

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

299

"""

300

Vets arguments for reimage_and_run().

301

302

Currently required args:

303

@param build: the build to install e.g.

304

x86-alex-release/R18-1655.0.0-a1-b1584.

305

@param board: which kind of devices to reimage.

306

@param name: a value of the SUITE control file variable to search for.

307

@param job: an instance of client.common_lib.base_job representing the

308

currently running suite job.

309

310

Currently supported optional args:

311

@param pool: specify the pool of machines to use for scheduling purposes.

312

Default: None

313

@param num: how many devices to reimage.

314

Default in global_config

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

315

@param check_hosts: require appropriate hosts to be available now.

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

316

@param skip_reimage: skip reimaging, used for testing purposes.

317

Default: False

318

@param add_experimental: schedule experimental tests as well, or not.

319

Default: True

320

@return a tuple of args set to provided (or default) values.

321

"""

322

required_keywords = {'build': str,

323

'board': str,

324

'name': str,

325

'job': base_job.base_job}

326

for key, expected in required_keywords.iteritems():

327

value = locals().get(key)

328

if not value or not isinstance(value, expected):

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

329

raise error.SuiteArgumentException(

330

"reimage_and_run() needs %s=<%r>" % (key, expected))

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

331

return (build, board, name, job, pool, num, check_hosts, skip_reimage,

332

add_experimental)
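The required-keyword check above can be illustrated in isolation. This is a sketch under stated assumptions: SuiteArgumentError is a hypothetical stand-in for error.SuiteArgumentException, vet_required() is an illustrative helper rather than part of this module, and the 'job' entry is left out because base_job is not importable outside autotest.

```python
class SuiteArgumentError(Exception):
    """Hypothetical stand-in for error.SuiteArgumentException."""


def vet_required(required, **kwargs):
    """Return kwargs if every required key is present, truthy, and well typed."""
    for key, expected in required.items():
        value = kwargs.get(key)
        # An empty string or None fails the same way a wrong type does.
        if not value or not isinstance(value, expected):
            raise SuiteArgumentError(
                'reimage_and_run() needs %s=<%r>' % (key, expected))
    return kwargs


REQUIRED = {'build': str, 'board': str, 'name': str}

ok = vet_required(REQUIRED, build='x86-mario/R20-2203.0.0',
                  board='x86-mario', name='bvt')

try:
    # 'board' is omitted on purpose to trigger the check.
    vet_required(REQUIRED, build='x86-mario/R20-2203.0.0', name='bvt')
    missing_board_rejected = False
except SuiteArgumentError:
    missing_board_rejected = True
```

Passing the arguments explicitly avoids relying on locals(), at the cost of diverging slightly from the function above.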

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

333

334

Chris Masone

2012-01-17 11:12:51 -0800

[diff] [blame]

335

def inject_vars(vars, control_file_in):

336

"""

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

337

Inject the contents of |vars| into |control_file_in|.

Chris Masone

2012-01-17 11:12:51 -0800

[diff] [blame]

338

339

@param vars: a dict to shoehorn into the provided control file string.

340

@param control_file_in: the contents of a control file to munge.

341

@return the modified control file string.

342

"""

343

control_file = ''

344

for key, value in vars.iteritems():

Chris Masone

6cb0d0d

2012-03-05 15:37:49 -0800

[diff] [blame]

345

# None gets injected as 'None' without this check; same for digits.

346

if isinstance(value, str):

347

control_file += "%s='%s'\n" % (key, value)

348

else:

349

control_file += "%s=%r\n" % (key, value)

Chris Masone

2012-01-17 11:12:51 -0800

[diff] [blame]

350

return control_file + control_file_in
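To make the effect of inject_vars() concrete, here is a small standalone sketch of the same idea. It is written for Python 3 (items() instead of iteritems()), sorts the keys so the output order is stable (an addition for this sketch), and uses a one-line control file body that is purely illustrative.

```python
def inject_vars_sketch(variables, control_file_in):
    """Prepend one assignment line per variable to a control file string."""
    header = ''
    # Sorting is an addition for this sketch; it makes the output deterministic.
    for key, value in sorted(variables.items()):
        if isinstance(value, str):
            # Strings are wrapped in single quotes.
            header += "%s='%s'\n" % (key, value)
        else:
            # Everything else (None, ints, ...) is injected via repr().
            header += "%s=%r\n" % (key, value)
    return header + control_file_in


injected = inject_vars_sketch(
    {'image_name': 'x86-mario/R20-2203.0.0', 'num': 4, 'pool': None},
    "job.run_test('autoupdate')\n")  # illustrative control file body
print(injected)
```

Note that a string value containing a single quote would produce an invalid assignment under this quoting scheme; using %r for strings as well would avoid that.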

351

352

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

353

def _image_url_pattern():

354

return CONFIG.get_config_value('CROS', 'image_url_pattern', type=str)

355

356

357

def _package_url_pattern():

358

return CONFIG.get_config_value('CROS', 'package_url_pattern', type=str)

359

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

360

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

361

def skip_reimage(g):

362

return g.get('SKIP_IMAGE')

363

364

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

365

class Reimager(object):

366

"""

367

A class that can run jobs to reimage devices.

368

369

@var _afe: a frontend.AFE instance used to talk to autotest.

370

@var _tko: a frontend.TKO instance used to query the autotest results db.

371

@var _cf_getter: a ControlFileGetter used to get the AU control file.

"""

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

375

def __init__(self, autotest_dir, afe=None, tko=None, pool=None,

376

results_dir=None):

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

"""

Constructor

@param autotest_dir: the place to find autotests.

381

@param afe: an instance of AFE as defined in server/frontend.py.

382

@param tko: an instance of TKO as defined in server/frontend.py.

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

383

@param pool: Specify the pool of machines to use for scheduling

384

purposes.

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

385

@param results_dir: The directory where the job can write results to.

386

This must be set if you want job_id of sub-jobs

387

listed in the job keyvals.

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

388

"""

Chris Masone

2012-02-15 14:21:02 -0800

[diff] [blame]

389

self._afe = afe or frontend_wrappers.RetryingAFE(timeout_min=30,

390

delay_sec=10,

391

debug=False)

392

self._tko = tko or frontend_wrappers.RetryingTKO(timeout_min=30,

393

delay_sec=10,

394

debug=False)

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

395

self._pool = pool

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

396

self._results_dir = results_dir

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

397

self._reimaged_hosts = {}

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

398

self._cf_getter = control_file_getter.FileSystemGetter(

399

[os.path.join(autotest_dir, 'server/site_tests')])

400

401

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

402

def skip(self, g):

Chris Masone

2012-02-29 18:54:58 -0800

[diff] [blame]

403

"""Deprecated in favor of dynamic_suite.skip_reimage()."""

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

404

return 'SKIP_IMAGE' in g and g['SKIP_IMAGE']

405

406

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

407

def attempt(self, build, board, record, check_hosts, num=None):

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

408

"""

409

Synchronously attempt to reimage some machines.

410

411

Fire off attempts to reimage |num| machines of type |board|, using an

Chris Masone

2012-01-31 09:27:36 -0800

[diff] [blame]

412

image called |build|. Wait for completion, polling every

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

413

10s, and log results with |record| upon completion.

414

Chris Masone

2012-01-31 09:27:36 -0800

[diff] [blame]

415

@param build: the build to install e.g.

416

x86-alex-release/R18-1655.0.0-a1-b1584.

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

417

@param board: which kind of devices to reimage.

418

@param record: callable that records job status.

Chris Masone

2012-02-22 16:53:31 -0800

[diff] [blame]

419

prototype:

420

record(status, subdir, name, reason)

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

421

@param check_hosts: require appropriate hosts to be available now.

Chris Masone

5552dd7

2012-02-15 15:01:04 -0800

[diff] [blame]

422

@param num: how many devices to reimage.

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

423

@return True if all reimaging jobs succeed, False otherwise.

424

"""

Chris Masone

5552dd7

2012-02-15 15:01:04 -0800

[diff] [blame]

425

if not num:

426

num = CONFIG.get_config_value('CROS', 'sharding_factor', type=int)

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

427

logging.debug("scheduling reimaging across %d machines", num)

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

428

record('START', None, REIMAGE_JOB_NAME)

Chris Masone

2012-02-22 16:53:31 -0800

[diff] [blame]

429

try:

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

430

self._ensure_version_label(VERSION_PREFIX + build)

431

432

if check_hosts:

433

self._ensure_enough_hosts(board, self._pool, num)

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

434

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

435

# Schedule job and record job metadata.

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

436

canary_job = self._schedule_reimage_job(build, num, board)

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

437

self._record_job_if_possible(REIMAGE_JOB_NAME, canary_job)

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

438

logging.debug('Created re-imaging job: %d', canary_job.id)

439

440

# Poll until reimaging is complete.

441

self._wait_for_job_to_start(canary_job.id)

442

self._wait_for_job_to_finish(canary_job.id)

443

444

# Gather job results.

445

canary_job.result = self._afe.poll_job_results(self._tko,

446

canary_job,

447

0)

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

448

except error.InadequateHostsException as e:

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

449

logging.warning(e)

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

450

record('END WARN', None, REIMAGE_JOB_NAME, str(e))

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

451

return False

Chris Masone

2012-02-22 16:53:31 -0800

[diff] [blame]

452

except Exception as e:

453

# Catch Exception so we record the job as terminated no matter what.

454

logging.error(e)

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

455

record('END ERROR', None, REIMAGE_JOB_NAME, str(e))

Chris Masone

2012-02-22 16:53:31 -0800

[diff] [blame]

456

return False

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

457

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

458

self._remember_reimaged_hosts(build, canary_job)

459

460

if canary_job.result is True:

461

self._report_results(canary_job, record)

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

462

record('END GOOD', None, REIMAGE_JOB_NAME)

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

463

return True

464

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

465

if canary_job.result is None:

466

record('FAIL', None, canary_job.name, 'reimaging tasks did not run')

467

else: # canary_job.result is False

468

self._report_results(canary_job, record)

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

469

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

470

record('END FAIL', None, REIMAGE_JOB_NAME)

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

return False
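The tail of attempt() treats the job result as a tri-state value: True (all reimaging succeeded), False (something failed), or None (the tasks never ran). A minimal sketch of that branching follows. close_out() and the list-based recorder are illustrative, and the per-host reporting step is omitted.

```python
def close_out(result, record):
    """Record the terminal status for a reimage job; return overall success."""
    if result is True:
        record('END GOOD')
        return True
    if result is None:
        # The job finished without any reimaging task actually running.
        record('FAIL: reimaging tasks did not run')
    record('END FAIL')
    return False


log_good, log_none, log_bad = [], [], []
good = close_out(True, log_good.append)
never_ran = close_out(None, log_none.append)
failed = close_out(False, log_bad.append)
```

Using identity checks (is True, is None) keeps None from being mistaken for an ordinary failure.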

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

474

def _ensure_enough_hosts(self, board, pool, num):

475

"""

476

Determine if there are enough working hosts to run on.

477

478

Raises exception if there are not enough hosts.

479

480

@param board: which kind of devices to reimage.

481

@param pool: the pool of machines to use for scheduling purposes.

482

@param num: how many devices to reimage.

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

483

@raises NoHostsException: if no working hosts.

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

484

@raises InadequateHostsException: if too few working hosts.

485

"""

486

labels = [l for l in [board, pool] if l is not None]

Chris Masone

502b71e

2012-04-10 10:41:35 -0700

[diff] [blame]

487

available = self._count_usable_hosts(labels)

488

if available == 0:

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

489

raise error.NoHostsException('All hosts with %r are dead!' % labels)

Chris Masone

502b71e

2012-04-10 10:41:35 -0700

[diff] [blame]

490

elif num > available:

Chris Masone

2012-05-08 22:14:18 -0700

[diff] [blame]

491

raise error.InadequateHostsException(

492

'Too few hosts with %r' % labels)
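The two failure modes of _ensure_enough_hosts() can be sketched without an AFE. The exception classes below are hypothetical stand-ins for the ones in client/common_lib/error.py, and the 'pool:bvt' label is an illustrative value.

```python
class NoHostsError(Exception):
    """Hypothetical stand-in for error.NoHostsException."""


class InadequateHostsError(Exception):
    """Hypothetical stand-in for error.InadequateHostsException."""


def check_capacity(available, num, labels):
    """Raise if |available| usable hosts cannot satisfy a request for |num|."""
    if available == 0:
        raise NoHostsError('All hosts with %r are dead!' % (labels,))
    if num > available:
        raise InadequateHostsError('Too few hosts with %r' % (labels,))


def outcome(available, num):
    """Map each case to a short string so the branches are easy to compare."""
    try:
        check_capacity(available, num, ['board:x86-mario', 'pool:bvt'])
        return 'ok'
    except NoHostsError:
        return 'none'
    except InadequateHostsError:
        return 'too few'


results = [outcome(0, 4), outcome(2, 4), outcome(4, 4)]
```

Keeping the two exceptions distinct lets a caller decide whether a shortage is a warning or a hard error.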

Chris Masone

2012-03-08 15:18:43 -0800

[diff] [blame]

493

494

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

495

def _wait_for_job_to_start(self, job_id):

496

"""

497

Wait for the job specified by |job_id| to start.

498

499

@param job_id: the job ID to poll on.

500

"""

501

while len(self._afe.get_jobs(id=job_id, not_yet_run=True)) > 0:

502

time.sleep(10)

503

logging.debug('Re-imaging job running.')

504

505

506

def _wait_for_job_to_finish(self, job_id):

507

"""

508

Wait for the job specified by |job_id| to finish.

509

510

@param job_id: the job ID to poll on.

511

"""

512

while len(self._afe.get_jobs(id=job_id, finished=True)) == 0:

513

time.sleep(10)

514

logging.debug('Re-imaging job finished.')
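Both wait helpers above follow the same poll-and-sleep pattern. Here is a minimal generic sketch of it. wait_until() and its injectable sleep parameter are assumptions made for this example so the loop can be exercised without real delays.

```python
import time


def wait_until(predicate, interval_sec=10, sleep=time.sleep):
    """Poll |predicate| until it returns True; return how many times we slept."""
    polls = 0
    while not predicate():
        sleep(interval_sec)
        polls += 1
    return polls


# Simulate a job that is reported finished on the third query.
answers = iter([False, False, True])
slept = []
polls = wait_until(lambda: next(answers), sleep=slept.append)
```

A production version would likely also want an overall timeout, which neither the sketch nor the methods above implement.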

515

516

517

def _remember_reimaged_hosts(self, build, canary_job):

518

"""

519

Remember hosts that were reimaged with |build| as a part of |canary_job|.

520

521

@param build: the build that was installed e.g.

522

x86-alex-release/R18-1655.0.0-a1-b1584.

523

@param canary_job: a completed frontend.Job object, possibly populated

524

by frontend.AFE.poll_job_results.

525

"""

526

if not hasattr(canary_job, 'results_platform_map'):

527

return

528

if not self._reimaged_hosts.get(build):

529

self._reimaged_hosts[build] = []

530

for platform in canary_job.results_platform_map:

531

for host in canary_job.results_platform_map[platform]['Total']:

532

self._reimaged_hosts[build].append(host)
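The bookkeeping here is a dict keyed by build name whose values are lists of hostnames. A minimal sketch of that structure, assuming results_platform_map has the shape the loop above implies (platform name mapped to a dict with a 'Total' list of hosts). remember_hosts() and the host names are illustrative.

```python
def remember_hosts(reimaged_hosts, build, results_platform_map):
    """Append every host listed under 'Total' to the entry for |build|."""
    # setdefault() creates the list on first use and reuses it afterwards.
    hosts = reimaged_hosts.setdefault(build, [])
    for platform in results_platform_map:
        hosts.extend(results_platform_map[platform]['Total'])
    return reimaged_hosts


BUILD = 'x86-mario/R20-2203.0.0'
state = {}
remember_hosts(state, BUILD, {'x86-mario': {'Total': ['host1', 'host2']}})
remember_hosts(state, BUILD, {'x86-mario': {'Total': ['host3']}})
```

With setdefault(), a second call for the same build accumulates hosts instead of replacing the earlier list.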

533

534

535

def clear_reimaged_host_state(self, build):

536

"""

537

Clear per-host state created in the autotest DB for this job.

538

539

After reimaging a host, we label it and set some host attributes on it

540

that are then used by the suite scheduling code. This call cleans

541

that up.

542

543

@param build: the build whose hosts we want to clean up e.g.

544

x86-alex-release/R18-1655.0.0-a1-b1584.

545

"""

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

546

for host in self._reimaged_hosts.get(build, []):

547

self._clear_build_state(host)

548

549

550

def _clear_build_state(self, machine):

551

"""

552

Clear all build-specific labels, attributes from the target.

553

554

@param machine: the host to clear labels, attributes from.

555

"""

Chris Masone

2012-05-15 13:34:21 -0700

[diff] [blame]

556

self._afe.set_host_attribute(JOB_REPO_URL, None, hostname=machine)

Chris Masone

2012-03-07 15:16:59 -0800

[diff] [blame]

557

558

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

559

def _record_job_if_possible(self, test_name, job):

560

"""

561

Record job id as keyval, if possible, so it can be referenced later.

562

563

If |self._results_dir| is None, then this is a NOOP.

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

564

565

@param test_name: the test to record id/owner for.

566

@param job: the job object to pull info from.

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

567

"""

568

if self._results_dir:

569

job_id_owner = '%s-%s' % (job.id, job.owner)

Chris Masone

11aae45

2012-05-21 16:08:39 -0700

[diff] [blame^]

570

utils.write_keyval(

571

self._results_dir,

572

{hashlib.md5(test_name).hexdigest(): job_id_owner})
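The keyval written above maps the MD5 hex digest of the test name to an "id-owner" string. A minimal sketch of building that entry follows. job_keyval() and the owner name are illustrative, and the explicit encode() is needed on Python 3, where hashlib requires bytes.

```python
import hashlib


def job_keyval(test_name, job_id, owner):
    """Build the {md5(test_name): 'id-owner'} entry for a sub-job."""
    # Hashing gives a fixed-length key regardless of what the test name contains.
    key = hashlib.md5(test_name.encode('utf-8')).hexdigest()
    return {key: '%s-%s' % (job_id, owner)}


entry = job_keyval('try_new_image', 42, 'chromeos-test')
key, value = list(entry.items())[0]
```

A reader of the keyval file can look up a test's job by hashing the test name the same way.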

Chris Masone

2012-03-05 13:45:25 -0800

[diff] [blame]

573

574

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

575

def _count_usable_hosts(self, host_spec):

576

"""

577

Given a set of host labels, count the live hosts that have them all.

578

579

@param host_spec: list of labels specifying a set of hosts.

580

@return the number of live hosts that satisfy |host_spec|.

581

"""

582

count = 0

583

for h in self._afe.get_hosts(multiple_labels=host_spec):

584

if h.status not in ['Repair Failed', 'Repairing']:

count += 1

return count
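The filter in _count_usable_hosts() can be tried out with plain objects. In this sketch the Host namedtuple stands in for whatever get_hosts() returns, and the 'Ready' and 'Running' status strings are illustrative. Only the two excluded statuses come from the code above.

```python
from collections import namedtuple

# Stand-in for the host objects returned by the AFE; only .status matters here.
Host = namedtuple('Host', ['hostname', 'status'])

UNUSABLE = ('Repair Failed', 'Repairing')


def count_usable(hosts):
    """Count hosts that are neither being repaired nor failed repair."""
    return sum(1 for h in hosts if h.status not in UNUSABLE)


hosts = [Host('host1', 'Ready'),
         Host('host2', 'Repairing'),
         Host('host3', 'Running'),
         Host('host4', 'Repair Failed')]
usable = count_usable(hosts)
```

A host that is busy running another job still counts as usable under this definition, since it will eventually become available.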

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

589

def _ensure_version_label(self, name):

590

"""

591

Ensure that a label called |name| exists in the autotest DB.

592

593

@param name: the label to check for/create.

594

"""

Chris Masone

47c9e64

2012-04-25 14:22:18 -0700

[diff] [blame]

595

try:

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

596

self._afe.create_label(name=name)

Chris Masone

47c9e64

2012-04-25 14:22:18 -0700

[diff] [blame]

597

except proxy.ValidationError as ve:

598

if ('name' in ve.problem_keys and

599

'This value must be unique' in ve.problem_keys['name']):

600

logging.debug('Version label %s already exists', name)

601

else:

602

raise
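The create-and-tolerate-duplicates pattern used by _ensure_version_label() can be sketched against a fake frontend. ValidationError and FakeAFE below are hypothetical stand-ins for proxy.ValidationError and the real AFE, with problem_keys modeled as a dict of field name to message.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for proxy.ValidationError."""

    def __init__(self, problem_keys):
        Exception.__init__(self, problem_keys)
        self.problem_keys = problem_keys


class FakeAFE(object):
    """Tiny in-memory frontend that rejects duplicate label names."""

    def __init__(self):
        self.labels = set()

    def create_label(self, name):
        if name in self.labels:
            raise ValidationError({'name': 'This value must be unique (%s)' % name})
        self.labels.add(name)


def ensure_label(afe, name):
    """Return True if the label was created, False if it already existed."""
    try:
        afe.create_label(name=name)
        return True
    except ValidationError as ve:
        if 'This value must be unique' in ve.problem_keys.get('name', ''):
            return False
        raise  # Any other validation problem is a real error.


afe = FakeAFE()
first = ensure_label(afe, 'cros-version:x86-mario/R20-2203.0.0')
second = ensure_label(afe, 'cros-version:x86-mario/R20-2203.0.0')
```

Attempting the create and handling the duplicate avoids a check-then-create race between concurrent suite jobs.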

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

603

604

Chris Masone

2012-01-31 09:27:36 -0800

[diff] [blame]

605

def _schedule_reimage_job(self, build, num_machines, board):

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

"""

Sends an RPC to the autotest frontend to enqueue reimaging jobs on

610

|num_machines| devices of type |board|.

611

Chris Masone

2012-01-31 09:27:36 -0800

[diff] [blame]

612

@param build: the build to install (must be unique).

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

613

@param num_machines: how many devices to reimage.

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

614

@param board: which kind of devices to reimage.

615

@return a frontend.Job object for the reimaging job we scheduled.

616

"""

Chris Masone

2012-01-17 11:12:51 -0800

[diff] [blame]

617

control_file = inject_vars(

Chris Masone

2012-01-31 09:27:36 -0800

[diff] [blame]

618

{'image_url': _image_url_pattern() % build, 'image_name': build},

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

619

self._cf_getter.get_control_file_contents_by_name('autoupdate'))

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

620

job_deps = []

621

if self._pool:

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

622

meta_host = self._pool

623

board_label = board

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

624

job_deps.append(board_label)

625

else:

626

# No pool specified use board.

Chris Masone

2012-03-05 15:11:39 -0800

[diff] [blame]

627

meta_host = board

Chris Masone

2011-10-20 16:36:43 -0700

[diff] [blame]

628

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

629

return self._afe.create_job(control_file=control_file,

Chris Masone

2012-01-31 09:27:36 -0800

[diff] [blame]

630

name=build + '-try',

Chris Masone

2011-12-20 11:06:53 -0800

[diff] [blame]

631

control_type='Server',

Chris Masone

9732536

2012-04-26 16:19:13 -0700

[diff] [blame]

632

priority='Low',

Scott Zawalski

2012-02-16 11:48:26 -0500

[diff] [blame]

633

meta_hosts=[meta_host] * num_machines,

634

dependencies=job_deps)


    def _report_results(self, job, record):
        """
        Record results from a completed frontend.Job object.

        @param job: a completed frontend.Job object populated by
               frontend.AFE.poll_job_results.
        @param record: callable that records job status.
               prototype:
                 record(status, subdir, name, reason)
        """
        if job.result == True:
            record('GOOD', None, job.name)
            return

        for platform in job.results_platform_map:
            for status in job.results_platform_map[platform]:
                if status == 'Total':
                    continue
                for host in job.results_platform_map[platform][status]:
                    if host not in job.test_status:
                        record('ERROR', None, host, 'Job failed to run.')
                    elif status == 'Failed':
                        for test_status in job.test_status[host].fail:
                            record('FAIL', None, host, test_status.reason)
                    elif status == 'Aborted':
                        for test_status in job.test_status[host].fail:
                            record('ABORT', None, host, test_status.reason)
                    elif status == 'Completed':
                        record('GOOD', None, host)


class Status(object):
    """
    A class representing a test result.

    Stores all pertinent info about a test result and, given a callable
    to use, can record start, result, and end info appropriately.

    @var _status: status code, e.g. 'INFO', 'FAIL', etc.
    @var _test_name: the name of the test whose result this is.
    @var _reason: message explaining failure, if any.
    @var _begin_timestamp: when test started (in seconds since the epoch).
    @var _end_timestamp: when test finished (in seconds since the epoch).

    @var _TIME_FMT: format string for parsing human-friendly timestamps.
    """
    _status = None
    _test_name = None
    _reason = None
    _begin_timestamp = None
    _end_timestamp = None


    def __init__(self, status, test_name, reason='', begin_time_str=None,
                 end_time_str=None):
        """
        Constructor

        @param status: status code, e.g. 'INFO', 'FAIL', etc.
        @param test_name: the name of the test whose result this is.
        @param reason: message explaining failure, if any; Optional.
        @param begin_time_str: when test started (in TIME_FMT); now() if None.
        @param end_time_str: when test finished (in TIME_FMT); now() if None.
        """

        self._status = status
        self._test_name = test_name
        self._reason = reason
        if begin_time_str:
            self._begin_timestamp = int(time.mktime(
                datetime.datetime.strptime(
                    begin_time_str, TIME_FMT).timetuple()))
        else:
            self._begin_timestamp = time.time()

        if end_time_str:
            self._end_timestamp = int(time.mktime(
                datetime.datetime.strptime(
                    end_time_str, TIME_FMT).timetuple()))
        else:
            self._end_timestamp = time.time()
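
The timestamp handling in the constructor can be isolated as a small helper. This is a sketch under stated assumptions: `to_epoch_seconds` is a hypothetical name, and `ASSUMED_TIME_FMT` is a guess used for illustration only, since the real `TIME_FMT` is defined elsewhere in the module and is not shown here.

```python
import datetime
import time

# Assumed format for illustration; the module's real TIME_FMT may differ.
ASSUMED_TIME_FMT = '%Y-%m-%d %H:%M:%S'


def to_epoch_seconds(time_str, fmt=ASSUMED_TIME_FMT):
    """Parse a local-time string into integer seconds since the epoch.

    Falls back to the current time when time_str is empty or None,
    mirroring the constructor above.
    """
    if time_str:
        parsed = datetime.datetime.strptime(time_str, fmt)
        # mktime() interprets the time tuple in the local timezone.
        return int(time.mktime(parsed.timetuple()))
    return time.time()


begin = to_epoch_seconds('2012-04-30 13:10:58')
```

Because `time.mktime()` works in local time, the numeric result depends on the timezone of the machine that runs it. Note also that the parsed branch yields an int while the fallback yields a float, as in the constructor.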


    def record_start(self, record_entry):
        """
        Use record_entry to log message about start of test.

        @param record_entry: a callable to use for logging.
               prototype:
                   record_entry(base_job.status_log_entry)
        """
        record_entry(
            base_job.status_log_entry(
                'START', None, self._test_name, '',
                None, self._begin_timestamp))


    def record_result(self, record_entry):
        """
        Use record_entry to log message about result of test.

        @param record_entry: a callable to use for logging.
               prototype:
                   record_entry(base_job.status_log_entry)
        """
        record_entry(
            base_job.status_log_entry(
                self._status, None, self._test_name, self._reason,
                None, self._end_timestamp))


    def record_end(self, record_entry):
        """
        Use record_entry to log message about end of test.

        @param record_entry: a callable to use for logging.
               prototype:
                   record_entry(base_job.status_log_entry)
        """
        record_entry(
            base_job.status_log_entry(
                'END %s' % self._status, None, self._test_name, '',
                None, self._end_timestamp))


class Suite(object):
    """
    A suite of tests, defined by some predicate over control file variables.

    Given a place to search for control files and a predicate to match the
    desired tests, can gather tests and fire off jobs to run them, and then
    wait for results.

    @var _predicate: a function that should return True when run over a
         ControlData representation of a control file that should be in
         this Suite.
    @var _tag: a string with which to tag jobs run in this suite.
    @var _build: the build on which we're running this suite.
    @var _afe: an instance of AFE as defined in server/frontend.py.
    @var _tko: an instance of TKO as defined in server/frontend.py.
    @var _jobs: currently scheduled jobs, if any.
    @var _cf_getter: a control_file_getter.ControlFileGetter
    """


    @staticmethod
    def create_ds_getter(build):
        """
        @param build: the build on which we're running this suite.
        @return a DevServerGetter instance that fetches control files for
                |build|.
        """
        return control_file_getter.DevServerGetter(
            build, dev_server.DevServer.create())


    @staticmethod
    def create_fs_getter(autotest_dir):
        """
        @param autotest_dir: the place to find autotests.
        @return a FileSystemGetter instance that looks under |autotest_dir|.
        """
        # currently hard-coded places to look for tests.
        subpaths = ['server/site_tests', 'client/site_tests',
                    'server/tests', 'client/tests']
        directories = [os.path.join(autotest_dir, p) for p in subpaths]
        return control_file_getter.FileSystemGetter(directories)


    @staticmethod
    def parse_tag(tag):
        """Splits a string on ',' optionally surrounded by whitespace."""
        return [x.strip() for x in tag.split(',')]
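
A stand-alone sketch of the same splitting rule, written as a list comprehension so that it returns a list on both Python 2 and Python 3. The suite names used below are illustrative only.

```python
def parse_tag(tag):
    """Split a string on ',' and strip whitespace around each piece."""
    return [piece.strip() for piece in tag.split(',')]


suites = parse_tag('bvt, smoke ,regression')
single = parse_tag('bvt')
empty = parse_tag('')
```

One consequence worth knowing: an empty string yields a list holding one empty string, not an empty list, because `str.split(',')` always returns at least one element.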


    @staticmethod
    def name_in_tag_predicate(name):
        """Returns predicate that takes a control file and looks for |name|.

        Builds a predicate that takes in a parsed control file (a ControlData)
        and returns True if the SUITE tag is present and contains |name|.

        @param name: the suite name to base the predicate on.
        @return a callable that takes a ControlData and looks for |name| in that
                ControlData object's suite member.
        """
        return lambda t: hasattr(t, 'suite') and \
                         name in Suite.parse_tag(t.suite)
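
The predicate can be exercised without real control files. In this sketch `FakeControlData` is a hypothetical stand-in for a parsed ControlData object, and the suite names are made up for illustration.

```python
class FakeControlData(object):
    """Hypothetical stand-in for a parsed control file."""

    def __init__(self, suite=None):
        # Only set the attribute when a SUITE tag is "present".
        if suite is not None:
            self.suite = suite


def name_in_tag_predicate(name):
    """Return a predicate that is True when |name| is one of the suites."""
    def predicate(test):
        return (hasattr(test, 'suite') and
                name in [s.strip() for s in test.suite.split(',')])
    return predicate


in_bvt = name_in_tag_predicate('bvt')
matches = in_bvt(FakeControlData('smoke, bvt'))
no_tag = in_bvt(FakeControlData())
near_miss = in_bvt(FakeControlData('bvt_extra'))
```

Since membership is tested against the parsed list and not the raw string, a suite called 'bvt_extra' does not match 'bvt'.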


    @staticmethod
    def list_all_suites(build, cf_getter=None):
        """
        Parses all ControlData objects with a SUITE tag and extracts all
        defined suite names.

        @param build: the build on which we're running this suite.
        @param cf_getter: control_file_getter.ControlFileGetter. Defaults to
                          using DevServerGetter.

        @return list of suites
        """
        if cf_getter is None:
            cf_getter = Suite.create_ds_getter(build)

        suites = set()
        predicate = lambda t: hasattr(t, 'suite')
        for test in Suite.find_and_parse_tests(cf_getter, predicate,
                                               add_experimental=True):
            suites.update(Suite.parse_tag(test.suite))
        return list(suites)


    @staticmethod
    def create_from_name(name, build, cf_getter=None, afe=None, tko=None,
                         pool=None, results_dir=None):
        """
        Create a Suite using a predicate based on the SUITE control file var.

        Makes a predicate based on |name| and uses it to instantiate a Suite
        that looks for tests using |cf_getter| and will schedule them using
        |afe|.  Pulls control files from the default dev server if no
        |cf_getter| is given.
        Results will be pulled from |tko| upon completion.

        @param name: a value of the SUITE control file variable to search for.
        @param build: the build on which we're running this suite.
        @param cf_getter: a control_file_getter.ControlFileGetter.
                          If None, default to using a DevServerGetter.
        @param afe: an instance of AFE as defined in server/frontend.py.
        @param tko: an instance of TKO as defined in server/frontend.py.
        @param pool: Specify the pool of machines to use for scheduling
                     purposes.
        @param results_dir: The directory where the job can write results to.
                            This must be set if you want the job_id of
                            sub-jobs listed in the job keyvals.
        @return a Suite instance.
        """
        if cf_getter is None:
            cf_getter = Suite.create_ds_getter(build)
        return Suite(Suite.name_in_tag_predicate(name),
                     name, build, cf_getter, afe, tko, pool, results_dir)


    def __init__(self, predicate, tag, build, cf_getter, afe=None, tko=None,
                 pool=None, results_dir=None):
        """
        Constructor

        @param predicate: a function that should return True when run over a
               ControlData representation of a control file that should be in
               this Suite.
        @param tag: a string with which to tag jobs run in this suite.
        @param build: the build on which we're running this suite.
        @param cf_getter: a control_file_getter.ControlFileGetter
        @param afe: an instance of AFE as defined in server/frontend.py.
        @param tko: an instance of TKO as defined in server/frontend.py.
        @param pool: Specify the pool of machines to use for scheduling
                purposes.
        @param results_dir: The directory where the job can write results to.
                            This must be set if you want the job_id of
                            sub-jobs listed in the job keyvals.
        """
        self._predicate = predicate
        self._tag = tag
        self._build = build
        self._cf_getter = cf_getter
        self._results_dir = results_dir
        self._afe = afe or frontend_wrappers.RetryingAFE(timeout_min=30,
                                                         delay_sec=10,
                                                         debug=False)
        self._tko = tko or frontend_wrappers.RetryingTKO(timeout_min=30,
                                                         delay_sec=10,
                                                         debug=False)
        self._pool = pool
        self._jobs = []
        self._tests = Suite.find_and_parse_tests(self._cf_getter,
                                                 self._predicate,
                                                 add_experimental=True)


    @property
    def tests(self):
        """
        A list of ControlData objects in the suite, with added |text| attr.
        """
        return self._tests


    def stable_tests(self):
        """
        |self.tests|, filtered for non-experimental tests.
        """
        return [t for t in self.tests if not t.experimental]


    def unstable_tests(self):
        """
        |self.tests|, filtered for experimental tests.
        """
        return [t for t in self.tests if t.experimental]


    def _create_job(self, test):
        """
        Thin wrapper around frontend.AFE.create_job().

        @param test: ControlData object for a test to run.
        @return a frontend.Job object with an added test_name member.
                test_name is used to preserve the higher level TEST_NAME
                name of the job.
        """
        job_deps = []
        if self._pool:
            meta_hosts = self._pool
            cros_label = VERSION_PREFIX + self._build
            job_deps.append(cros_label)
        else:
            # No pool specified; use any machines with the following label.
            meta_hosts = VERSION_PREFIX + self._build
        test_obj = self._afe.create_job(
            control_file=test.text,
            name='/'.join([self._build, self._tag, test.name]),
            control_type=test.test_type.capitalize(),
            meta_hosts=[meta_hosts],
            dependencies=job_deps,
            keyvals={JOB_BUILD_KEY: self._build, JOB_SUITE_KEY: self._tag})

        test_obj.test_name = test.name

        return test_obj


    def run_and_wait(self, record, add_experimental=True):
        """
        Synchronously run tests in |self.tests|.

        Schedules tests against a device running image |self._build|, and
        then polls for status, using |record| to print status when each
        completes.

        Tests returned by self.stable_tests() will always be run, while tests
        in self.unstable_tests() will only be run if |add_experimental| is true.

        @param record: callable that records job status.
                 prototype:
                   record(status, subdir, name, reason)
        @param add_experimental: schedule experimental tests as well, or not.
        """
        logging.debug('Discovered %d stable tests.', len(self.stable_tests()))
        logging.debug('Discovered %d unstable tests.',
                      len(self.unstable_tests()))
        try:
            Status('INFO', 'Start %s' % self._tag).record_result(record)
            self.schedule(add_experimental)
            try:
                for result in self.wait_for_results():
                    result.record_start(record)
                    result.record_result(record)
                    result.record_end(record)
            except Exception:
                logging.error(traceback.format_exc())
                Status('FAIL', self._tag,
                       'Exception waiting for results').record_result(record)
        except Exception:
            logging.error(traceback.format_exc())
            Status('FAIL', self._tag,
                   'Exception while scheduling suite').record_result(record)
        # Sanity check
        tests_at_end = self.find_and_parse_tests(self._cf_getter,
                                                 self._predicate,
                                                 add_experimental=True)
        if len(self.tests) != len(tests_at_end):
            msg = 'Dev Server enumerated %d tests at start, %d at end.' % (
                len(self.tests), len(tests_at_end))
            Status('FAIL', self._tag, msg).record_result(record)


    def schedule(self, add_experimental=True):
        """
        Schedule jobs using |self._afe|.

        frontend.Job objects representing each scheduled job will be put in
        |self._jobs|.

        @param add_experimental: schedule experimental tests as well, or not.
        """
        for test in self.stable_tests():
            logging.debug('Scheduling %s', test.name)
            self._jobs.append(self._create_job(test))

        if add_experimental:
            for test in self.unstable_tests():
                logging.debug('Scheduling experimental %s', test.name)
                test.name = EXPERIMENTAL_PREFIX + test.name
                self._jobs.append(self._create_job(test))
        if self._results_dir:
            self._record_scheduled_jobs()


    def _record_scheduled_jobs(self):
        """
        Record scheduled job ids as keyvals, so they can be referenced later.
        """
        for job in self._jobs:
            job_id_owner = '%s-%s' % (job.id, job.owner)
            utils.write_keyval(
                self._results_dir,
                {hashlib.md5(job.test_name).hexdigest(): job_id_owner})
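
The keyval written for each job maps the MD5 hex digest of the test name to a '<job id>-<owner>' string. The sketch below shows only how that pair is built; `job_keyval` is a hypothetical helper, the test name, job id and owner are made-up values, and the explicit `encode()` is an assumption added so the snippet also runs on Python 3, where `hashlib.md5()` requires bytes.

```python
import hashlib


def job_keyval(test_name, job_id, owner):
    """Build the single-entry keyval dict for one scheduled job."""
    # hashlib.md5() needs bytes on Python 3, so encode the name first.
    key = hashlib.md5(test_name.encode('utf-8')).hexdigest()
    return {key: '%s-%s' % (job_id, owner)}


keyval = job_keyval('dummy_Pass', 42, 'chromeos-test')
key = list(keyval.keys())[0]
```

Hashing the name gives a fixed-length key drawn only from hex digits, whatever characters the test name contains.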


    def _status_is_relevant(self, status):
        """
        Indicates whether the status of a given test is meaningful or not.

        @param status: frontend.TestStatus object to look at.
        @return True if this is a test result worth looking at further.
        """
        return not (status.test_name.startswith('SERVER_JOB') or
                    status.test_name.startswith('CLIENT_JOB'))


    def _collate_aborted(self, current_value, entry):
        """
        reduce() over a list of HostQueueEntries for a job; True if any aborted.

        Functor that can be reduce()d over a list of
        HostQueueEntries for a job.  If any were aborted
        (|entry['aborted']| exists and is True), then the reduce() will
        return True.

        Ex:
            entries = self._afe.run('get_host_queue_entries', job=job.id)
            reduce(self._collate_aborted, entries, False)

        @param current_value: the current accumulator (a boolean).
        @param entry: the current entry under consideration.
        @return |current_value| if it is already true; otherwise the value of
                |entry['aborted']| if it exists, False if not.
        """
        return current_value or ('aborted' in entry and entry['aborted'])
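
The accumulator can be tried on plain dictionaries. In this sketch the entries are hypothetical stand-ins for what `self._afe.run('get_host_queue_entries', job=job.id)` might return, and `functools.reduce` is used so the snippet runs on both Python 2.6+ and Python 3.

```python
import functools


def collate_aborted(current_value, entry):
    """Accumulate: true once any entry has a true 'aborted' value."""
    return current_value or ('aborted' in entry and entry['aborted'])


# Hypothetical host queue entries; one lacks the 'aborted' key entirely.
entries = [{'id': 1, 'aborted': False},
           {'id': 2},
           {'id': 3, 'aborted': True}]

any_aborted = functools.reduce(collate_aborted, entries, False)
none_aborted = functools.reduce(collate_aborted, [{'id': 4}], False)
no_entries = functools.reduce(collate_aborted, [], False)

# The same question asked with any(); a missing key counts as not aborted.
any_aborted_alt = any(e.get('aborted') for e in entries)
```

Passing `False` as the initial value means an empty entry list reduces to `False` without raising.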


    def wait_for_results(self):
        """
        Wait for results of all tests in all jobs in |self._jobs|.

        Currently polls for results every 5s.  Results are yielded as each
        job finishes.

        @return a generator of Status objects: one 'ABORT' Status per aborted
                job, otherwise one Status per relevant test result.
        """
        while self._jobs:
            for job in list(self._jobs):
                if not self._afe.get_jobs(id=job.id, finished=True):
                    continue

                self._jobs.remove(job)

                entries = self._afe.run('get_host_queue_entries', job=job.id)
                if reduce(self._collate_aborted, entries, False):
                    yield Status('ABORT', job.name)
                else:
                    statuses = self._tko.get_status_counts(job=job.id)
                    for s in filter(self._status_is_relevant, statuses):
                        yield Status(s.status, s.test_name, s.reason,
                                     s.test_started_time,
                                     s.test_finished_time)
            time.sleep(5)


    @staticmethod
    def find_and_parse_tests(cf_getter, predicate, add_experimental=False):
        """
        Function to scan through all tests and find eligible tests.

        Looks at control files returned by cf_getter.get_control_file_list()
        for tests that pass predicate().

        @param cf_getter: a control_file_getter.ControlFileGetter used to list
               and fetch the content of control files
        @param predicate: a function that should return True when run over a
               ControlData representation of a control file that should be in
               this Suite.
        @param add_experimental: add tests with experimental attribute set.

        @return list of ControlData objects that should be run, with control
                file text added in |text| attribute.
        """
        tests = {}
        files = cf_getter.get_control_file_list()
        matcher = re.compile(r'[^/]+/(deps|profilers)/.+')
        for file in filter(lambda f: not matcher.match(f), files):
            logging.debug('Considering %s', file)
            text = cf_getter.get_control_file_contents(file)
            try:
                found_test = control_data.parse_control_string(
                        text, raise_warnings=True)
                if not add_experimental and found_test.experimental:
                    continue

                found_test.text = text
                found_test.path = file

Chris Masone