Blame - docs/dyn/dataflow_v1b3.projects.templates.html - platform/external/python/google-api-python-client

<h1><a href="dataflow_v1b3.html">Google Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.templates.html">templates</a></h1>

<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="#create">create(projectId, body, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job from a template.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, gcsPath=None, x__xgafv=None, view=None)</a></code></p>

<p class="firstline">Gets the template stored at the given Cloud Storage path.</p>

<p class="toc_element">
  <code><a href="#launch">launch(projectId, body, dryRun=None, gcsPath=None, x__xgafv=None)</a></code></p>
<p class="firstline">Launch a template.</p>
<h3>Method Details</h3>
<div class="method">

<code class="details" id="create">create(projectId, body, x__xgafv=None)</code>

  <pre>Creates a Cloud Dataflow job from a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  body: object, The request body. (required)
    The object takes the form of:


    { # A request to create a Cloud Dataflow job from a template.
      "environment": { # The environment values to set at runtime. # The runtime environment for the job.
        "bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.
            # Use with caution.
        "tempLocation": "A String", # The Cloud Storage path to use for temporary files.
            # Must be a valid Cloud Storage URL, beginning with `gs://`.
        "serviceAccountEmail": "A String", # The email address of the service account to run the job as.
        "zone": "A String", # The Compute Engine [availability zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
            # for launching worker instances to run your pipeline.
        "maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made
            # available to your pipeline during execution, from 1 to 1000.
      },
      "gcsPath": "A String", # Required. A Cloud Storage path to the template from which to
          # create the job.
          # Must be a valid Cloud Storage URL, beginning with `gs://`.
      "parameters": { # The runtime parameters to pass to the job.
        "a_key": "A String",
      },
      "jobName": "A String", # Required. The job name to use for the created job.
    }


  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:


    { # Defines a job to be run by the Cloud Dataflow service.
      "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
          # If this field is set, the service will ensure its uniqueness.
          # The request to create a job will fail if the service has knowledge of a
          # previously submitted job with the same client's ID and job name.
          # The caller may use this field to ensure idempotence of job
          # creation across retried attempts to create a job.
          # By default, the field is empty and, in that case, the service ignores it.
      "requestedState": "A String", # The job's requested state.
          #
          # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
          # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
          # also be used to directly set a job's requested state to
          # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
          # job if it has not already reached a terminal state.
      "name": "A String", # The user-specified Cloud Dataflow job name.
          #
          # Only one Job with a given name may exist in a project at any
          # given time. If a caller attempts to create a Job with the same
          # name as an already-existing Job, the attempt returns the
          # existing Job.
          #
          # The name must match the regular expression
          # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
      "currentStateTime": "A String", # The timestamp associated with the current state.
      "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
          # `JOB_STATE_UPDATED`), this field contains the ID of that job.
      "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
      "labels": { # User-defined labels for this job.
          #
          # The labels map can contain no more than 64 entries. Entries of the labels
          # map are UTF8 strings that comply with the following restrictions:
          #
          # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
          # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
          # * Both keys and values are additionally constrained to be <= 128 bytes in
          #   size.
        "a_key": "A String",
      },

      "location": "A String", # The location that contains this job.
      "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
          # Cloud Dataflow service.
      "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
          # corresponding name prefixes of the new job.
        "a_key": "A String",
      },
      "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
        "version": { # A structure describing which components and their versions of the service
            # are required in order to run the job.
          "a_key": "", # Properties of the object.
        },

        "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
            # storage. The system will append the suffix "/temp-{JOBNAME}" to
            # this resource prefix, where {JOBNAME} is the value of the
            # job_name field. The resulting bucket and object prefix is used
            # as the prefix of the resources used to store temporary data
            # needed during the job execution. NOTE: This will override the
            # value in taskrunner_settings.
            # The supported resource type is:
            #
            # Google Cloud Storage:
            #
            #   storage.googleapis.com/{bucket}/{object}
            #   bucket.storage.googleapis.com/{object}
        "internalExperiments": { # Experimental settings.
          "a_key": "", # Properties of the object. Contains field @type with type URL.
        },
        "dataset": "A String", # The dataset for the current project where various workflow
            # related tables are stored.
            #
            # The supported resource type is:
            #
            # Google BigQuery:
            #   bigquery.googleapis.com/{dataset}
        "experiments": [ # The list of experiments to enable.
          "A String",
        ],
        "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
        "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
            # options are passed through the service and are used to recreate the
            # SDK pipeline options on the worker in a language agnostic and platform
            # independent way.
          "a_key": "", # Properties of the object.
        },
        "userAgent": { # A description of the process that generated the request.
          "a_key": "", # Properties of the object.
        },

        "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
            # unspecified, the service will attempt to choose a reasonable
            # default. This should be in the form of the API service name,
            # e.g. "compute.googleapis.com".
        "workerPools": [ # The worker pools. At least one "harness" worker pool must be
            # specified in order for the job to have workers.
          { # Describes one particular pool of Cloud Dataflow workers to be
              # instantiated by the Cloud Dataflow service in order to perform the
              # computations required by a job. Note that a workflow job may use
              # multiple pools, in order to match the various computational
              # requirements of the various stages of the job.
            "diskSourceImage": "A String", # Fully qualified source image for disks.
            "ipConfiguration": "A String", # Configuration for VM IPs.
            "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
                # are supported.
            "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
                # service will attempt to choose a reasonable default.
            "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
                # the service will use the network "default".
            "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
                # will attempt to choose a reasonable default.
            "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            "metadata": { # Metadata to set on the Google Compute Engine VMs.
              "a_key": "A String",
            },

            "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
                # Compute Engine API.
            "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
                # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                # `TEARDOWN_NEVER`.
                # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                # down.
                #
                # If the workers are not torn down by the service, they will
                # continue to run and use Google Compute Engine VM resources in the
                # user's project until they are explicitly terminated by the user.
                # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                # policy except for small, manually supervised test jobs.
                #
                # If unknown or unspecified, the service will attempt to choose a reasonable
                # default.
            "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
                # service will choose a number of threads (according to the number of cores
                # on the selected machine type for batch, or 1 by convention for streaming).
            "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                # the form "regions/REGION/subnetworks/SUBNETWORK".
            "poolArgs": { # Extra arguments for this worker pool.
              "a_key": "", # Properties of the object. Contains field @type with type URL.
            },

            "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
                # execute the job. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                # using the standard Dataflow task runner. Users should ignore
                # this field.
              "workflowFileName": "A String", # The file to store the workflow in.
              "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
                  # will not be uploaded.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  #   storage.googleapis.com/{bucket}/{object}
                  #   bucket.storage.googleapis.com/{object}
              "commandlinesFileName": "A String", # The file to store preprocessing commands in.
              "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
              "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
              "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
              "vmId": "A String", # The ID string of the VM.
              "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. "wheel".
              "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. "root".
              "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
                  # access the Cloud Dataflow API.
                "A String",
              ],
              "languageHint": "A String", # The suggested backend language.
              "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                  # console.
              "streamingWorkerMainClass": "A String", # The streaming worker main class name.
              "logDir": "A String", # The directory on the VM to store logs.
              "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                "reportingEnabled": True or False, # Whether to send work progress updates to the service.
                "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
                    # "shuffle/v1beta1".
                "workerId": "A String", # The ID of the worker running this pipeline.
                "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
                    #
                    # When workers access Google Cloud APIs, they logically do so via
                    # relative URLs. If this field is specified, it supplies the base
                    # URL to use for resolving these relative URLs. The normative
                    # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                    # Locators".
                    #
                    # If not specified, the default value is "http://www.googleapis.com/"
                "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
                    # "dataflow/v1b3/projects".
                "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                    # storage.
                    #
                    # The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    #   storage.googleapis.com/{bucket}/{object}
                    #   bucket.storage.googleapis.com/{object}
              },

              "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
              "harnessCommand": "A String", # The command to launch the worker harness.
              "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
                  # temporary storage.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  #   storage.googleapis.com/{bucket}/{object}
                  #   bucket.storage.googleapis.com/{object}
              "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                  #
                  # When workers access Google Cloud APIs, they logically do so via
                  # relative URLs. If this field is specified, it supplies the base
                  # URL to use for resolving these relative URLs. The normative
                  # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                  # Locators".
                  #
                  # If not specified, the default value is "http://www.googleapis.com/"
            },
            "defaultPackageSet": "A String", # The default package set to install. This allows the service to
                # select a default set of packages which are useful to worker
                # harnesses written in a particular language.

            "packages": [ # Packages to be installed on workers.
              { # The packages that must be installed in order for a worker to run the
                  # steps of the Cloud Dataflow job that will be assigned to its worker
                  # pool.
                  #
                  # This is the mechanism by which the Cloud Dataflow SDK causes code to
                  # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                  # might use this to install jars containing the user's code and all of the
                  # various dependencies (libraries, data files, etc.) required in order
                  # for that code to run.
                "location": "A String", # The resource to read the package from. The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    #   storage.googleapis.com/{bucket}
                    #   bucket.storage.googleapis.com/
                "name": "A String", # The name of the package.
              },
            ],
            "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
              "algorithm": "A String", # The algorithm to use for autoscaling.
              "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
            },

            "dataDisks": [ # Data disks that are used by a VM in this workflow.
              { # Describes the data disk used by a workflow job.
                "mountPoint": "A String", # Directory in a VM where disk is mounted.
                "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
                    # must be a disk type appropriate to the project and zone in which
                    # the workers will run. If unknown or unspecified, the service
                    # will attempt to choose a reasonable default.
                    #
                    # For example, the standard persistent disk type is a resource name
                    # typically ending in "pd-standard". If SSD persistent disks are
                    # available, the resource name typically ends with "pd-ssd". The
                    # actual valid values are defined by the Google Compute Engine API,
                    # not by the Cloud Dataflow API; consult the Google Compute Engine
                    # documentation for more information about determining the set of
                    # available disk types for a particular project and zone.
                    #
                    # Google Compute Engine Disk types are local to a particular
                    # project in a particular zone, and so the resource name will
                    # typically look something like this:
                    #
                    # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
              },
            ],
            "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
                # attempt to choose a reasonable default.
            "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
                # harness, residing in Google Container Registry.
          },
        ],
      },

      "pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed
          # form. This data is provided by the Dataflow service for ease of visualizing
          # the pipeline and interpreting Dataflow-provided metrics. # Preliminary field: The format of this data may change at any time.
          # A description of the user pipeline and stages through which it is executed.
          # Created by Cloud Dataflow service. Only retrieved with
          # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
        "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
          { # Description of the type, names/ids, and input/outputs for a transform.
            "kind": "A String", # Type of transform.
            "name": "A String", # User provided name for this transform instance.
            "inputCollectionName": [ # User names for all collection inputs to this transform.
              "A String",
            ],
            "displayData": [ # Transform-specific display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                "key": "A String", # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                "shortStrValue": "A String", # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name_value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
                "timestampValue": "A String", # Contains value if the data is of timestamp type.
                "url": "A String", # An optional full URL.
                "floatValue": 3.14, # Contains value if the data is of float type.
                "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                    # language namespace (e.g. a Python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                "javaClassValue": "A String", # Contains value if the data is of java class type.
                "label": "A String", # An optional label to display in a dax UI for the element.
                "boolValue": True or False, # Contains value if the data is of a boolean type.
                "strValue": "A String", # Contains value if the data is of string type.
                "durationValue": "A String", # Contains value if the data is of duration type.
                "int64Value": "A String", # Contains value if the data is of int64 type.
              },
            ],
            "outputCollectionName": [ # User names for all collection outputs to this transform.
              "A String",
            ],
            "id": "A String", # SDK generated id of this transform instance.
          },
        ],

        "displayData": [ # Pipeline level display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            "key": "A String", # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            "shortStrValue": "A String", # A possible additional shorter value to display.
                # For example a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name_value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
            "timestampValue": "A String", # Contains value if the data is of timestamp type.
            "url": "A String", # An optional full URL.
            "floatValue": 3.14, # Contains value if the data is of float type.
            "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                # language namespace (e.g. a Python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            "javaClassValue": "A String", # Contains value if the data is of java class type.
            "label": "A String", # An optional label to display in a dax UI for the element.
            "boolValue": True or False, # Contains value if the data is of a boolean type.
            "strValue": "A String", # Contains value if the data is of string type.
            "durationValue": "A String", # Contains value if the data is of duration type.
            "int64Value": "A String", # Contains value if the data is of int64 type.
          },
        ],

        "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
          { # Description of the composing transforms, names/ids, and input/outputs of a
              # stage of execution. Some composing transforms and sources may have been
              # generated by the Dataflow service during execution planning.
            "componentSource": [ # Collections produced and consumed by component transforms of this stage.
              { # Description of an interstitial value between transforms in an execution
                  # stage.
                "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                    # source is most closely associated.
                "name": "A String", # Dataflow service generated name for this source.
              },
            ],
            "kind": "A String", # Type of transform this stage is executing.
            "name": "A String", # Dataflow service generated name for this stage.
            "outputSource": [ # Output sources for this stage.
              { # Description of an input or output of an execution stage.
                "userName": "A String", # Human-readable name for this source; may be user or system generated.
                "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                    # source is most closely associated.
                "name": "A String", # Dataflow service generated name for this source.
                "sizeBytes": "A String", # Size of the source, if measurable.
              },
            ],
            "inputSource": [ # Input sources for this stage.
              { # Description of an input or output of an execution stage.
                "userName": "A String", # Human-readable name for this source; may be user or system generated.
                "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                    # source is most closely associated.
                "name": "A String", # Dataflow service generated name for this source.
                "sizeBytes": "A String", # Size of the source, if measurable.
              },
            ],
            "componentTransform": [ # Transforms that comprise this execution stage.
              { # Description of a transform executed as part of an execution stage.
                "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                "originalTransform": "A String", # User name for the original user transform with which this transform is
                    # most closely associated.
                "name": "A String", # Dataflow service generated name for this source.
              },
            ],
            "id": "A String", # Dataflow service generated id for this stage.
          },
        ],
      },

      "steps": [ # The top-level steps that constitute the entire job.
        { # Defines a particular step within a Cloud Dataflow job.
            #
            # A job consists of multiple steps, each of which performs some
            # specific operation as part of the overall job. Data is typically
            # passed from one step to another as part of the job.
            #
            # Here's an example of a sequence of steps which together implement a
            # Map-Reduce job:
            #
            #   * Read a collection of data from some source, parsing the
            #     collection's elements.
            #
            #   * Validate the elements.
            #
            #   * Apply a user-defined function to map each element to some value
            #     and extract an element-specific key value.
            #
            #   * Group elements with the same key into a single element with
            #     that key, transforming a multiply-keyed collection into a
            #     uniquely-keyed collection.
            #
            #   * Write the elements out to some data sink.
            #
            # Note that the Cloud Dataflow service may be used to run many different
            # types of jobs, not just Map-Reduce.
          "kind": "A String", # The kind of step in the Cloud Dataflow job.
          "properties": { # Named properties associated with the step. Each kind of
              # predefined step has its own required set of properties.
              # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
            "a_key": "", # Properties of the object.
          },
          "name": "A String", # The name that identifies the step. This must be unique for each
              # step with respect to all other steps in the Cloud Dataflow job.
        },
      ],

Sai Cheemalapati

c30d2b5

2017-03-13 12:12:03 -0400

[diff] [blame^]

553

"currentState": "A String", # The current state of the job.

554

#

555

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

556

# specified.

557

#

558

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

559

# terminal state. After a job has reached a terminal state, no

560

# further state updates may be made.

561

#

562

# This field may be mutated by the Cloud Dataflow service;

563

# callers cannot mutate it.

564

"tempFiles": [ # A set of files the system should be aware of that are used

565

# for temporary storage. These temporary files will be

566

# removed on job completion.

567

# No duplicates are allowed.

568

# No file patterns are supported.

569

#

570

# The supported files are:

571

#

572

# Google Cloud Storage:

573

#

574

# storage.googleapis.com/{bucket}/{object}

575

# bucket.storage.googleapis.com/{object}

"A String",

577

],

"type": "A String", # The type of Cloud Dataflow job.

579

"id": "A String", # The unique ID of this job.

580

#

581

# This field is set by the Cloud Dataflow service when the Job is

582

# created, and is immutable for the life of the job.

583

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

584

# of the job it replaced.

585

#

586

# When sending a `CreateJobRequest`, you can update a job by specifying it

587

# here. The job named here is stopped, and its intermediate state is

588

# transferred to this job.

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be

# executed that isn't contained in the submitted job.

"stages": { # A mapping from each stage to the information about that stage.

"a_key": { # Contains information about how a particular

593

# google.dataflow.v1beta3.Step will be executed.

594

"stepName": [ # The steps associated with the execution stage.

595

# Note that stages may have several steps, and that a given step

596

# might be run by more than one stage.

"A String",

],

},

},

},

}</pre>

</div>
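The `tempFiles` rules in the job object above (no duplicates, no file patterns, and only the two Cloud Storage forms) can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the API: `check_temp_files` is a hypothetical helper, the two regular expressions are an assumed reading of the documented forms, and treating `*`, `?`, and `[` as pattern characters is also an assumption.

```python
import re

# Assumed reading of the two documented forms; the exact grammar the
# service accepts is not spelled out in the reference.
_PATH_STYLE = re.compile(r"^storage\.googleapis\.com/([^/]+)/(.+)$")
_HOST_STYLE = re.compile(r"^([^/]+)\.storage\.googleapis\.com/(.+)$")

# Assumption: these characters mark a "file pattern".
_PATTERN_CHARS = "*?["


def check_temp_files(paths):
    """Return a list of problems found in a tempFiles list (empty if none)."""
    problems = []
    seen = set()
    for path in paths:
        if path in seen:
            problems.append("duplicate: " + path)
        seen.add(path)
        if any(ch in path for ch in _PATTERN_CHARS):
            problems.append("file pattern not supported: " + path)
        elif not (_PATH_STYLE.match(path) or _HOST_STYLE.match(path)):
            problems.append("unsupported form: " + path)
    return problems
```

An empty result only means the list passes these local checks; the service remains the authority on what it accepts.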

<code class="details" id="get">get(projectId, gcsPath=None, x__xgafv=None, view=None)</code>

<pre>Gets the metadata associated with a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  gcsPath: string, Required. A Cloud Storage path to the template from which to
    create the job.
    Must be a valid Cloud Storage URL, beginning with `gs://`.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format
  view: string, The view to retrieve. Defaults to METADATA_ONLY.

Returns:
  An object of the form:

{ # The response to a GetTemplate request.

"status": { # The status of the get template request. Any problems with the

# request will be indicated in the error_details.

#

# The `Status` type defines a logical error model that is suitable for different

# programming environments, including REST APIs and RPC APIs. It is used by

# [gRPC](https://github.com/grpc). The error model is designed to be:

628

#

629

# - Simple to use and understand for most users

630

# - Flexible enough to meet unexpected needs

#

# # Overview

#

# The `Status` message contains three pieces of data: error code, error message,

635

# and error details. The error code should be an enum value of

636

# google.rpc.Code, but it may accept additional error codes if needed. The

637

# error message should be a developer-facing English message that helps

638

# developers *understand* and *resolve* the error. If a localized user-facing

639

# error message is needed, put the localized message in the error details or

640

# localize it in the client. The optional error details may contain arbitrary

641

# information about the error. There is a predefined set of error detail types

642

# in the package `google.rpc` which can be used for common error conditions.

#

# # Language mapping

#

# The `Status` message is the logical representation of the error model, but it

647

# is not necessarily the actual wire format. When the `Status` message is

648

# exposed in different client libraries and different wire protocols, it can be

649

# mapped differently. For example, it will likely be mapped to some exceptions

650

# in Java, but more likely mapped to some error codes in C.

#

# # Other uses

#

# The error model and the `Status` message can be used in a variety of

655

# environments, either with or without APIs, to provide a

656

# consistent developer experience across different environments.

657

#

658

# Example uses of this error model include:

659

#

660

# - Partial errors. If a service needs to return partial errors to the client,

661

# it may embed the `Status` in the normal response to indicate the partial

662

# errors.

663

#

664

# - Workflow errors. A typical workflow has multiple steps. Each step may

665

# have a `Status` message for error reporting purposes.

666

#

667

# - Batch operations. If a client uses batch request and batch response, the

668

# `Status` message should be used directly inside batch response, one for

669

# each error sub-response.

670

#

671

# - Asynchronous operations. If an API call embeds asynchronous operation

672

# results in its response, the status of those operations should be

673

# represented directly using the `Status` message.

674

#

675

# - Logging. If some API errors are stored in logs, the message `Status` could

676

# be used directly after any stripping needed for security/privacy reasons.

677

"message": "A String", # A developer-facing error message, which should be in English. Any

678

# user-facing error message should be localized and sent in the

679

# google.rpc.Status.details field, or localized by the client.

680

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

681

"details": [ # A list of messages that carry the error details. There will be a

682

# common set of message types for APIs to use.

683

{

684

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

},

"metadata": { # Metadata describing a template. # The template metadata describing the template name, available

689

# parameters, etc.

690

"bypassTempDirValidation": True or False, # If true, will bypass the validation that the temp directory is

691

# writable. This should only be used with templates for pipelines

692

# that are guaranteed not to need to write to the temp directory,

693

# which is subject to change based on the optimizer.

694

"name": "A String", # Required. The name of the template.

695

"parameters": [ # The parameters for the template.

696

{ # Metadata for a specific parameter.

697

"regexes": [ # Optional. Regexes that the parameter must match.

698

"A String",

699

],

700

"helpText": "A String", # Required. The help text to display for the parameter.

701

"name": "A String", # Required. The name of the parameter.

702

"isOptional": True or False, # Optional. Whether the parameter is optional. Defaults to false.

703

"label": "A String", # Required. The label to display for the parameter.

704

},

705

],

706

"description": "A String", # Optional. A description of the template.

},

}</pre>
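The response above carries both a `status` and the template `metadata`, so a caller can compare its runtime parameters with the template's declared parameters before launching. The sketch below assumes `service` is a Dataflow v1b3 client object built elsewhere with google-api-python-client. `fetch_template` and `check_parameters` are hypothetical helper names. Two assumptions are made about the response: a nonzero `status.code` signals a failed request (as in `google.rpc.Code`, where 0 is OK), and each entry in `regexes` must match the whole value.

```python
import re


def fetch_template(service, project_id, gcs_path):
    """Call templates().get() on an already-built Dataflow v1b3 client."""
    request = service.projects().templates().get(
        projectId=project_id, gcsPath=gcs_path, view="METADATA_ONLY")
    return request.execute()


def check_parameters(response, supplied):
    """Return a list of problems with `supplied` (empty if none found)."""
    problems = []
    status = response.get("status", {})
    # Assumption: any nonzero code means the get request itself failed.
    if status.get("code", 0) != 0:
        problems.append("request failed: " + status.get("message", ""))
        return problems
    metadata = response.get("metadata", {})
    for spec in metadata.get("parameters", []):
        name = spec["name"]
        if name not in supplied:
            # isOptional defaults to false, so a missing entry is an error.
            if not spec.get("isOptional", False):
                problems.append("missing required parameter: " + name)
            continue
        for pattern in spec.get("regexes", []):
            # Assumption: the regex must match the entire value.
            if re.fullmatch(pattern, supplied[name]) is None:
                problems.append("%s does not match %s" % (name, pattern))
    return problems
```

`check_parameters` works on a plain dict, so it can be exercised with a stored response without calling the service.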

</div>
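For the `launch` method documented next, the request body can be assembled and checked locally before it is sent. This is a sketch under stated assumptions rather than a definitive client: `build_launch_body` and `launch_template` are hypothetical helpers, `service` is assumed to be a Dataflow v1b3 client object built elsewhere, and applying the job-name regular expression from the `Job.name` field to `jobName` is an assumption, since the request schema only marks `jobName` as required.

```python
import re

# Pattern documented for Job.name; reusing it for jobName is an assumption.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def build_launch_body(job_name, parameters, temp_location, max_workers=None):
    """Build a launch request body, raising ValueError on obvious mistakes."""
    if JOB_NAME_RE.fullmatch(job_name) is None:
        raise ValueError("invalid job name: %r" % job_name)
    if not temp_location.startswith("gs://"):
        raise ValueError("tempLocation must begin with gs://")
    environment = {"tempLocation": temp_location}
    if max_workers is not None:
        # The reference gives the allowed range as 1 to 1000.
        if not 1 <= max_workers <= 1000:
            raise ValueError("maxWorkers must be between 1 and 1000")
        environment["maxWorkers"] = max_workers
    return {
        "jobName": job_name,
        "parameters": dict(parameters),
        "environment": environment,
    }


def launch_template(service, project_id, gcs_path, body, dry_run=False):
    """Send the launch request on an already-built Dataflow v1b3 client."""
    request = service.projects().templates().launch(
        projectId=project_id, gcsPath=gcs_path, body=body, dryRun=dry_run)
    return request.execute()
```

Passing `dry_run=True` asks the service to validate the parameters without executing the job; in that mode validation errors are reported in the response `status` instead of failing the HTTP request.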

<code class="details" id="launch">launch(projectId, body, dryRun=None, gcsPath=None, x__xgafv=None)</code>

<pre>Launch a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  body: object, The request body. (required)
    The object takes the form of:

{ # Parameters to provide to the template being launched.
    "environment": { # The environment values to set at runtime. # The runtime environment for the job.
      "bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.
          # Use with caution.
      "tempLocation": "A String", # The Cloud Storage path to use for temporary files.
          # Must be a valid Cloud Storage URL, beginning with `gs://`.
      "serviceAccountEmail": "A String", # The email address of the service account to run the job as.
      "zone": "A String", # The Compute Engine [availability zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
          # for launching worker instances to run your pipeline.
      "maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made
          # available to your pipeline during execution, from 1 to 1000.
    },
    "parameters": { # The runtime parameters to pass to the job.
      "a_key": "A String",
    },
    "jobName": "A String", # Required. The job name to use for the created job.
  }

  dryRun: boolean, Whether or not the job should actually be executed after
    validating parameters. Defaults to false. Validation errors do
    not cause the HTTP request to fail if true.
  gcsPath: string, Required. A Cloud Storage path to the template from which to create
    the job.
    Must be a valid Cloud Storage URL, beginning with `gs://`.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

{ # Response to the request to launch a template.

"status": { # The status of the launch template request. Any problems with the request

# will be indicated in the error_details.

#

# The `Status` type defines a logical error model that is suitable for different

# programming environments, including REST APIs and RPC APIs. It is used by

# [gRPC](https://github.com/grpc). The error model is designed to be:

757

#

758

# - Simple to use and understand for most users

759

# - Flexible enough to meet unexpected needs

#

# # Overview

#

# The `Status` message contains three pieces of data: error code, error message,

764

# and error details. The error code should be an enum value of

765

# google.rpc.Code, but it may accept additional error codes if needed. The

766

# error message should be a developer-facing English message that helps

767

# developers *understand* and *resolve* the error. If a localized user-facing

768

# error message is needed, put the localized message in the error details or

769

# localize it in the client. The optional error details may contain arbitrary

770

# information about the error. There is a predefined set of error detail types

771

# in the package `google.rpc` which can be used for common error conditions.

#

# # Language mapping

#

# The `Status` message is the logical representation of the error model, but it

776

# is not necessarily the actual wire format. When the `Status` message is

777

# exposed in different client libraries and different wire protocols, it can be

778

# mapped differently. For example, it will likely be mapped to some exceptions

779

# in Java, but more likely mapped to some error codes in C.

#

# # Other uses

#

# The error model and the `Status` message can be used in a variety of

784

# environments, either with or without APIs, to provide a

785

# consistent developer experience across different environments.

786

#

787

# Example uses of this error model include:

788

#

789

# - Partial errors. If a service needs to return partial errors to the client,

790

# it may embed the `Status` in the normal response to indicate the partial

791

# errors.

792

#

793

# - Workflow errors. A typical workflow has multiple steps. Each step may

794

# have a `Status` message for error reporting purposes.

795

#

796

# - Batch operations. If a client uses batch request and batch response, the

797

# `Status` message should be used directly inside batch response, one for

798

# each error sub-response.

799

#

800

# - Asynchronous operations. If an API call embeds asynchronous operation

801

# results in its response, the status of those operations should be

802

# represented directly using the `Status` message.

803

#

804

# - Logging. If some API errors are stored in logs, the message `Status` could

805

# be used directly after any stripping needed for security/privacy reasons.

806

"message": "A String", # A developer-facing error message, which should be in English. Any

807

# user-facing error message should be localized and sent in the

808

# google.rpc.Status.details field, or localized by the client.

809

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

810

"details": [ # A list of messages that carry the error details. There will be a

811

# common set of message types for APIs to use.

812

{

813

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

},

"job": { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and

818

# the job was successfully launched.

819

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

820

# If this field is set, the service will ensure its uniqueness.

821

# The request to create a job will fail if the service has knowledge of a

822

# previously submitted job with the same client's ID and job name.

823

# The caller may use this field to ensure idempotence of job

824

# creation across retried attempts to create a job.

825

# By default, the field is empty and, in that case, the service ignores it.

826

"requestedState": "A String", # The job's requested state.

827

#

828

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

829

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

830

# also be used to directly set a job's requested state to

831

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

832

# job if it has not already reached a terminal state.

833

"name": "A String", # The user-specified Cloud Dataflow job name.

834

#

835

# Only one Job with a given name may exist in a project at any

836

# given time. If a caller attempts to create a Job with the same

837

# name as an already-existing Job, the attempt returns the

838

# existing Job.

839

#

840

# The name must match the regular expression

841

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

842

"currentStateTime": "A String", # The timestamp associated with the current state.

843

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

844

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

845

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

846

"labels": { # User-defined labels for this job.

847

#

848

# The labels map can contain no more than 64 entries. Entries of the labels

849

# map are UTF8 strings that comply with the following restrictions:

850

#

851

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

852

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

853

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"location": "A String", # The location that contains this job.

858

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

859

# Cloud Dataflow service.

860

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

861

# corresponding name prefixes of the new job.

862

"a_key": "A String",

863

},

864

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

865

"version": { # A structure describing which components and their versions of the service

866

# are required in order to run the job.

867

"a_key": "", # Properties of the object.

868

},

869

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

870

# storage. The system will append the suffix "/temp-{JOBNAME}" to

871

# this resource prefix, where {JOBNAME} is the value of the

872

# job_name field. The resulting bucket and object prefix is used

873

# as the prefix of the resources used to store temporary data

874

# needed during the job execution. NOTE: This will override the

875

# value in taskrunner_settings.

876

# The supported resource type is:

877

#

878

# Google Cloud Storage:

879

#

880

# storage.googleapis.com/{bucket}/{object}

881

# bucket.storage.googleapis.com/{object}

882

"internalExperiments": { # Experimental settings.

883

"a_key": "", # Properties of the object. Contains field @type with type URL.

884

},

885

"dataset": "A String", # The dataset for the current project where various workflow

886

# related tables are stored.

887

#

888

# The supported resource type is:

889

#

890

# Google BigQuery:

891

# bigquery.googleapis.com/{dataset}

892

"experiments": [ # The list of experiments to enable.

893

"A String",

894

],

895

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

896

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

897

# options are passed through the service and are used to recreate the

898

# SDK pipeline options on the worker in a language agnostic and platform

899

# independent way.

900

"a_key": "", # Properties of the object.

901

},

902

"userAgent": { # A description of the process that generated the request.

903

"a_key": "", # Properties of the object.

904

},

905

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

906

# unspecified, the service will attempt to choose a reasonable

907

# default. This should be in the form of the API service name,

908

# e.g. "compute.googleapis.com".

909

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

910

# specified in order for the job to have workers.

911

{ # Describes one particular pool of Cloud Dataflow workers to be

912

# instantiated by the Cloud Dataflow service in order to perform the

913

# computations required by a job. Note that a workflow job may use

914

# multiple pools, in order to match the various computational

915

# requirements of the various stages of the job.

916

"diskSourceImage": "A String", # Fully qualified source image for disks.

917

"ipConfiguration": "A String", # Configuration for VM IPs.

918

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

919

# are supported.

920

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

921

# service will attempt to choose a reasonable default.

922

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

923

# the service will use the network "default".

924

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

925

# will attempt to choose a reasonable default.

926

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

927

# attempt to choose a reasonable default.

928

"metadata": { # Metadata to set on the Google Compute Engine VMs.

929

"a_key": "A String",

930

},

931

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

932

# Compute Engine API.

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

934

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

935

# `TEARDOWN_NEVER`.

936

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

937

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

938

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

939

# down.

940

#

941

# If the workers are not torn down by the service, they will

942

# continue to run and use Google Compute Engine VM resources in the

943

# user's project until they are explicitly terminated by the user.

944

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

945

# policy except for small, manually supervised test jobs.

946

#

947

# If unknown or unspecified, the service will attempt to choose a reasonable

948

# default.

949

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

950

# service will choose a number of threads (according to the number of cores

951

# on the selected machine type for batch, or 1 by convention for streaming).

952

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

953

# the form "regions/REGION/subnetworks/SUBNETWORK".

954

"poolArgs": { # Extra arguments for this worker pool.

955

"a_key": "", # Properties of the object. Contains field @type with type URL.

956

},

957

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

958

# execute the job. If zero or unspecified, the service will

959

# attempt to choose a reasonable default.

960

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

961

# using the standard Dataflow task runner. Users should ignore

962

# this field.

963

"workflowFileName": "A String", # The file to store the workflow in.

964

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

965

# will not be uploaded.

966

#

967

# The supported resource type is:

968

#

969

# Google Cloud Storage:

970

# storage.googleapis.com/{bucket}/{object}

971

# bucket.storage.googleapis.com/{object}

972

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

973

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

974

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

975

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

976

"vmId": "A String", # The ID string of the VM.

977

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

978

# taskrunner; e.g. "wheel".

979

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

980

# taskrunner; e.g. "root".

981

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

982

# access the Cloud Dataflow API.

983

"A String",

984

],

985

"languageHint": "A String", # The suggested backend language.

986

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

987

# console.

988

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

989

"logDir": "A String", # The directory on the VM to store logs.

990

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

991

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

992

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

993

# "shuffle/v1beta1".

994

"workerId": "A String", # The ID of the worker running this pipeline.

995

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

996

#

997

# When workers access Google Cloud APIs, they logically do so via

998

# relative URLs. If this field is specified, it supplies the base

999

# URL to use for resolving these relative URLs. The normative

1000

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1001

# Locators".

1002

#

1003

# If not specified, the default value is "http://www.googleapis.com/"

1004

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1005

# "dataflow/v1b3/projects".

1006

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1007

# storage.

1008

#

1009

# The supported resource type is:

1010

#

1011

# Google Cloud Storage:

1012

#

1013

# storage.googleapis.com/{bucket}/{object}

1014

# bucket.storage.googleapis.com/{object}

1015

},

1016

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

1017

"harnessCommand": "A String", # The command to launch the worker harness.

1018

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1019

# temporary storage.

1020

#

1021

# The supported resource type is:

1022

#

1023

# Google Cloud Storage:

1024

# storage.googleapis.com/{bucket}/{object}

1025

# bucket.storage.googleapis.com/{object}

1026

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

1027

#

1028

# When workers access Google Cloud APIs, they logically do so via

1029

# relative URLs. If this field is specified, it supplies the base

1030

# URL to use for resolving these relative URLs. The normative

1031

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1032

# Locators".

1033

#

1034

# If not specified, the default value is "http://www.googleapis.com/"

1035

},

1036

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1037

# select a default set of packages which are useful to worker

1038

# harnesses written in a particular language.

1039

"packages": [ # Packages to be installed on workers.

1040

{ # The packages that must be installed in order for a worker to run the

1041

# steps of the Cloud Dataflow job that will be assigned to its worker

1042

# pool.

1043

#

1044

# This is the mechanism by which the Cloud Dataflow SDK causes code to

1045

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

1046

# might use this to install jars containing the user's code and all of the

1047

# various dependencies (libraries, data files, etc.) required in order

1048

# for that code to run.

1049

"location": "A String", # The resource to read the package from. The supported resource type is:

1050

#

1051

# Google Cloud Storage:

1052

#

1053

# storage.googleapis.com/{bucket}

1054

# bucket.storage.googleapis.com/

1055

"name": "A String", # The name of the package.

1056

},

1057

],

1058

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1059

"algorithm": "A String", # The algorithm to use for autoscaling.

1060

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1061

},

1062

"dataDisks": [ # Data disks that are used by a VM in this workflow.

1063

{ # Describes the data disk used by a workflow job.

1064

"mountPoint": "A String", # Directory in a VM where disk is mounted.

1065

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

1066

# attempt to choose a reasonable default.

1067

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

1068

# must be a disk type appropriate to the project and zone in which

1069

# the workers will run. If unknown or unspecified, the service

1070

# will attempt to choose a reasonable default.

1071

#

1072

# For example, the standard persistent disk type is a resource name

1073

# typically ending in "pd-standard". If SSD persistent disks are

1074

# available, the resource name typically ends with "pd-ssd". The

1075

# actual valid values are defined by the Google Compute Engine API,

1076

# not by the Cloud Dataflow API; consult the Google Compute Engine

1077

# documentation for more information about determining the set of

1078

# available disk types for a particular project and zone.

1079

#

1080

# Google Compute Engine Disk types are local to a particular

1081

# project in a particular zone, and so the resource name will

1082

# typically look something like this:

1083

#

1084

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

1085

},

1086

],

1087

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

1088

# attempt to choose a reasonable default.

1089

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1090

# harness, residing in Google Container Registry.

},

],

},

"pipelineDescription": { # Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

#

# A descriptive representation of the submitted pipeline as well as its executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

1100

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1101

{ # Description of the type, names/ids, and input/outputs for a transform.

1102

"kind": "A String", # Type of transform.

1103

"name": "A String", # User provided name for this transform instance.

1104

"inputCollectionName": [ # User names for all collection inputs to this transform.

1105

"A String",

1106

],

1107

"displayData": [ # Transform-specific display data.

1108

{ # Data provided with a pipeline or transform to provide descriptive info.

1109

"key": "A String", # The key identifying the display data.

1110

# This is intended to be used as a label for the display data

1111

# when viewed in a dax monitoring system.

1112

"shortStrValue": "A String", # A possible additional shorter value to display.

1113

# For example a java_class_name_value of com.mypackage.MyDoFn

1114

# will be stored with MyDoFn as the short_str_value and

1115

# com.mypackage.MyDoFn as the java_class_name value.

1116

# short_str_value can be displayed and java_class_name_value

1117

# will be displayed as a tooltip.

1118

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1119

"url": "A String", # An optional full URL.

1120

"floatValue": 3.14, # Contains value if the data is of float type.

1121

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1122

# language namespace (e.g. a Python module) which defines the display data.

1123

# This allows a dax monitoring system to specially handle the data

1124

# and perform custom rendering.

1125

"javaClassValue": "A String", # Contains value if the data is of java class type.

1126

"label": "A String", # An optional label to display in a dax UI for the element.

1127

"boolValue": True or False, # Contains value if the data is of a boolean type.

1128

"strValue": "A String", # Contains value if the data is of string type.

1129

"durationValue": "A String", # Contains value if the data is of duration type.

1130

"int64Value": "A String", # Contains value if the data is of int64 type.

1131

},

1132

],

1133

"outputCollectionName": [ # User names for all collection outputs to this transform.

1134

"A String",

1135

],

1136

"id": "A String", # SDK generated id of this transform instance.

1137

},

1138

],

1139

"displayData": [ # Pipeline level display data.

1140

{ # Data provided with a pipeline or transform to provide descriptive info.

1141

"key": "A String", # The key identifying the display data.

1142

# This is intended to be used as a label for the display data

1143

# when viewed in a dax monitoring system.

1144

"shortStrValue": "A String", # A possible additional shorter value to display.

1145

# For example a java_class_name_value of com.mypackage.MyDoFn

1146

# will be stored with MyDoFn as the short_str_value and

1147

# com.mypackage.MyDoFn as the java_class_name value.

1148

# short_str_value can be displayed and java_class_name_value

1149

# will be displayed as a tooltip.

1150

"timestampValue": "A String", # Contains value if the data is of timestamp type.

"url": "A String", # An optional full URL.

"floatValue": 3.14, # Contains value if the data is of float type.

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

# language namespace (e.g. a Python module) that defines the display data.

# This allows a dax monitoring system to specially handle the data

# and perform custom rendering.

"javaClassValue": "A String", # Contains value if the data is of Java class type.

"label": "A String", # An optional label to display in a dax UI for the element.

"boolValue": True or False, # Contains value if the data is of a boolean type.

"strValue": "A String", # Contains value if the data is of string type.

"durationValue": "A String", # Contains value if the data is of duration type.

"int64Value": "A String", # Contains value if the data is of int64 type.

},

],

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

{ # Description of the composing transforms, names/ids, and input/outputs of a

# stage of execution. Some composing transforms and sources may have been

# generated by the Dataflow service during execution planning.

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

{ # Description of an interstitial value between transforms in an execution

# stage.

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

# source is most closely associated.

"name": "A String", # Dataflow service generated name for this source.

},

],

"kind": "A String", # Type of transform this stage is executing.

"name": "A String", # Dataflow service generated name for this stage.

"outputSource": [ # Output sources for this stage.

{ # Description of an input or output of an execution stage.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

# source is most closely associated.

"name": "A String", # Dataflow service generated name for this source.

"sizeBytes": "A String", # Size of the source, if measurable.

},

],

"inputSource": [ # Input sources for this stage.

{ # Description of an input or output of an execution stage.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

# source is most closely associated.

"name": "A String", # Dataflow service generated name for this source.

"sizeBytes": "A String", # Size of the source, if measurable.

},

],

"componentTransform": [ # Transforms that comprise this execution stage.

{ # Description of a transform executed as part of an execution stage.

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

"originalTransform": "A String", # User name for the original user transform with which this transform is

# most closely associated.

"name": "A String", # Dataflow service generated name for this source.

},

],

"id": "A String", # Dataflow service generated id for this stage.

},

],

},

"steps": [ # The top-level steps that constitute the entire job.

{ # Defines a particular step within a Cloud Dataflow job.

#

# A job consists of multiple steps, each of which performs some

# specific operation as part of the overall job. Data is typically

# passed from one step to another as part of the job.

#

# Here's an example of a sequence of steps which together implement a

# Map-Reduce job:

#

# * Read a collection of data from some source, parsing the

# collection's elements.

#

# * Validate the elements.

#

# * Apply a user-defined function to map each element to some value

# and extract an element-specific key value.

#

# * Group elements with the same key into a single element with

# that key, transforming a multiply-keyed collection into a

# uniquely-keyed collection.

#

# * Write the elements out to some data sink.

#

# Note that the Cloud Dataflow service may be used to run many different

# types of jobs, not just Map-Reduce.

"kind": "A String", # The kind of step in the Cloud Dataflow job.

"properties": { # Named properties associated with the step. Each kind of

# predefined step has its own required set of properties.

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

"a_key": "", # Properties of the object.

},

"name": "A String", # The name that identifies the step. This must be unique for each

# step with respect to all other steps in the Cloud Dataflow job.

},

],

"currentState": "A String", # The current state of the job.

#

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

# specified.

#

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

# terminal state. After a job has reached a terminal state, no

# further state updates may be made.

#

# This field may be mutated by the Cloud Dataflow service;

# callers cannot mutate it.

"tempFiles": [ # A set of files the system should be aware of that are used

# for temporary storage. These temporary files will be

# removed on job completion.

# No duplicates are allowed.

# No file patterns are supported.

#

# The supported files are:

#

# Google Cloud Storage:

#

# storage.googleapis.com/{bucket}/{object}

# bucket.storage.googleapis.com/{object}

"A String",

],

"type": "A String", # The type of Cloud Dataflow job.

"id": "A String", # The unique ID of this job.

#

# This field is set by the Cloud Dataflow service when the Job is

# created, and is immutable for the life of the job.

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

# of the job it replaced.

#

# When sending a `CreateJobRequest`, you can update a job by specifying it

# here. The job named here is stopped, and its intermediate state is

# transferred to this job.

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

# isn't contained in the submitted job.

"stages": { # A mapping from each stage to the information about that stage.

"a_key": { # Contains information about how a particular

# google.dataflow.v1beta3.Step will be executed.

"stepName": [ # The steps associated with the execution stage.

# Note that stages may have several steps, and that a given step

# might be run by more than one stage.

"A String",

],

},

},

},

},

}</pre>
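
<p>The response object above can be inspected as a plain Python dict. The following is a minimal sketch, not part of the generated client: the helper names (<code>is_terminal</code>, <code>display_value</code>, <code>summarize_display_data</code>) are hypothetical, and the list of terminal states is an assumption taken from the Dataflow <code>JobState</code> enum, which this page does not enumerate.</p>

```python
# Terminal job states. ASSUMPTION: taken from the Dataflow JobState enum;
# the schema above only says that terminal states exist.
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})

# Typed value fields that a displayData entry may carry, per the schema above.
VALUE_FIELDS = (
    "strValue",
    "int64Value",
    "floatValue",
    "boolValue",
    "timestampValue",
    "durationValue",
    "javaClassValue",
)


def is_terminal(job):
    """Return True if the job dict reports a terminal currentState."""
    return job.get("currentState") in TERMINAL_STATES


def display_value(entry):
    """Return (field_name, value) for the first populated typed field, else None."""
    for field in VALUE_FIELDS:
        if field in entry:
            value = entry[field]
            # int64 values are serialized as strings in the JSON response.
            if field == "int64Value":
                value = int(value)
            return field, value
    return None


def summarize_display_data(entries):
    """Map each entry's label (or key, if unlabeled) to a displayable value.

    shortStrValue is preferred when present, as the schema suggests showing
    it in place of the longer value.
    """
    summary = {}
    for entry in entries:
        name = entry.get("label") or entry["key"]
        found = display_value(entry)
        value = found[1] if found else None
        summary[name] = entry.get("shortStrValue", value)
    return summary
```

<p>For example, <code>summarize_display_data</code> applied to an entry with <code>javaClassValue</code> set to <code>com.mypackage.MyDoFn</code> and <code>shortStrValue</code> set to <code>MyDoFn</code> would report <code>MyDoFn</code> for that entry.</p>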

</div>
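
<p>The <code>tempFiles</code> field accepts two Google Cloud Storage path forms and disallows duplicates. A rough sketch of a client-side check follows; the function names are hypothetical, and treating the two path forms of the same object as duplicates of each other is an assumption rather than documented service behavior.</p>

```python
GCS_HOST = "storage.googleapis.com"


def parse_temp_file(path):
    """Split a tempFiles entry into (bucket, object_name).

    Accepts the two forms listed in the schema:
      storage.googleapis.com/{bucket}/{object}
      bucket.storage.googleapis.com/{object}
    """
    host, sep, rest = path.partition("/")
    if not sep or not rest:
        raise ValueError("missing object path: %r" % path)

    if host == GCS_HOST:
        # Path-style form: the bucket is the first path segment.
        bucket, sep, obj = rest.partition("/")
        if not bucket or not sep or not obj:
            raise ValueError("missing bucket or object: %r" % path)
        return bucket, obj

    suffix = "." + GCS_HOST
    if host.endswith(suffix) and len(host) > len(suffix):
        # Host-style form: the bucket is the leading part of the host name.
        return host[: -len(suffix)], rest

    raise ValueError("unsupported tempFiles entry: %r" % path)


def validate_temp_files(paths):
    """Parse every entry and reject duplicates; return sorted (bucket, object) pairs."""
    seen = set()
    for path in paths:
        location = parse_temp_file(path)
        if location in seen:
            raise ValueError("duplicate temp file: %r" % path)
        seen.add(location)
    return sorted(seen)
```

<p>Running a check like this before submitting a request may surface malformed entries earlier than the service would, though the service remains the authority on what it accepts.</p>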

</body></html>