Blame - docs/dyn/dataflow_v1b3.projects.templates.html - platform/external/python/google-api-python-client

2017-03-13 12:12:03 -0400

79

<p class="firstline">Creates a Cloud Dataflow job from a template.</p>

80

81

<code><a href="#get">get(projectId, gcsPath=None, x__xgafv=None, view=None)</a></code></p>

82

<p class="firstline">Get the template associated with a template.</p>

83

84

<code><a href="#launch">launch(projectId, body, dryRun=None, gcsPath=None, x__xgafv=None)</a></code></p>

85

<p class="firstline">Launch a template.</p>

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

86

<h3>Method Details</h3>

87

88

<code class="details" id="create">create(projectId, body, x__xgafv=None)</code>
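A minimal sketch of assembling the request body for `create` and sending it with google-api-python-client. The helper names (`build_create_body`, `create_job`), the bucket, and the job name are hypothetical; the field names and the `gs://` and 1 to 1000 constraints come from the request body described in this section. The call itself is not executed here because it needs credentials and a real project.

```python
def build_create_body(job_name, gcs_path, parameters=None,
                      temp_location=None, max_workers=None):
    """Assemble a request body for projects().templates().create()."""
    if not gcs_path.startswith("gs://"):
        raise ValueError("gcsPath must be a Cloud Storage URL starting with gs://")
    body = {
        "jobName": job_name,
        "gcsPath": gcs_path,
        "parameters": dict(parameters or {}),
    }
    environment = {}
    if temp_location is not None:
        if not temp_location.startswith("gs://"):
            raise ValueError("tempLocation must start with gs://")
        environment["tempLocation"] = temp_location
    if max_workers is not None:
        if not 1 <= max_workers <= 1000:
            raise ValueError("maxWorkers must be between 1 and 1000")
        environment["maxWorkers"] = max_workers
    if environment:
        body["environment"] = environment
    return body


def create_job(service, project_id, body):
    """Send the request. `service` is assumed to be a Dataflow v1b3 client."""
    return service.projects().templates().create(
        projectId=project_id, body=body).execute()


# Hypothetical values, for illustration only.
body = build_create_body(
    job_name="example-job",
    gcs_path="gs://example-bucket/templates/wordcount",
    parameters={"inputFile": "gs://example-bucket/input.txt"},
    temp_location="gs://example-bucket/tmp",
    max_workers=5,
)
print(body)
```

To send the request, one would typically obtain `service` from `googleapiclient.discovery.build("dataflow", "v1b3")` with valid credentials and then call `create_job(service, "my-project", body)`.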

Sai Cheemalapati

2017-03-13 12:12:03 -0400

89

<pre>Creates a Cloud Dataflow job from a template.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

90

91

Args:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

92

projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

93

body: object, The request body. (required)

94

The object takes the form of:

95

Sai Cheemalapati

2017-03-13 12:12:03 -0400

96

{ # A request to create a Cloud Dataflow job from a template.

97

"environment": { # The environment values to set at runtime. # The runtime environment for the job.

Thomas Coffee

2017-03-27 10:39:26 -0700

98

"maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made

99

# available to your pipeline during execution, from 1 to 1000.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

100

"tempLocation": "A String", # The Cloud Storage path to use for temporary files.

101

# Must be a valid Cloud Storage URL, beginning with `gs://`.

102

"serviceAccountEmail": "A String", # The email address of the service account to run the job as.

103

"zone": "A String", # The Compute Engine [availability zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)

104

# for launching worker instances to run your pipeline.

Thomas Coffee

2017-03-27 10:39:26 -0700

105

"bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.

106

# Use with caution.

Jon Wayne Parrott

692617a

2017-01-06 09:58:29 -0800

107

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

108

"gcsPath": "A String", # Required. A Cloud Storage path to the template from which to

109

# create the job.

110

# Must be a valid Cloud Storage URL, beginning with `gs://`.

111

"parameters": { # The runtime parameters to pass to the job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

112

"a_key": "A String",

113

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

114

"jobName": "A String", # Required. The job name to use for the created job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

115

}

116

117

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

118

Allowed values

119

1 - v1 error format

120

2 - v2 error format

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

121

122

Returns:

123

An object of the form:

124

Sai Cheemalapati

2017-03-13 12:12:03 -0400

125

{ # Defines a job to be run by the Cloud Dataflow service.

126

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

127

# If this field is set, the service will ensure its uniqueness.

128

# The request to create a job will fail if the service has knowledge of a

129

# previously submitted job with the same client's ID and job name.

130

# The caller may use this field to ensure idempotence of job

131

# creation across retried attempts to create a job.

132

# By default, the field is empty and, in that case, the service ignores it.

133

"requestedState": "A String", # The job's requested state.

134

#

135

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

136

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

137

# also be used to directly set a job's requested state to

138

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

139

# job if it has not already reached a terminal state.

140

"name": "A String", # The user-specified Cloud Dataflow job name.

141

#

142

# Only one Job with a given name may exist in a project at any

143

# given time. If a caller attempts to create a Job with the same

144

# name as an already-existing Job, the attempt returns the

145

# existing Job.

146

#

147

# The name must match the regular expression

148

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
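A minimal sketch of checking a job name against the regular expression quoted above before submitting it. It assumes the expression must match the entire name (the text says only "must match"); the helper name `is_valid_job_name` is hypothetical.

```python
import re

# Pattern copied from the job name description; anchored via fullmatch().
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` matches the documented job name pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


for candidate in ("wordcount-2017", "Wordcount", "job-", "a" * 41):
    print(candidate, is_valid_job_name(candidate))
```

Under this reading, a name is at most 40 characters long, starts with a lowercase letter, and cannot end with a hyphen.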

149

"currentStateTime": "A String", # The timestamp associated with the current state.

150

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

151

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

152

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

153

"labels": { # User-defined labels for this job.

154

#

155

# The labels map can contain no more than 64 entries. Entries of the labels

156

# map are UTF8 strings that comply with the following restrictions:

157

#

158

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

159

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

160

# * Both keys and values are additionally constrained to be <= 128 bytes in

161

# size.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

162

"a_key": "A String",

163

},

Thomas Coffee

2017-03-27 10:39:26 -0700

164

"location": "A String", # The location that contains this job.

165

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

166

# Cloud Dataflow service.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

167

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

168

# corresponding name prefixes of the new job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

169

"a_key": "A String",

170

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

171

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

172

"version": { # A structure describing which components and their versions of the service

173

# are required in order to run the job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

174

"a_key": "", # Properties of the object.

175

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

176

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

177

# storage. The system will append the suffix "/temp-{JOBNAME}" to

178

# this resource prefix, where {JOBNAME} is the value of the

179

# job_name field. The resulting bucket and object prefix is used

180

# as the prefix of the resources used to store temporary data

181

# needed during the job execution. NOTE: This will override the

182

# value in taskrunner_settings.

183

# The supported resource type is:

184

#

185

# Google Cloud Storage:

186

#

187

# storage.googleapis.com/{bucket}/{object}

188

# bucket.storage.googleapis.com/{object}
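A small sketch of how the temporary storage location described above could be derived from `tempStoragePrefix` and the job name. The function name and the sample prefix are hypothetical, and the service's handling of edge cases such as a trailing slash is not specified here, so the suffix is appended literally.

```python
def temp_storage_location(temp_storage_prefix, job_name):
    """Append the documented "/temp-{JOBNAME}" suffix to the prefix."""
    return "{}/temp-{}".format(temp_storage_prefix, job_name)


location = temp_storage_location(
    "storage.googleapis.com/example-bucket/staging",  # hypothetical prefix
    "wordcount",
)
print(location)
```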

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

189

"internalExperiments": { # Experimental settings.

190

"a_key": "", # Properties of the object. Contains field @type with type URL.

191

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

192

"dataset": "A String", # The dataset for the current project where various workflow

193

# related tables are stored.

194

#

195

# The supported resource type is:

196

#

197

# Google BigQuery:

198

# bigquery.googleapis.com/{dataset}

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

199

"experiments": [ # The list of experiments to enable.

200

"A String",

201

],

Sai Cheemalapati

ea3a5e1

2016-10-12 14:05:53 -0700

202

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

203

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

204

# options are passed through the service and are used to recreate the

205

# SDK pipeline options on the worker in a language agnostic and platform

206

# independent way.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

207

"a_key": "", # Properties of the object.

208

},

209

"userAgent": { # A description of the process that generated the request.

210

"a_key": "", # Properties of the object.

211

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

212

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

213

# unspecified, the service will attempt to choose a reasonable

214

# default. This should be in the form of the API service name,

215

# e.g. "compute.googleapis.com".

216

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

217

# specified in order for the job to have workers.

218

{ # Describes one particular pool of Cloud Dataflow workers to be

219

# instantiated by the Cloud Dataflow service in order to perform the

220

# computations required by a job. Note that a workflow job may use

221

# multiple pools, in order to match the various computational

222

# requirements of the various stages of the job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

223

"diskSourceImage": "A String", # Fully qualified source image for disks.

Thomas Coffee

2017-03-27 10:39:26 -0700

224

"ipConfiguration": "A String", # Configuration for VM IPs.

225

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

226

# are supported.

227

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

228

# service will attempt to choose a reasonable default.

229

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

230

# the service will use the network "default".

231

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

232

# will attempt to choose a reasonable default.

233

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

234

# attempt to choose a reasonable default.

235

"metadata": { # Metadata to set on the Google Compute Engine VMs.

236

"a_key": "A String",

237

},

238

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

239

# Compute Engine API.

240

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

241

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

242

# `TEARDOWN_NEVER`.

243

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

244

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

245

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

246

# down.

247

#

248

# If the workers are not torn down by the service, they will

249

# continue to run and use Google Compute Engine VM resources in the

250

# user's project until they are explicitly terminated by the user.

251

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

252

# policy except for small, manually supervised test jobs.

253

#

254

# If unknown or unspecified, the service will attempt to choose a reasonable

255

# default.

256

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

257

# service will choose a number of threads (according to the number of cores

258

# on the selected machine type for batch, or 1 by convention for streaming).

259

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

260

# the form "regions/REGION/subnetworks/SUBNETWORK".

261

"poolArgs": { # Extra arguments for this worker pool.

262

"a_key": "", # Properties of the object. Contains field @type with type URL.

263

},

264

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

265

# execute the job. If zero or unspecified, the service will

266

# attempt to choose a reasonable default.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

267

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

268

# using the standard Dataflow task runner. Users should ignore

269

# this field.

270

"workflowFileName": "A String", # The file to store the workflow in.

271

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

272

# will not be uploaded.

273

#

274

# The supported resource type is:

275

#

276

# Google Cloud Storage:

277

# storage.googleapis.com/{bucket}/{object}

278

# bucket.storage.googleapis.com/{object}

Sai Cheemalapati

2017-03-13 12:12:03 -0400

279

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

280

# taskrunner; e.g. "root".

Sai Cheemalapati

2017-03-24 15:06:46 -0700

281

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

282

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

283

"vmId": "A String", # The ID string of the VM.

284

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

285

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

286

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

287

# access the Cloud Dataflow API.

288

"A String",

289

],

Sai Cheemalapati

2017-03-13 12:12:03 -0400

290

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

291

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

292

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

293

# "shuffle/v1beta1".

294

"workerId": "A String", # The ID of the worker running this pipeline.

295

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

296

#

297

# When workers access Google Cloud APIs, they logically do so via

298

# relative URLs. If this field is specified, it supplies the base

299

# URL to use for resolving these relative URLs. The normative

300

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

301

# Locators".

302

#

303

# If not specified, the default value is "http://www.googleapis.com/"

304

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

305

# "dataflow/v1b3/projects".

306

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

307

# storage.

308

#

309

# The supported resource type is:

310

#

311

# Google Cloud Storage:

312

#

313

# storage.googleapis.com/{bucket}/{object}

314

# bucket.storage.googleapis.com/{object}

315

},
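The `baseUrl`, `servicePath`, and `shuffleServicePath` fields above describe resolving relative paths against a base URL. A minimal sketch using the standard library's `urllib.parse.urljoin` follows; note that `urljoin` implements the later RFC 3986 resolution rules rather than RFC 1808, which give the same result for simple paths like these. The helper name `resolve` and the `example.test` host are hypothetical.

```python
from urllib.parse import urljoin

# Default taken from the baseUrl field description.
DEFAULT_BASE_URL = "http://www.googleapis.com/"


def resolve(relative_path, base_url=None):
    """Resolve a service path against the base URL (or the default)."""
    return urljoin(base_url or DEFAULT_BASE_URL, relative_path)


print(resolve("dataflow/v1b3/projects"))
print(resolve("shuffle/v1beta1", "https://example.test/api/"))
```

With relative resolution the trailing slash on the base matters: resolving `shuffle/v1beta1` against `https://example.test/api` (no trailing slash) replaces the last segment and yields `https://example.test/shuffle/v1beta1`.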

Sai Cheemalapati

2017-03-24 15:06:46 -0700

316

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

317

# taskrunner; e.g. "wheel".

318

"languageHint": "A String", # The suggested backend language.

319

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

320

# console.

321

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

322

"logDir": "A String", # The directory on the VM to store logs.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

323

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

Sai Cheemalapati

2017-03-13 12:12:03 -0400

324

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

325

#

326

# When workers access Google Cloud APIs, they logically do so via

327

# relative URLs. If this field is specified, it supplies the base

328

# URL to use for resolving these relative URLs. The normative

329

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

330

# Locators".

331

#

332

# If not specified, the default value is "http://www.googleapis.com/"

Sai Cheemalapati

2017-03-24 15:06:46 -0700

333

"harnessCommand": "A String", # The command to launch the worker harness.

334

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

335

# temporary storage.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

336

#

Sai Cheemalapati

2017-03-24 15:06:46 -0700

337

# The supported resource type is:

338

#

339

# Google Cloud Storage:

340

# storage.googleapis.com/{bucket}/{object}

341

# bucket.storage.googleapis.com/{object}

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

342

},

Thomas Coffee

2017-03-27 10:39:26 -0700

343

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

344

# select a default set of packages which are useful to worker

345

# harnesses written in a particular language.

346

"packages": [ # Packages to be installed on workers.

347

{ # The packages that must be installed in order for a worker to run the

348

# steps of the Cloud Dataflow job that will be assigned to its worker

349

# pool.

350

#

351

# This is the mechanism by which the Cloud Dataflow SDK causes code to

352

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

353

# might use this to install jars containing the user's code and all of the

354

# various dependencies (libraries, data files, etc.) required in order

355

# for that code to run.

356

"name": "A String", # The name of the package.

357

"location": "A String", # The resource to read the package from. The supported resource type is:

358

#

359

# Google Cloud Storage:

360

#

361

# storage.googleapis.com/{bucket}

362

# bucket.storage.googleapis.com/

363

},

364

],

365

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

366

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

367

"algorithm": "A String", # The algorithm to use for autoscaling.

368

},

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

369

"dataDisks": [ # Data disks that are used by a VM in this workflow.

370

{ # Describes the data disk used by a workflow job.

371

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

372

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

373

# attempt to choose a reasonable default.

374

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

375

# must be a disk type appropriate to the project and zone in which

376

# the workers will run. If unknown or unspecified, the service

377

# will attempt to choose a reasonable default.

378

#

379

# For example, the standard persistent disk type is a resource name

380

# typically ending in "pd-standard". If SSD persistent disks are

381

# available, the resource name typically ends with "pd-ssd". The

382

# actual valid values are defined by the Google Compute Engine API,

383

# not by the Cloud Dataflow API; consult the Google Compute Engine

384

# documentation for more information about determining the set of

385

# available disk types for a particular project and zone.

386

#

387

# Google Compute Engine Disk types are local to a particular

388

# project in a particular zone, and so the resource name will

389

# typically look something like this:

390

#

391

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

392

},

393

],

Thomas Coffee

2017-03-27 10:39:26 -0700

394

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Sai Cheemalapati

2017-03-13 12:12:03 -0400

395

# attempt to choose a reasonable default.

396

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

397

# harness, residing in Google Container Registry.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

398

},

399

],

400

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

401

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

402

# A description of the user pipeline and stages through which it is executed.

403

# Created by Cloud Dataflow service. Only retrieved with

404

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

405

# form. This data is provided by the Dataflow service for ease of visualizing

406

# the pipeline and interpreting Dataflow-provided metrics.

407

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

408

{ # Description of the type, names/ids, and input/outputs for a transform.

409

"kind": "A String", # Type of transform.

410

"name": "A String", # User provided name for this transform instance.

411

"inputCollectionName": [ # User names for all collection inputs to this transform.

412

"A String",

413

],

414

"displayData": [ # Transform-specific display data.

415

{ # Data provided with a pipeline or transform to provide descriptive info.

Thomas Coffee

2017-03-27 10:39:26 -0700

416

"key": "A String", # The key identifying the display data.

417

# This is intended to be used as a label for the display data

418

# when viewed in a dax monitoring system.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

419

"shortStrValue": "A String", # A possible additional shorter value to display.

420

# For example a java_class_name_value of com.mypackage.MyDoFn

421

# will be stored with MyDoFn as the short_str_value and

422

# com.mypackage.MyDoFn as the java_class_name value.

423

# short_str_value can be displayed and java_class_name_value

424

# will be displayed as a tooltip.

425

"timestampValue": "A String", # Contains value if the data is of timestamp type.

426

"url": "A String", # An optional full URL.

427

"floatValue": 3.14, # Contains value if the data is of float type.

428

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

429

# language namespace (e.g. a Python module) which defines the display data.

430

# This allows a dax monitoring system to specially handle the data

431

# and perform custom rendering.

432

"javaClassValue": "A String", # Contains value if the data is of java class type.

433

"label": "A String", # An optional label to display in a dax UI for the element.

434

"boolValue": True or False, # Contains value if the data is of a boolean type.

435

"strValue": "A String", # Contains value if the data is of string type.

Thomas Coffee

2017-03-27 10:39:26 -0700

436

"durationValue": "A String", # Contains value if the data is of duration type.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

437

"int64Value": "A String", # Contains value if the data is of int64 type.

438

},

439

],

440

"outputCollectionName": [ # User names for all collection outputs to this transform.

441

"A String",

442

],

443

"id": "A String", # SDK generated id of this transform instance.

444

},

445

],

446

"displayData": [ # Pipeline level display data.

447

{ # Data provided with a pipeline or transform to provide descriptive info.

Thomas Coffee

2017-03-27 10:39:26 -0700

448

"key": "A String", # The key identifying the display data.

449

# This is intended to be used as a label for the display data

450

# when viewed in a dax monitoring system.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

451

"shortStrValue": "A String", # A possible additional shorter value to display.

452

# For example a java_class_name_value of com.mypackage.MyDoFn

453

# will be stored with MyDoFn as the short_str_value and

454

# com.mypackage.MyDoFn as the java_class_name value.

455

# short_str_value can be displayed and java_class_name_value

456

# will be displayed as a tooltip.

457

"timestampValue": "A String", # Contains value if the data is of timestamp type.

458

"url": "A String", # An optional full URL.

459

"floatValue": 3.14, # Contains value if the data is of float type.

460

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

461

# language namespace (e.g. a Python module) which defines the display data.

462

# This allows a dax monitoring system to specially handle the data

463

# and perform custom rendering.

464

"javaClassValue": "A String", # Contains value if the data is of java class type.

465

"label": "A String", # An optional label to display in a dax UI for the element.

466

"boolValue": True or False, # Contains value if the data is of a boolean type.

467

"strValue": "A String", # Contains value if the data is of string type.

Thomas Coffee

2017-03-27 10:39:26 -0700

468

"durationValue": "A String", # Contains value if the data is of duration type.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

469

"int64Value": "A String", # Contains value if the data is of int64 type.

470

},

471

],

472

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

473

{ # Description of the composing transforms, names/ids, and input/outputs of a

474

# stage of execution. Some composing transforms and sources may have been

475

# generated by the Dataflow service during execution planning.

476

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

477

{ # Description of an interstitial value between transforms in an execution

478

# stage.

479

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

480

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

481

# source is most closely associated.

482

"name": "A String", # Dataflow service generated name for this source.

483

},

484

],

485

"kind": "A String", # Type of transform this stage is executing.

486

"name": "A String", # Dataflow service generated name for this stage.

487

"outputSource": [ # Output sources for this stage.

488

{ # Description of an input or output of an execution stage.

489

"userName": "A String", # Human-readable name for this source; may be user or system generated.

490

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

491

# source is most closely associated.

Thomas Coffee

2017-03-27 10:39:26 -0700

492

"name": "A String", # Dataflow service generated name for this source.

493

"sizeBytes": "A String", # Size of the source, if measurable.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

494

},

495

],

496

"inputSource": [ # Input sources for this stage.

497

{ # Description of an input or output of an execution stage.

498

"userName": "A String", # Human-readable name for this source; may be user or system generated.

499

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

500

# source is most closely associated.

Thomas Coffee

2017-03-27 10:39:26 -0700

501

"name": "A String", # Dataflow service generated name for this source.

502

"sizeBytes": "A String", # Size of the source, if measurable.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

503

},

504

],

505

"componentTransform": [ # Transforms that comprise this execution stage.

506

{ # Description of a transform executed as part of an execution stage.

507

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

508

"originalTransform": "A String", # User name for the original user transform with which this transform is

509

# most closely associated.

510

"name": "A String", # Dataflow service generated name for this source.

511

},

512

],

513

"id": "A String", # Dataflow service generated id for this stage.

514

},

515

],

516

},

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

517

"steps": [ # The top-level steps that constitute the entire job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

518

{ # Defines a particular step within a Cloud Dataflow job.

519

#

520

# A job consists of multiple steps, each of which performs some

521

# specific operation as part of the overall job. Data is typically

522

# passed from one step to another as part of the job.

523

#

524

# Here's an example of a sequence of steps which together implement a

525

# Map-Reduce job:

526

#

527

# * Read a collection of data from some source, parsing the

528

# collection's elements.

529

#

530

# * Validate the elements.

531

#

532

# * Apply a user-defined function to map each element to some value

533

# and extract an element-specific key value.

534

#

535

# * Group elements with the same key into a single element with

536

# that key, transforming a multiply-keyed collection into a

537

# uniquely-keyed collection.

538

#

539

# * Write the elements out to some data sink.

540

#

541

# Note that the Cloud Dataflow service may be used to run many different

542

# types of jobs, not just Map-Reduce.

543

"kind": "A String", # The kind of step in the Cloud Dataflow job.

544

"properties": { # Named properties associated with the step. Each kind of

545

# predefined step has its own required set of properties.

546

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

547

"a_key": "", # Properties of the object.

548

},

Thomas Coffee

2017-03-27 10:39:26 -0700

549

"name": "A String", # The name that identifies the step. This must be unique for each

550

# step with respect to all other steps in the Cloud Dataflow job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

551

},

552

],
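The Map-Reduce sequence described for `steps` (read, validate, map to key/value, group by key, write) can be illustrated in plain Python. This is only a conceptual sketch: it does not use the Cloud Dataflow SDK or the step format of this API, and all function names and sample data are hypothetical.

```python
from collections import defaultdict


def read(lines):
    """Read from a source and parse each line into word elements."""
    for line in lines:
        for word in line.split():
            yield word


def validate(elements):
    """Keep only elements that are purely alphabetic."""
    return (e for e in elements if e.isalpha())


def map_to_pairs(elements):
    """Map each element to a (key, value) pair."""
    return ((e.lower(), 1) for e in elements)


def group_by_key(pairs):
    """Collapse a multiply-keyed collection into one entry per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def write(grouped, sink):
    """Write each grouped element to a sink (here, a plain list)."""
    for key in sorted(grouped):
        sink.append((key, grouped[key]))


source = ["apple Banana", "", "apple 42 cherry"]
sink = []
write(group_by_key(map_to_pairs(validate(read(source)))), sink)
print(sink)
```

Each function corresponds to one step in the sequence, and data flows from one step to the next in the same order as the bullet list.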

Thomas Coffee

2017-03-27 10:39:26 -0700

553

"currentState": "A String", # The current state of the job.

554

#

555

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

556

# specified.

557

#

558

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

559

# terminal state. After a job has reached a terminal state, no

560

# further state updates may be made.

561

#

562

# This field may be mutated by the Cloud Dataflow service;

563

# callers cannot mutate it.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

564

"tempFiles": [ # A set of files the system should be aware of that are used

565

# for temporary storage. These temporary files will be

566

# removed on job completion.

567

# No duplicates are allowed.

568

# No file patterns are supported.

569

#

570

# The supported files are:

571

#

572

# Google Cloud Storage:

573

#

574

# storage.googleapis.com/{bucket}/{object}

575

# bucket.storage.googleapis.com/{object}

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

[diff] [blame]

576

"A String",

577

],

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

578

"type": "A String", # The type of Cloud Dataflow job.

579

"id": "A String", # The unique ID of this job.

580

#

581

# This field is set by the Cloud Dataflow service when the Job is

582

# created, and is immutable for the life of the job.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

583

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

584

# of the job it replaced.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

585

#

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

586

# When sending a `CreateJobRequest`, you can update a job by specifying it

587

# here. The job named here is stopped, and its intermediate state is

588

# transferred to this job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

589

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

590

# isn't contained in the submitted job.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

[diff] [blame]

591

"stages": { # A mapping from each stage to the information about that stage.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

592

"a_key": { # Contains information about how a particular

593

# google.dataflow.v1beta3.Step will be executed.

594

"stepName": [ # The steps associated with the execution stage.

595

# Note that stages may have several steps, and that a given step

596

# might be run by more than one stage.

Jon Wayne Parrott

2016-08-16 12:44:29 -0700

[diff] [blame]

"A String",

],

},

},

},

}</pre>

</div>
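The `currentState` notes above say that a running job may asynchronously enter a terminal state, after which no further state updates occur. A caller that needs the final outcome therefore usually polls. The following is a minimal sketch of that pattern, not part of the generated reference. The names `TERMINAL_STATES`, `is_terminal`, `wait_for_terminal`, and `fetch_job` are hypothetical, and the set of terminal states is an assumption: `JOB_STATE_DONE`, `JOB_STATE_CANCELLED`, and `JOB_STATE_UPDATED` are named on this page, while `JOB_STATE_FAILED` is taken from the public job-state enum.

```python
import time

# Assumed set of terminal states; adjust it to the states your API version reports.
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
})


def is_terminal(job):
    """Return True if the job dict reports a state listed in TERMINAL_STATES."""
    return job.get("currentState") in TERMINAL_STATES


def wait_for_terminal(fetch_job, poll_seconds=10.0, max_polls=60, sleep=time.sleep):
    """Poll fetch_job() until the job is terminal or max_polls fetches were made.

    fetch_job is a hypothetical zero-argument callable that returns a job dict
    shaped like the Job object documented above. The last job dict seen is
    returned, so the caller should still check is_terminal() on the result.
    """
    job = fetch_job()
    for _ in range(max_polls - 1):
        if is_terminal(job):
            break
        sleep(poll_seconds)
        job = fetch_job()
    return job
```

With the real client, `fetch_job` would typically wrap a `projects().jobs().get(projectId=..., jobId=...).execute()` call. Passing `sleep` as a parameter keeps the helper testable without real delays.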

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

605

606

<code class="details" id="get">get(projectId, gcsPath=None, x__xgafv=None, view=None)</code>

607

<pre>Gets the metadata of the template stored at the given Cloud Storage path.

608

609

Args:

610

projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)

611

gcsPath: string, Required. A Cloud Storage path to the template from which to

612

create the job.

613

Must be a valid Cloud Storage URL, beginning with `gs://`.

614

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

view: string, The view to retrieve. Defaults to METADATA_ONLY.

619

620

Returns:

621

An object of the form:

622

623

{ # The response to a GetTemplate request.

624

"status": { # The `Status` type defines a logical error model that is suitable for different # The status of the get template request. Any problems with the

625

# request will be indicated in the error_details.

626

# programming environments, including REST APIs and RPC APIs. It is used by

627

# [gRPC](https://github.com/grpc). The error model is designed to be:

628

#

629

# - Simple to use and understand for most users

630

# - Flexible enough to meet unexpected needs

#

# # Overview

#

# The `Status` message contains three pieces of data: error code, error message,

635

# and error details. The error code should be an enum value of

636

# google.rpc.Code, but it may accept additional error codes if needed. The

637

# error message should be a developer-facing English message that helps

638

# developers *understand* and *resolve* the error. If a localized user-facing

639

# error message is needed, put the localized message in the error details or

640

# localize it in the client. The optional error details may contain arbitrary

641

# information about the error. There is a predefined set of error detail types

642

# in the package `google.rpc` which can be used for common error conditions.

#

# # Language mapping

#

# The `Status` message is the logical representation of the error model, but it

647

# is not necessarily the actual wire format. When the `Status` message is

648

# exposed in different client libraries and different wire protocols, it can be

649

# mapped differently. For example, it will likely be mapped to some exceptions

650

# in Java, but more likely mapped to some error codes in C.

#

# # Other uses

#

# The error model and the `Status` message can be used in a variety of

655

# environments, either with or without APIs, to provide a

656

# consistent developer experience across different environments.

657

#

658

# Example uses of this error model include:

659

#

660

# - Partial errors. If a service needs to return partial errors to the client,

661

# it may embed the `Status` in the normal response to indicate the partial

662

# errors.

663

#

664

# - Workflow errors. A typical workflow has multiple steps. Each step may

665

# have a `Status` message for error reporting.

666

#

667

# - Batch operations. If a client uses batch requests and batch responses, the

668

# `Status` message should be used directly inside the batch response, one for

669

# each error sub-response.

670

#

671

# - Asynchronous operations. If an API call embeds asynchronous operation

672

# results in its response, the status of those operations should be

673

# represented directly using the `Status` message.

674

#

675

# - Logging. If some API errors are stored in logs, the message `Status` could

676

# be used directly after any stripping needed for security/privacy reasons.

677

"message": "A String", # A developer-facing error message, which should be in English. Any

678

# user-facing error message should be localized and sent in the

679

# google.rpc.Status.details field, or localized by the client.

680

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

681

"details": [ # A list of messages that carry the error details. There will be a

682

# common set of message types for APIs to use.

683

{

684

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

},

"metadata": { # Metadata describing a template. # The template metadata describing the template name, available

689

# parameters, etc.

690

"bypassTempDirValidation": True or False, # If true, will bypass the validation that the temp directory is

691

# writable. This should only be used with templates for pipelines

692

# that are guaranteed not to need to write to the temp directory,

693

# which is subject to change based on the optimizer.

694

"name": "A String", # Required. The name of the template.

695

"parameters": [ # The parameters for the template.

696

{ # Metadata for a specific parameter.

697

"regexes": [ # Optional. Regexes that the parameter must match.

698

"A String",

699

],

700

"helpText": "A String", # Required. The help text to display for the parameter.

701

"name": "A String", # Required. The name of the parameter.

702

"isOptional": True or False, # Optional. Whether the parameter is optional. Defaults to false.

703

"label": "A String", # Required. The label to display for the parameter.

704

},

705

],

706

"description": "A String", # Optional. A description of the template.

},

}</pre>

</div>
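The `metadata.parameters` list returned by `get` describes each template parameter: its `name`, whether it `isOptional`, and any `regexes` the value must match. A client can use that to check a parameter map before launching the template. The sketch below is one way to do so and is not part of the generated reference. The function name `validate_parameters` and the message wording are hypothetical, and treating each regex as a full-string match via `re.fullmatch` is an assumption, since this page does not say whether the service matches fully or partially.

```python
import re


def validate_parameters(metadata, parameters):
    """Check a parameter dict against GetTemplate metadata.

    metadata is the "metadata" object of a get() response; parameters maps
    parameter names to string values. Returns a list of problem descriptions,
    which is empty when no problem was found.
    """
    problems = []
    known = set()
    for spec in metadata.get("parameters", []):
        name = spec["name"]
        known.add(name)
        if name not in parameters:
            # isOptional defaults to false, so a missing key means "required".
            if not spec.get("isOptional", False):
                problems.append("missing required parameter: {}".format(name))
            continue
        for pattern in spec.get("regexes", []):
            # Assumption: the whole value must match each regex.
            if re.fullmatch(pattern, parameters[name]) is None:
                problems.append(
                    "parameter {} does not match regex: {}".format(name, pattern)
                )
    for name in sorted(set(parameters) - known):
        problems.append("unknown parameter: {}".format(name))
    return problems
```

Running this check locally gives faster feedback than a round trip, but the service remains the authority. Calling `launch` with `dryRun=True` validates the same parameters on the server side.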

<code class="details" id="launch">launch(projectId, body, dryRun=None, gcsPath=None, x__xgafv=None)</code>

713

<pre>Launch a template.

714

715

Args:

716

projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)

717

body: object, The request body. (required)

718

The object takes the form of:

719

720

{ # Parameters to provide to the template being launched.

721

"environment": { # The environment values to set at runtime. # The runtime environment for the job.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

722

"maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made

723

# available to your pipeline during execution, from 1 to 1000.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

724

"tempLocation": "A String", # The Cloud Storage path to use for temporary files.

725

# Must be a valid Cloud Storage URL, beginning with `gs://`.

726

"serviceAccountEmail": "A String", # The email address of the service account to run the job as.

727

"zone": "A String", # The Compute Engine [availability zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)

728

# for launching worker instances to run your pipeline.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

729

"bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.

730

# Use with caution.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

731

},

732

"parameters": { # The runtime parameters to pass to the job.

733

"a_key": "A String",

734

},

735

"jobName": "A String", # Required. The job name to use for the created job.

736

}

737

738

dryRun: boolean, Whether or not the job should actually be executed after

739

validating parameters. Defaults to false. Validation errors do

740

not cause the HTTP request to fail if true.

741

gcsPath: string, Required. A Cloud Storage path to the template from which to create

742

the job.

743

Must be a valid Cloud Storage URL, beginning with `gs://`.

744

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

751

752

{ # Response to the request to launch a template.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

753

"job": { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and

754

# the job was successfully launched.

755

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

756

# If this field is set, the service will ensure its uniqueness.

757

# The request to create a job will fail if the service has knowledge of a

758

# previously submitted job with the same client's ID and job name.

759

# The caller may use this field to ensure idempotence of job

760

# creation across retried attempts to create a job.

761

# By default, the field is empty and, in that case, the service ignores it.

762

"requestedState": "A String", # The job's requested state.

763

#

764

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

765

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

766

# also be used to directly set a job's requested state to

767

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

768

# job if it has not already reached a terminal state.

769

"name": "A String", # The user-specified Cloud Dataflow job name.

770

#

771

# Only one Job with a given name may exist in a project at any

772

# given time. If a caller attempts to create a Job with the same

773

# name as an already-existing Job, the attempt returns the

774

# existing Job.

775

#

776

# The name must match the regular expression

777

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

778

"currentStateTime": "A String", # The timestamp associated with the current state.

779

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

780

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

781

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

782

"labels": { # User-defined labels for this job.

783

#

784

# The labels map can contain no more than 64 entries. Entries of the labels

785

# map are UTF8 strings that comply with the following restrictions:

786

#

787

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

788

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

789

# * Both keys and values are additionally constrained to be <= 128 bytes in

790

# size.

791

"a_key": "A String",

792

},

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

793

"location": "A String", # The location that contains this job.

794

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

795

# Cloud Dataflow service.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

796

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

797

# corresponding name prefixes of the new job.

798

"a_key": "A String",

799

},

800

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

801

"version": { # A structure describing which components and their versions of the service

802

# are required in order to run the job.

803

"a_key": "", # Properties of the object.

804

},

805

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

806

# storage. The system will append the suffix "/temp-{JOBNAME}" to

807

# this resource prefix, where {JOBNAME} is the value of the

808

# job_name field. The resulting bucket and object prefix is used

809

# as the prefix of the resources used to store temporary data

810

# needed during the job execution. NOTE: This will override the

811

# value in taskrunner_settings.

812

# The supported resource type is:

813

#

814

# Google Cloud Storage:

815

#

816

# storage.googleapis.com/{bucket}/{object}

817

# bucket.storage.googleapis.com/{object}

818

"internalExperiments": { # Experimental settings.

819

"a_key": "", # Properties of the object. Contains field @type with type URL.

820

},

821

"dataset": "A String", # The dataset for the current project where various workflow

822

# related tables are stored.

823

#

824

# The supported resource type is:

825

#

826

# Google BigQuery:

827

# bigquery.googleapis.com/{dataset}

828

"experiments": [ # The list of experiments to enable.

829

"A String",

830

],

831

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

832

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

833

# options are passed through the service and are used to recreate the

834

# SDK pipeline options on the worker in a language agnostic and platform

835

# independent way.

836

"a_key": "", # Properties of the object.

837

},

838

"userAgent": { # A description of the process that generated the request.

839

"a_key": "", # Properties of the object.

840

},

841

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

842

# unspecified, the service will attempt to choose a reasonable

843

# default. This should be in the form of the API service name,

844

# e.g. "compute.googleapis.com".

845

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

846

# specified in order for the job to have workers.

847

{ # Describes one particular pool of Cloud Dataflow workers to be

848

# instantiated by the Cloud Dataflow service in order to perform the

849

# computations required by a job. Note that a workflow job may use

850

# multiple pools, in order to match the various computational

851

# requirements of the various stages of the job.

852

"diskSourceImage": "A String", # Fully qualified source image for disks.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

853

"ipConfiguration": "A String", # Configuration for VM IPs.

854

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

855

# are supported.

856

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

857

# service will attempt to choose a reasonable default.

858

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

859

# the service will use the network "default".

860

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

861

# will attempt to choose a reasonable default.

862

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

863

# attempt to choose a reasonable default.

864

"metadata": { # Metadata to set on the Google Compute Engine VMs.

865

"a_key": "A String",

866

},

867

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

868

# Compute Engine API.

869

"teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.

870

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

871

# `TEARDOWN_NEVER`.

872

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

873

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

874

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

875

# down.

876

#

877

# If the workers are not torn down by the service, they will

878

# continue to run and use Google Compute Engine VM resources in the

879

# user's project until they are explicitly terminated by the user.

880

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

881

# policy except for small, manually supervised test jobs.

882

#

883

# If unknown or unspecified, the service will attempt to choose a reasonable

884

# default.

885

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

886

# service will choose a number of threads (according to the number of cores

887

# on the selected machine type for batch, or 1 by convention for streaming).

888

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

889

# the form "regions/REGION/subnetworks/SUBNETWORK".

890

"poolArgs": { # Extra arguments for this worker pool.

891

"a_key": "", # Properties of the object. Contains field @type with type URL.

892

},

893

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

894

# execute the job. If zero or unspecified, the service will

895

# attempt to choose a reasonable default.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

896

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

897

# using the standard Dataflow task runner. Users should ignore

898

# this field.

899

"workflowFileName": "A String", # The file to store the workflow in.

900

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

901

# will not be uploaded.

902

#

903

# The supported resource type is:

904

#

905

# Google Cloud Storage:

906

# storage.googleapis.com/{bucket}/{object}

907

# bucket.storage.googleapis.com/{object}

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

908

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

909

# taskrunner; e.g. "root".

Sai Cheemalapati

2017-03-24 15:06:46 -0700

[diff] [blame]

910

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

911

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

912

"vmId": "A String", # The ID string of the VM.

913

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

914

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

915

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

916

# access the Cloud Dataflow API.

917

"A String",

918

],

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

919

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

920

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

921

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

922

# "shuffle/v1beta1".

923

"workerId": "A String", # The ID of the worker running this pipeline.

924

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

925

#

926

# When workers access Google Cloud APIs, they logically do so via

927

# relative URLs. If this field is specified, it supplies the base

928

# URL to use for resolving these relative URLs. The normative

929

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

930

# Locators".

931

#

932

# If not specified, the default value is "http://www.googleapis.com/"

933

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

934

# "dataflow/v1b3/projects".

935

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

936

# storage.

937

#

938

# The supported resource type is:

939

#

940

# Google Cloud Storage:

941

#

942

# storage.googleapis.com/{bucket}/{object}

943

# bucket.storage.googleapis.com/{object}

944

},

Sai Cheemalapati

2017-03-24 15:06:46 -0700

[diff] [blame]

945

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

946

# taskrunner; e.g. "wheel".

947

"languageHint": "A String", # The suggested backend language.

948

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

949

# console.

950

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

951

"logDir": "A String", # The directory on the VM to store logs.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

952

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

953

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

954

#

955

# When workers access Google Cloud APIs, they logically do so via

956

# relative URLs. If this field is specified, it supplies the base

957

# URL to use for resolving these relative URLs. The normative

958

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

959

# Locators".

960

#

961

# If not specified, the default value is "http://www.googleapis.com/"

Sai Cheemalapati

2017-03-24 15:06:46 -0700

[diff] [blame]

962

"harnessCommand": "A String", # The command to launch the worker harness.

963

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

964

# temporary storage.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

965

#

Sai Cheemalapati

2017-03-24 15:06:46 -0700

[diff] [blame]

966

# The supported resource type is:

967

#

968

# Google Cloud Storage:

969

# storage.googleapis.com/{bucket}/{object}

970

# bucket.storage.googleapis.com/{object}

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

971

},

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

972

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

973

# select a default set of packages which are useful to worker

974

# harnesses written in a particular language.

975

"packages": [ # Packages to be installed on workers.

976

{ # The packages that must be installed in order for a worker to run the

977

# steps of the Cloud Dataflow job that will be assigned to its worker

978

# pool.

979

#

980

# This is the mechanism by which the Cloud Dataflow SDK causes code to

981

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

982

# might use this to install jars containing the user's code and all of the

983

# various dependencies (libraries, data files, etc.) required in order

984

# for that code to run.

985

"name": "A String", # The name of the package.

986

"location": "A String", # The resource to read the package from. The supported resource type is:

987

#

988

# Google Cloud Storage:

989

#

990

# storage.googleapis.com/{bucket}

991

# bucket.storage.googleapis.com/

992

},

993

],

994

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

995

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

996

"algorithm": "A String", # The algorithm to use for autoscaling.

997

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

998

"dataDisks": [ # Data disks that are used by a VM in this workflow.

999

{ # Describes the data disk used by a workflow job.

1000

"mountPoint": "A String", # Directory in a VM where disk is mounted.

1001

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

1002

# attempt to choose a reasonable default.

1003

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

1004

# must be a disk type appropriate to the project and zone in which

1005

# the workers will run. If unknown or unspecified, the service

1006

# will attempt to choose a reasonable default.

1007

#

1008

# For example, the standard persistent disk type is a resource name

1009

# typically ending in "pd-standard". If SSD persistent disks are

1010

# available, the resource name typically ends with "pd-ssd". The

1011

# actual valid values are defined by the Google Compute Engine API,

1012

# not by the Cloud Dataflow API; consult the Google Compute Engine

1013

# documentation for more information about determining the set of

1014

# available disk types for a particular project and zone.

1015

#

1016

# Google Compute Engine Disk types are local to a particular

1017

# project in a particular zone, and so the resource name will

1018

# typically look something like this:

1019

#

1020

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

1021

},

1022

],

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

1023

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1024

# attempt to choose a reasonable default.

1025

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1026

# harness, residing in Google Container Registry.

},

],

},

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

1031

# A description of the user pipeline and stages through which it is executed.

1032

# Created by Cloud Dataflow service. Only retrieved with

1033

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1034

# form. This data is provided by the Dataflow service for ease of visualizing

1035

# the pipeline and interpreting Dataflow-provided metrics.

1036

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1037

{ # Description of the type, names/ids, and input/outputs for a transform.

1038

"kind": "A String", # Type of transform.

1039

"name": "A String", # User provided name for this transform instance.

1040

"inputCollectionName": [ # User names for all collection inputs to this transform.

1041

"A String",

1042

],

1043

"displayData": [ # Transform-specific display data.

1044

{ # Data provided with a pipeline or transform to provide descriptive info.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

1045

"key": "A String", # The key identifying the display data.

1046

# This is intended to be used as a label for the display data

1047

# when viewed in a dax monitoring system.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1048

"shortStrValue": "A String", # A possible additional shorter value to display.

1049

# For example a java_class_name_value of com.mypackage.MyDoFn

1050

# will be stored with MyDoFn as the short_str_value and

1051

# com.mypackage.MyDoFn as the java_class_name value.

1052

# short_str_value can be displayed and java_class_name_value

1053

# will be displayed as a tooltip.

1054

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1055

"url": "A String", # An optional full URL.

1056

"floatValue": 3.14, # Contains value if the data is of float type.

1057

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1058

# language namespace (e.g., a Python module) which defines the display data.

1059

# This allows a dax monitoring system to specially handle the data

1060

# and perform custom rendering.

1061

"javaClassValue": "A String", # Contains value if the data is of java class type.

1062

"label": "A String", # An optional label to display in a dax UI for the element.

1063

"boolValue": True or False, # Contains value if the data is of a boolean type.

1064

"strValue": "A String", # Contains value if the data is of string type.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

1065

"durationValue": "A String", # Contains value if the data is of duration type.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1066

"int64Value": "A String", # Contains value if the data is of int64 type.

1067

},

1068

],

1069

"outputCollectionName": [ # User names for all collection outputs to this transform.

1070

"A String",

1071

],

1072

"id": "A String", # SDK generated id of this transform instance.

1073

},

1074

],

1075

"displayData": [ # Pipeline level display data.

1076

{ # Data provided with a pipeline or transform to provide descriptive info.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

1077

"key": "A String", # The key identifying the display data.

1078

# This is intended to be used as a label for the display data

1079

# when viewed in a dax monitoring system.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1080

"shortStrValue": "A String", # A possible additional shorter value to display.

1081

# For example a java_class_name_value of com.mypackage.MyDoFn

1082

# will be stored with MyDoFn as the short_str_value and

1083

# com.mypackage.MyDoFn as the java_class_name value.

1084

# short_str_value can be displayed and java_class_name_value

1085

# will be displayed as a tooltip.

1086

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1087

"url": "A String", # An optional full URL.

1088

"floatValue": 3.14, # Contains value if the data is of float type.

1089

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1090

# language namespace (e.g., a Python module) which defines the display data.

1091

# This allows a dax monitoring system to specially handle the data

1092

# and perform custom rendering.

1093

"javaClassValue": "A String", # Contains value if the data is of java class type.

1094

"label": "A String", # An optional label to display in a dax UI for the element.

1095

"boolValue": True or False, # Contains value if the data is of a boolean type.

1096

"strValue": "A String", # Contains value if the data is of string type.

Thomas Coffee

2017-03-27 10:39:26 -0700

[diff] [blame^]

1097

"durationValue": "A String", # Contains value if the data is of duration type.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1098

"int64Value": "A String", # Contains value if the data is of int64 type.

1099

},

1100

],

1101

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1102

{ # Description of the composing transforms, names/ids, and input/outputs of a

1103

# stage of execution. Some composing transforms and sources may have been

1104

# generated by the Dataflow service during execution planning.

1105

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1106

{ # Description of an interstitial value between transforms in an execution

1107

# stage.

1108

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1109

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1110

# source is most closely associated.

1111

"name": "A String", # Dataflow service generated name for this source.

1112

},

1113

],

1114

"kind": "A String", # Type of transform this stage is executing.

1115

"name": "A String", # Dataflow service generated name for this stage.

1116

"outputSource": [ # Output sources for this stage.

1117

{ # Description of an input or output of an execution stage.

1118

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1119

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1120

# source is most closely associated.

Thomas Coffee

2017-03-27 10:39:26 -0700

1121

"name": "A String", # Dataflow service generated name for this source.

1122

"sizeBytes": "A String", # Size of the source, if measurable.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

1123

},

1124

],

1125

"inputSource": [ # Input sources for this stage.

1126

{ # Description of an input or output of an execution stage.

1127

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1128

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1129

# source is most closely associated.

Thomas Coffee

2017-03-27 10:39:26 -0700

1130

"name": "A String", # Dataflow service generated name for this source.

1131

"sizeBytes": "A String", # Size of the source, if measurable.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

1132

},

1133

],

1134

"componentTransform": [ # Transforms that comprise this execution stage.

1135

{ # Description of a transform executed as part of an execution stage.

1136

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1137

"originalTransform": "A String", # User name for the original user transform with which this transform is

1138

# most closely associated.

1139

"name": "A String", # Dataflow service generated name for this transform.

1140

},

1141

],

1142

"id": "A String", # Dataflow service generated id for this stage.

},

],

},

"steps": [ # The top-level steps that constitute the entire job.

1147

{ # Defines a particular step within a Cloud Dataflow job.

1148

#

1149

# A job consists of multiple steps, each of which performs some

1150

# specific operation as part of the overall job. Data is typically

1151

# passed from one step to another as part of the job.

1152

#

1153

# Here's an example of a sequence of steps which together implement a

1154

# Map-Reduce job:

1155

#

1156

# * Read a collection of data from some source, parsing the

1157

# collection's elements.

1158

#

1159

# * Validate the elements.

1160

#

1161

# * Apply a user-defined function to map each element to some value

1162

# and extract an element-specific key value.

1163

#

1164

# * Group elements with the same key into a single element with

1165

# that key, transforming a multiply-keyed collection into a

1166

# uniquely-keyed collection.

1167

#

1168

# * Write the elements out to some data sink.

1169

#

1170

# Note that the Cloud Dataflow service may be used to run many different

1171

# types of jobs, not just Map-Reduce.

1172

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1173

"properties": { # Named properties associated with the step. Each kind of

1174

# predefined step has its own required set of properties.

1175

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

1176

"a_key": "", # Properties of the object.

1177

},

Thomas Coffee

2017-03-27 10:39:26 -0700

1178

"name": "A String", # The name that identifies the step. This must be unique for each

1179

# step with respect to all other steps in the Cloud Dataflow job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

1180

},

1181

],

Thomas Coffee

2017-03-27 10:39:26 -0700

1182

"currentState": "A String", # The current state of the job.

1183

#

1184

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1185

# specified.

1186

#

1187

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1188

# terminal state. After a job has reached a terminal state, no

1189

# further state updates may be made.

1190

#

1191

# This field may be mutated by the Cloud Dataflow service;

1192

# callers cannot mutate it.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

1193

"tempFiles": [ # A set of files the system should be aware of that are used

1194

# for temporary storage. These temporary files will be

1195

# removed on job completion.

1196

# No duplicates are allowed.

1197

# No file patterns are supported.

1198

#

1199

# The supported files are:

1200

#

1201

# Google Cloud Storage:

1202

#

1203

# storage.googleapis.com/{bucket}/{object}

1204

# bucket.storage.googleapis.com/{object}

1205

"A String",

1206

],

1207

"type": "A String", # The type of Cloud Dataflow job.

1208

"id": "A String", # The unique ID of this job.

1209

#

1210

# This field is set by the Cloud Dataflow service when the job is

1211

# created, and is immutable for the life of the job.

Thomas Coffee

2017-03-27 10:39:26 -0700

1212

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1213

# of the job it replaced.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

1214

#

Thomas Coffee

2017-03-27 10:39:26 -0700

1215

# When sending a `CreateJobRequest`, you can update a job by specifying it

1216

# here. The job named here is stopped, and its intermediate state is

1217

# transferred to this job.

Sai Cheemalapati