<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#aggregated">aggregated(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
  <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#create">create(projectId, body=None, location=None, x__xgafv=None, replaceJobId=None, view=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, jobId, location=None, x__xgafv=None, view=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="aggregated">aggregated(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</code>
  <pre>List the jobs of a project across all regions.

Args:
  projectId: string, The project which owns the jobs. (required)
  pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of max_responses
and an unspecified server-defined limit.
  pageToken: string, Set this to the 'next_page_token' field of a previous response
to request additional results in a long list.
  x__xgafv: string, V1 error format.
    Allowed values
    1 - v1 error format
    2 - v2 error format
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
  filter: string, The kind of filter to use.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

Returns:
  An object of the form:

Dan O'Mearadd494642020-05-01 07:42:23 -0700144 { # Response to a request to list Cloud Dataflow jobs in a project. This might
145 # be a partial response, depending on the page size in the ListJobsRequest.
146 # However, if the project does not have any jobs, an instance of
147 # ListJobsResponse is not returned and the request's response
148 # body is empty {}.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700149 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
150 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
151 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
152 # failed to respond.
153 { # Indicates which [regional endpoint]
154 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
155 # to respond to a request for data.
156 "name": "A String", # The name of the [regional endpoint]
157 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
158 # failed to respond.
159 },
160 ],
161 "jobs": [ # A subset of the requested job information.
162 { # Defines a job to be run by the Cloud Dataflow service.
163 "labels": { # User-defined labels for this job.
164 #
165 # The labels map can contain no more than 64 entries. Entries of the labels
166 # map are UTF8 strings that comply with the following restrictions:
167 #
168 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
169 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -0700170 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700171 # size.
172 "a_key": "A String",
173 },
174 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
175 # by the metadata values provided here. Populated for ListJobs and all GetJob
176 # views SUMMARY and higher.
177 # ListJob response and Job SUMMARY view.
178 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
179 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
180 "version": "A String", # The version of the SDK used to run the job.
181 "sdkSupportStatus": "A String", # The support status for this SDK version.
182 },
183 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
184 { # Metadata for a PubSub connector used by the job.
185 "topic": "A String", # Topic accessed in the connection.
186 "subscription": "A String", # Subscription used in the connection.
187 },
188 ],
189 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
190 { # Metadata for a Datastore connector used by the job.
191 "projectId": "A String", # ProjectId accessed in the connection.
192 "namespace": "A String", # Namespace used in the connection.
193 },
194 ],
195 "fileDetails": [ # Identification of a File source used in the Dataflow job.
196 { # Metadata for a File connector used by the job.
197 "filePattern": "A String", # File Pattern used to access files by the connector.
198 },
199 ],
200 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
201 { # Metadata for a Spanner connector used by the job.
202 "instanceId": "A String", # InstanceId accessed in the connection.
203 "projectId": "A String", # ProjectId accessed in the connection.
204 "databaseId": "A String", # DatabaseId accessed in the connection.
205 },
206 ],
207 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
208 { # Metadata for a BigTable connector used by the job.
209 "instanceId": "A String", # InstanceId accessed in the connection.
210 "projectId": "A String", # ProjectId accessed in the connection.
211 "tableId": "A String", # TableId accessed in the connection.
212 },
213 ],
214 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
215 { # Metadata for a BigQuery connector used by the job.
216 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700217 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -0700218 "table": "A String", # Table accessed in the connection.
219 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700220 },
221 ],
222 },
223 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
224 # A description of the user pipeline and stages through which it is executed.
225 # Created by Cloud Dataflow service. Only retrieved with
226 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
227 # form. This data is provided by the Dataflow service for ease of visualizing
228 # the pipeline and interpreting Dataflow provided metrics.
229 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
230 { # Description of the type, names/ids, and input/outputs for a transform.
231 "kind": "A String", # Type of transform.
232 "name": "A String", # User provided name for this transform instance.
233 "inputCollectionName": [ # User names for all collection inputs to this transform.
234 "A String",
235 ],
236 "displayData": [ # Transform-specific display data.
237 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700238 "key": "A String", # The key identifying the display data.
239 # This is intended to be used as a label for the display data
240 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700241 "shortStrValue": "A String", # A possible additional shorter value to display.
242 # For example a java_class_name_value of com.mypackage.MyDoFn
243 # will be stored with MyDoFn as the short_str_value and
244 # com.mypackage.MyDoFn as the java_class_name value.
245 # short_str_value can be displayed and java_class_name_value
246 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700247 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700248 "url": "A String", # An optional full URL.
249 "floatValue": 3.14, # Contains value if the data is of float type.
250 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
251 # language namespace (i.e. python module) which defines the display data.
252 # This allows a dax monitoring system to specially handle the data
253 # and perform custom rendering.
254 "javaClassValue": "A String", # Contains value if the data is of java class type.
255 "label": "A String", # An optional label to display in a dax UI for the element.
256 "boolValue": True or False, # Contains value if the data is of a boolean type.
257 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700258 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700259 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700260 },
261 ],
262 "outputCollectionName": [ # User names for all collection outputs to this transform.
263 "A String",
264 ],
265 "id": "A String", # SDK generated id of this transform instance.
266 },
267 ],
268 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
269 { # Description of the composing transforms, names/ids, and input/outputs of a
270 # stage of execution. Some composing transforms and sources may have been
271 # generated by the Dataflow service during execution planning.
272 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
273 { # Description of an interstitial value between transforms in an execution
274 # stage.
275 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
276 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
277 # source is most closely associated.
278 "name": "A String", # Dataflow service generated name for this source.
279 },
280 ],
281 "kind": "A String", # Type of transform this stage is executing.
282 "name": "A String", # Dataflow service generated name for this stage.
283 "outputSource": [ # Output sources for this stage.
284 { # Description of an input or output of an execution stage.
285 "userName": "A String", # Human-readable name for this source; may be user or system generated.
286 "sizeBytes": "A String", # Size of the source, if measurable.
287 "name": "A String", # Dataflow service generated name for this source.
288 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
289 # source is most closely associated.
290 },
291 ],
292 "inputSource": [ # Input sources for this stage.
293 { # Description of an input or output of an execution stage.
294 "userName": "A String", # Human-readable name for this source; may be user or system generated.
295 "sizeBytes": "A String", # Size of the source, if measurable.
296 "name": "A String", # Dataflow service generated name for this source.
297 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
298 # source is most closely associated.
299 },
300 ],
301 "componentTransform": [ # Transforms that comprise this execution stage.
302 { # Description of a transform executed as part of an execution stage.
303 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
304 "originalTransform": "A String", # User name for the original user transform with which this transform is
305 # most closely associated.
306 "name": "A String", # Dataflow service generated name for this source.
307 },
308 ],
309 "id": "A String", # Dataflow service generated id for this stage.
310 },
311 ],
312 "displayData": [ # Pipeline level display data.
313 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700314 "key": "A String", # The key identifying the display data.
315 # This is intended to be used as a label for the display data
316 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700317 "shortStrValue": "A String", # A possible additional shorter value to display.
318 # For example a java_class_name_value of com.mypackage.MyDoFn
319 # will be stored with MyDoFn as the short_str_value and
320 # com.mypackage.MyDoFn as the java_class_name value.
321 # short_str_value can be displayed and java_class_name_value
322 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700323 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700324 "url": "A String", # An optional full URL.
325 "floatValue": 3.14, # Contains value if the data is of float type.
326 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
327 # language namespace (i.e. python module) which defines the display data.
328 # This allows a dax monitoring system to specially handle the data
329 # and perform custom rendering.
330 "javaClassValue": "A String", # Contains value if the data is of java class type.
331 "label": "A String", # An optional label to display in a dax UI for the element.
332 "boolValue": True or False, # Contains value if the data is of a boolean type.
333 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700334 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700335 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700336 },
337 ],
338 },
339 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
340 # callers cannot mutate it.
341 { # A message describing the state of a particular execution stage.
342 "executionStageName": "A String", # The name of the execution stage.
343 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
344 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
345 },
346 ],
347 "id": "A String", # The unique ID of this job.
348 #
349 # This field is set by the Cloud Dataflow service when the Job is
350 # created, and is immutable for the life of the job.
351 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
352 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
353 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
354 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
355 # corresponding name prefixes of the new job.
356 "a_key": "A String",
357 },
358 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700359 "workerRegion": "A String", # The Compute Engine region
360 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
361 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
362 # with worker_zone. If neither worker_region nor worker_zone is specified,
363 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700364 "version": { # A structure describing which components and their versions of the service
365 # are required in order to run the job.
366 "a_key": "", # Properties of the object.
367 },
368 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
369 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
370 # at rest, AKA a Customer Managed Encryption Key (CMEK).
371 #
372 # Format:
373 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
374 "internalExperiments": { # Experimental settings.
375 "a_key": "", # Properties of the object. Contains field @type with type URL.
376 },
377 "dataset": "A String", # The dataset for the current project where various workflow
378 # related tables are stored.
379 #
380 # The supported resource type is:
381 #
382 # Google BigQuery:
383 # bigquery.googleapis.com/{dataset}
384 "experiments": [ # The list of experiments to enable.
385 "A String",
386 ],
387 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
388 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
389 # options are passed through the service and are used to recreate the
390 # SDK pipeline options on the worker in a language agnostic and platform
391 # independent way.
392 "a_key": "", # Properties of the object.
393 },
394 "userAgent": { # A description of the process that generated the request.
395 "a_key": "", # Properties of the object.
396 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700397 "workerZone": "A String", # The Compute Engine zone
398 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
399 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
400 # with worker_region. If neither worker_region nor worker_zone is specified,
401 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700402 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
403 # specified in order for the job to have workers.
404 { # Describes one particular pool of Cloud Dataflow workers to be
405 # instantiated by the Cloud Dataflow service in order to perform the
406 # computations required by a job. Note that a workflow job may use
407 # multiple pools, in order to match the various computational
408 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700409 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
410 # harness, residing in Google Container Registry.
411 #
412 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
413 "ipConfiguration": "A String", # Configuration for VM IPs.
414 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
415 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
416 "algorithm": "A String", # The algorithm to use for autoscaling.
417 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700418 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -0700419 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
420 # the service will use the network "default".
421 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
422 # will attempt to choose a reasonable default.
423 "metadata": { # Metadata to set on the Google Compute Engine VMs.
424 "a_key": "A String",
425 },
426 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
427 # service will attempt to choose a reasonable default.
428 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
429 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700430 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
431 # using the standard Dataflow task runner. Users should ignore
432 # this field.
433 "workflowFileName": "A String", # The file to store the workflow in.
434 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
435 # will not be uploaded.
436 #
437 # The supported resource type is:
438 #
439 # Google Cloud Storage:
440 # storage.googleapis.com/{bucket}/{object}
441 # bucket.storage.googleapis.com/{object}
442 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -0700443 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
444 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
445 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
446 "vmId": "A String", # The ID string of the VM.
447 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
448 # taskrunner; e.g. "wheel".
449 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
450 # taskrunner; e.g. "root".
451 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
452 # access the Cloud Dataflow API.
453 "A String",
454 ],
455 "languageHint": "A String", # The suggested backend language.
456 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
457 # console.
458 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
459 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700460 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
461 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
462 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
463 # "shuffle/v1beta1".
464 "workerId": "A String", # The ID of the worker running this pipeline.
465 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
466 #
467 # When workers access Google Cloud APIs, they logically do so via
468 # relative URLs. If this field is specified, it supplies the base
469 # URL to use for resolving these relative URLs. The normative
470 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
471 # Locators".
472 #
473 # If not specified, the default value is "http://www.googleapis.com/"
474 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
475 # "dataflow/v1b3/projects".
476 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
477 # storage.
478 #
479 # The supported resource type is:
480 #
481 # Google Cloud Storage:
482 #
483 # storage.googleapis.com/{bucket}/{object}
484 # bucket.storage.googleapis.com/{object}
485 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700486 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
487 "harnessCommand": "A String", # The command to launch the worker harness.
488 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
489 # temporary storage.
490 #
491 # The supported resource type is:
492 #
493 # Google Cloud Storage:
494 # storage.googleapis.com/{bucket}/{object}
495 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -0700496 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
497 #
498 # When workers access Google Cloud APIs, they logically do so via
499 # relative URLs. If this field is specified, it supplies the base
500 # URL to use for resolving these relative URLs. The normative
501 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
502 # Locators".
503 #
504 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700505 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700506 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
507 # service will choose a number of threads (according to the number of cores
508 # on the selected machine type for batch, or 1 by convention for streaming).
509 "poolArgs": { # Extra arguments for this worker pool.
510 "a_key": "", # Properties of the object. Contains field @type with type URL.
511 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700512 "packages": [ # Packages to be installed on workers.
513 { # The packages that must be installed in order for a worker to run the
514 # steps of the Cloud Dataflow job that will be assigned to its worker
515 # pool.
516 #
517 # This is the mechanism by which the Cloud Dataflow SDK causes code to
518 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
519 # might use this to install jars containing the user's code and all of the
520 # various dependencies (libraries, data files, etc.) required in order
521 # for that code to run.
522 "location": "A String", # The resource to read the package from. The supported resource type is:
523 #
524 # Google Cloud Storage:
525 #
526 # storage.googleapis.com/{bucket}
527 # bucket.storage.googleapis.com/
528 "name": "A String", # The name of the package.
529 },
530 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700531 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
532 # select a default set of packages which are useful to worker
533 # harnesses written in a particular language.
534 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
535 # are supported.
536 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700537 # attempt to choose a reasonable default.
538 "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
539 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
540 # `TEARDOWN_NEVER`.
541 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
542 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
543 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
544 # down.
545 #
546 # If the workers are not torn down by the service, they will
547 # continue to run and use Google Compute Engine VM resources in the
548 # user's project until they are explicitly terminated by the user.
549 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
550 # policy except for small, manually supervised test jobs.
551 #
552 # If unknown or unspecified, the service will attempt to choose a reasonable
553 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -0700554 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
555 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700556 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
557 # execute the job. If zero or unspecified, the service will
558 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700559 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
560 # the form "regions/REGION/subnetworks/SUBNETWORK".
561 "dataDisks": [ # Data disks that are used by a VM in this workflow.
562 { # Describes the data disk used by a workflow job.
563 "mountPoint": "A String", # Directory in a VM where disk is mounted.
564 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
565 # attempt to choose a reasonable default.
566 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
567 # must be a disk type appropriate to the project and zone in which
568 # the workers will run. If unknown or unspecified, the service
569 # will attempt to choose a reasonable default.
570 #
571 # For example, the standard persistent disk type is a resource name
572 # typically ending in "pd-standard". If SSD persistent disks are
573 # available, the resource name typically ends with "pd-ssd". The
574 # actual valid values are defined the Google Compute Engine API,
575 # not by the Cloud Dataflow API; consult the Google Compute Engine
576 # documentation for more information about determining the set of
577 # available disk types for a particular project and zone.
578 #
579 # Google Compute Engine Disk types are local to a particular
580 # project in a particular zone, and so the resource name will
581 # typically look something like this:
582 #
583 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
584 },
585 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700586 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
587 # only be set in the Fn API path. For non-cross-language pipelines this
588 # should have only one entry. Cross-language pipelines will have two or more
589 # entries.
590 { # Defines a SDK harness container for executing Dataflow pipelines.
591 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
592 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
593 # container instance with this image. If false (or unset) recommends using
594 # more than one core per SDK container instance with this image for
595 # efficiency. Note that Dataflow service may choose to override this property
596 # if needed.
597 },
598 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700599 },
600 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700601 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
602 # unspecified, the service will attempt to choose a reasonable
603 # default. This should be in the form of the API service name,
604 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700605 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
606 # storage. The system will append the suffix "/temp-{JOBNAME} to
607 # this resource prefix, where {JOBNAME} is the value of the
608 # job_name field. The resulting bucket and object prefix is used
609 # as the prefix of the resources used to store temporary data
610 # needed during the job execution. NOTE: This will override the
611 # value in taskrunner_settings.
612 # The supported resource type is:
613 #
614 # Google Cloud Storage:
615 #
616 # storage.googleapis.com/{bucket}/{object}
617 # bucket.storage.googleapis.com/{object}
618 },
619 "location": "A String", # The [regional endpoint]
620 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
621 # contains this job.
622 "tempFiles": [ # A set of files the system should be aware of that are used
623 # for temporary storage. These temporary files will be
624 # removed on job completion.
625 # No duplicates are allowed.
626 # No file patterns are supported.
627 #
628 # The supported files are:
629 #
630 # Google Cloud Storage:
631 #
632 # storage.googleapis.com/{bucket}/{object}
633 # bucket.storage.googleapis.com/{object}
634 "A String",
635 ],
636 "type": "A String", # The type of Cloud Dataflow job.
637 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
638 # If this field is set, the service will ensure its uniqueness.
639 # The request to create a job will fail if the service has knowledge of a
640 # previously submitted job with the same client's ID and job name.
641 # The caller may use this field to ensure idempotence of job
642 # creation across retried attempts to create a job.
643 # By default, the field is empty and, in that case, the service ignores it.
644 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
645 # snapshot.
646 "stepsLocation": "A String", # The GCS location where the steps are stored.
647 "currentStateTime": "A String", # The timestamp associated with the current state.
648 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
649 # Flexible resource scheduling jobs are started with some delay after job
650 # creation, so start_time is unset before start and is updated when the
651 # job is started by the Cloud Dataflow service. For other jobs, start_time
652 # always equals to create_time and is immutable and set by the Cloud Dataflow
653 # service.
654 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
655 # Cloud Dataflow service.
656 "requestedState": "A String", # The job's requested state.
657 #
658 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
659 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
660 # also be used to directly set a job's requested state to
661 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
662 # job if it has not already reached a terminal state.
663 "name": "A String", # The user-specified Cloud Dataflow job name.
664 #
665 # Only one Job with a given name may exist in a project at any
666 # given time. If a caller attempts to create a Job with the same
667 # name as an already-existing Job, the attempt returns the
668 # existing Job.
669 #
670 # The name must match the regular expression
671 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
672 "steps": [ # Exactly one of step or steps_location should be specified.
673 #
674 # The top-level steps that constitute the entire job.
675 { # Defines a particular step within a Cloud Dataflow job.
676 #
677 # A job consists of multiple steps, each of which performs some
678 # specific operation as part of the overall job. Data is typically
679 # passed from one step to another as part of the job.
680 #
681 # Here's an example of a sequence of steps which together implement a
682 # Map-Reduce job:
683 #
684 # * Read a collection of data from some source, parsing the
685 # collection's elements.
686 #
687 # * Validate the elements.
688 #
689 # * Apply a user-defined function to map each element to some value
690 # and extract an element-specific key value.
691 #
692 # * Group elements with the same key into a single element with
693 # that key, transforming a multiply-keyed collection into a
694 # uniquely-keyed collection.
695 #
696 # * Write the elements out to some data sink.
697 #
698 # Note that the Cloud Dataflow service may be used to run many different
699 # types of jobs, not just Map-Reduce.
700 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700701 "name": "A String", # The name that identifies the step. This must be unique for each
702 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700703 "properties": { # Named properties associated with the step. Each kind of
704 # predefined step has its own required set of properties.
705 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
706 "a_key": "", # Properties of the object.
707 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700708 },
709 ],
710 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
711 # of the job it replaced.
712 #
713 # When sending a `CreateJobRequest`, you can update a job by specifying it
714 # here. The job named here is stopped, and its intermediate state is
715 # transferred to this job.
716 "currentState": "A String", # The current state of the job.
717 #
718 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
719 # specified.
720 #
721 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
722 # terminal state. After a job has reached a terminal state, no
723 # further state updates may be made.
724 #
725 # This field may be mutated by the Cloud Dataflow service;
726 # callers cannot mutate it.
727 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
728 # isn't contained in the submitted job.
729 "stages": { # A mapping from each stage to the information about that stage.
730 "a_key": { # Contains information about how a particular
731 # google.dataflow.v1beta3.Step will be executed.
732 "stepName": [ # The steps associated with the execution stage.
733 # Note that stages may have several steps, and that a given step
734 # might be run by more than one stage.
735 "A String",
736 ],
737 },
738 },
739 },
740 },
741 ],
742 }</pre>
743</div>
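<p>Example (not part of the generated reference): a minimal sketch of calling <code>aggregated()</code> with the google-api-python-client library. It assumes Application Default Credentials are configured; the project ID is a placeholder.</p>
<pre>
from googleapiclient.discovery import build

# Build a Dataflow API client for version v1b3.
# With no explicit credentials, Application Default Credentials are used.
dataflow = build('dataflow', 'v1b3')

# 'my-project' is a placeholder project ID.
response = dataflow.projects().jobs().aggregated(
    projectId='my-project',
    view='JOB_VIEW_SUMMARY',   # summary view is the documented default
    pageSize=50,
).execute()

# The response dict follows the ListJobsResponse schema shown above.
for job in response.get('jobs', []):
    print(job['id'], job.get('name'), job.get('currentState'))
</pre>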

<div class="method">
    <code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>
  <pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.
    </pre>
</div>
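<p>Example (not part of the generated reference): a sketch of the usual paging idiom, combining <code>aggregated()</code> with <code>aggregated_next()</code> until no further pages remain. The project ID is a placeholder.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')
jobs = dataflow.projects().jobs()

# Page through all jobs across regions; aggregated_next() returns None
# once there are no more results to fetch.
request = jobs.aggregated(projectId='my-project', pageSize=100)
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job['id'], job.get('currentState'))
    request = jobs.aggregated_next(previous_request=request,
                                   previous_response=response)
</pre>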

<div class="method">
    <code class="details" id="create">create(projectId, body=None, location=None, x__xgafv=None, replaceJobId=None, view=None)</code>
  <pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  body: object, The request body.
    The object takes the form of:

Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400774{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700775 "labels": { # User-defined labels for this job.
776 #
777 # The labels map can contain no more than 64 entries. Entries of the labels
778 # map are UTF8 strings that comply with the following restrictions:
779 #
780 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
781 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -0700782 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700783 # size.
784 "a_key": "A String",
785 },
786 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
787 # by the metadata values provided here. Populated for ListJobs and all GetJob
788 # views SUMMARY and higher.
789 # ListJob response and Job SUMMARY view.
790 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
791 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
792 "version": "A String", # The version of the SDK used to run the job.
793 "sdkSupportStatus": "A String", # The support status for this SDK version.
794 },
795 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
796 { # Metadata for a PubSub connector used by the job.
797 "topic": "A String", # Topic accessed in the connection.
798 "subscription": "A String", # Subscription used in the connection.
799 },
800 ],
801 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
802 { # Metadata for a Datastore connector used by the job.
803 "projectId": "A String", # ProjectId accessed in the connection.
804 "namespace": "A String", # Namespace used in the connection.
805 },
806 ],
807 "fileDetails": [ # Identification of a File source used in the Dataflow job.
808 { # Metadata for a File connector used by the job.
809 "filePattern": "A String", # File Pattern used to access files by the connector.
810 },
811 ],
812 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
813 { # Metadata for a Spanner connector used by the job.
814 "instanceId": "A String", # InstanceId accessed in the connection.
815 "projectId": "A String", # ProjectId accessed in the connection.
816 "databaseId": "A String", # DatabaseId accessed in the connection.
817 },
818 ],
819 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
820 { # Metadata for a BigTable connector used by the job.
821 "instanceId": "A String", # InstanceId accessed in the connection.
822 "projectId": "A String", # ProjectId accessed in the connection.
823 "tableId": "A String", # TableId accessed in the connection.
824 },
825 ],
826 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
827 { # Metadata for a BigQuery connector used by the job.
828 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700829 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -0700830 "table": "A String", # Table accessed in the connection.
831 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700832 },
833 ],
834 },
835 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
836 # A description of the user pipeline and stages through which it is executed.
837 # Created by Cloud Dataflow service. Only retrieved with
838 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
839 # form. This data is provided by the Dataflow service for ease of visualizing
840 # the pipeline and interpreting Dataflow provided metrics.
841 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
842 { # Description of the type, names/ids, and input/outputs for a transform.
843 "kind": "A String", # Type of transform.
844 "name": "A String", # User provided name for this transform instance.
845 "inputCollectionName": [ # User names for all collection inputs to this transform.
846 "A String",
847 ],
848 "displayData": [ # Transform-specific display data.
849 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700850 "key": "A String", # The key identifying the display data.
851 # This is intended to be used as a label for the display data
852 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700853 "shortStrValue": "A String", # A possible additional shorter value to display.
854 # For example a java_class_name_value of com.mypackage.MyDoFn
855 # will be stored with MyDoFn as the short_str_value and
856 # com.mypackage.MyDoFn as the java_class_name value.
857 # short_str_value can be displayed and java_class_name_value
858 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700859 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700860 "url": "A String", # An optional full URL.
861 "floatValue": 3.14, # Contains value if the data is of float type.
862 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
863 # language namespace (i.e. python module) which defines the display data.
864 # This allows a dax monitoring system to specially handle the data
865 # and perform custom rendering.
866 "javaClassValue": "A String", # Contains value if the data is of java class type.
867 "label": "A String", # An optional label to display in a dax UI for the element.
868 "boolValue": True or False, # Contains value if the data is of a boolean type.
869 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700870 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700871 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700872 },
873 ],
874 "outputCollectionName": [ # User names for all collection outputs to this transform.
875 "A String",
876 ],
877 "id": "A String", # SDK generated id of this transform instance.
878 },
879 ],
880 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
881 { # Description of the composing transforms, names/ids, and input/outputs of a
882 # stage of execution. Some composing transforms and sources may have been
883 # generated by the Dataflow service during execution planning.
884 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
885 { # Description of an interstitial value between transforms in an execution
886 # stage.
887 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
888 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
889 # source is most closely associated.
890 "name": "A String", # Dataflow service generated name for this source.
891 },
892 ],
893 "kind": "A String", # Type of transform this stage is executing.
894 "name": "A String", # Dataflow service generated name for this stage.
895 "outputSource": [ # Output sources for this stage.
896 { # Description of an input or output of an execution stage.
897 "userName": "A String", # Human-readable name for this source; may be user or system generated.
898 "sizeBytes": "A String", # Size of the source, if measurable.
899 "name": "A String", # Dataflow service generated name for this source.
900 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
901 # source is most closely associated.
902 },
903 ],
904 "inputSource": [ # Input sources for this stage.
905 { # Description of an input or output of an execution stage.
906 "userName": "A String", # Human-readable name for this source; may be user or system generated.
907 "sizeBytes": "A String", # Size of the source, if measurable.
908 "name": "A String", # Dataflow service generated name for this source.
909 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
910 # source is most closely associated.
911 },
912 ],
913 "componentTransform": [ # Transforms that comprise this execution stage.
914 { # Description of a transform executed as part of an execution stage.
915 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
916 "originalTransform": "A String", # User name for the original user transform with which this transform is
917 # most closely associated.
918 "name": "A String", # Dataflow service generated name for this source.
919 },
920 ],
921 "id": "A String", # Dataflow service generated id for this stage.
922 },
923 ],
924 "displayData": [ # Pipeline level display data.
925 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700926 "key": "A String", # The key identifying the display data.
927 # This is intended to be used as a label for the display data
928 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700929 "shortStrValue": "A String", # A possible additional shorter value to display.
930 # For example a java_class_name_value of com.mypackage.MyDoFn
931 # will be stored with MyDoFn as the short_str_value and
932 # com.mypackage.MyDoFn as the java_class_name value.
933 # short_str_value can be displayed and java_class_name_value
934 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700935 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700936 "url": "A String", # An optional full URL.
937 "floatValue": 3.14, # Contains value if the data is of float type.
938 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
939 # language namespace (i.e. python module) which defines the display data.
940 # This allows a dax monitoring system to specially handle the data
941 # and perform custom rendering.
942 "javaClassValue": "A String", # Contains value if the data is of java class type.
943 "label": "A String", # An optional label to display in a dax UI for the element.
944 "boolValue": True or False, # Contains value if the data is of a boolean type.
945 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700946 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700947 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700948 },
949 ],
950 },
951 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
952 # callers cannot mutate it.
953 { # A message describing the state of a particular execution stage.
954 "executionStageName": "A String", # The name of the execution stage.
955 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
956 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
957 },
958 ],
959 "id": "A String", # The unique ID of this job.
960 #
961 # This field is set by the Cloud Dataflow service when the Job is
962 # created, and is immutable for the life of the job.
963 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
964 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
965 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
966 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
967 # corresponding name prefixes of the new job.
968 "a_key": "A String",
969 },
970 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700971 "workerRegion": "A String", # The Compute Engine region
972 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
973 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
974 # with worker_zone. If neither worker_region nor worker_zone is specified,
975 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700976 "version": { # A structure describing which components and their versions of the service
977 # are required in order to run the job.
978 "a_key": "", # Properties of the object.
979 },
980 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
981 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
982 # at rest, AKA a Customer Managed Encryption Key (CMEK).
983 #
984 # Format:
985 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
986 "internalExperiments": { # Experimental settings.
987 "a_key": "", # Properties of the object. Contains field @type with type URL.
988 },
989 "dataset": "A String", # The dataset for the current project where various workflow
990 # related tables are stored.
991 #
992 # The supported resource type is:
993 #
994 # Google BigQuery:
995 # bigquery.googleapis.com/{dataset}
996 "experiments": [ # The list of experiments to enable.
997 "A String",
998 ],
999 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
1000 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1001 # options are passed through the service and are used to recreate the
1002 # SDK pipeline options on the worker in a language agnostic and platform
1003 # independent way.
1004 "a_key": "", # Properties of the object.
1005 },
1006 "userAgent": { # A description of the process that generated the request.
1007 "a_key": "", # Properties of the object.
1008 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001009 "workerZone": "A String", # The Compute Engine zone
1010 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1011 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
1012 # with worker_region. If neither worker_region nor worker_zone is specified,
1013 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001014 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1015 # specified in order for the job to have workers.
1016 { # Describes one particular pool of Cloud Dataflow workers to be
1017 # instantiated by the Cloud Dataflow service in order to perform the
1018 # computations required by a job. Note that a workflow job may use
1019 # multiple pools, in order to match the various computational
1020 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001021 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1022 # harness, residing in Google Container Registry.
1023 #
1024 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1025 "ipConfiguration": "A String", # Configuration for VM IPs.
1026 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1027 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1028 "algorithm": "A String", # The algorithm to use for autoscaling.
1029 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001030 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07001031 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1032 # the service will use the network "default".
1033 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1034 # will attempt to choose a reasonable default.
1035 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1036 "a_key": "A String",
1037 },
1038 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1039 # service will attempt to choose a reasonable default.
1040 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1041 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001042 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1043 # using the standard Dataflow task runner. Users should ignore
1044 # this field.
1045 "workflowFileName": "A String", # The file to store the workflow in.
1046 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1047 # will not be uploaded.
1048 #
1049 # The supported resource type is:
1050 #
1051 # Google Cloud Storage:
1052 # storage.googleapis.com/{bucket}/{object}
1053 # bucket.storage.googleapis.com/{object}
1054 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07001055 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1056 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1057 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1058 "vmId": "A String", # The ID string of the VM.
1059 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1060 # taskrunner; e.g. "wheel".
1061 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1062 # taskrunner; e.g. "root".
1063 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1064 # access the Cloud Dataflow API.
1065 "A String",
1066 ],
1067 "languageHint": "A String", # The suggested backend language.
1068 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1069 # console.
1070 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1071 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001072 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1073 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1074 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1075 # "shuffle/v1beta1".
1076 "workerId": "A String", # The ID of the worker running this pipeline.
1077 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1078 #
1079 # When workers access Google Cloud APIs, they logically do so via
1080 # relative URLs. If this field is specified, it supplies the base
1081 # URL to use for resolving these relative URLs. The normative
1082 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1083 # Locators".
1084 #
1085 # If not specified, the default value is "http://www.googleapis.com/"
1086 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1087 # "dataflow/v1b3/projects".
1088 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1089 # storage.
1090 #
1091 # The supported resource type is:
1092 #
1093 # Google Cloud Storage:
1094 #
1095 # storage.googleapis.com/{bucket}/{object}
1096 # bucket.storage.googleapis.com/{object}
1097 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001098       "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
1099 "harnessCommand": "A String", # The command to launch the worker harness.
1100 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1101 # temporary storage.
1102 #
1103 # The supported resource type is:
1104 #
1105 # Google Cloud Storage:
1106 # storage.googleapis.com/{bucket}/{object}
1107 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07001108 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1109 #
1110 # When workers access Google Cloud APIs, they logically do so via
1111 # relative URLs. If this field is specified, it supplies the base
1112 # URL to use for resolving these relative URLs. The normative
1113 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1114 # Locators".
1115 #
1116 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001117 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001118 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1119 # service will choose a number of threads (according to the number of cores
1120 # on the selected machine type for batch, or 1 by convention for streaming).
1121 "poolArgs": { # Extra arguments for this worker pool.
1122 "a_key": "", # Properties of the object. Contains field @type with type URL.
1123 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001124 "packages": [ # Packages to be installed on workers.
1125 { # The packages that must be installed in order for a worker to run the
1126 # steps of the Cloud Dataflow job that will be assigned to its worker
1127 # pool.
1128 #
1129 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1130 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1131 # might use this to install jars containing the user's code and all of the
1132 # various dependencies (libraries, data files, etc.) required in order
1133 # for that code to run.
1134 "location": "A String", # The resource to read the package from. The supported resource type is:
1135 #
1136 # Google Cloud Storage:
1137 #
1138 # storage.googleapis.com/{bucket}
1139 # bucket.storage.googleapis.com/
1140 "name": "A String", # The name of the package.
1141 },
1142 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001143 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1144 # select a default set of packages which are useful to worker
1145 # harnesses written in a particular language.
1146 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1147 # are supported.
1148 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001149 # attempt to choose a reasonable default.
1150         "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1151 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1152 # `TEARDOWN_NEVER`.
1153 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1154 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1155 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1156 # down.
1157 #
1158 # If the workers are not torn down by the service, they will
1159 # continue to run and use Google Compute Engine VM resources in the
1160 # user's project until they are explicitly terminated by the user.
1161 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1162 # policy except for small, manually supervised test jobs.
1163 #
1164 # If unknown or unspecified, the service will attempt to choose a reasonable
1165 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07001166 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1167 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001168 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1169 # execute the job. If zero or unspecified, the service will
1170 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001171 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1172 # the form "regions/REGION/subnetworks/SUBNETWORK".
1173 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1174 { # Describes the data disk used by a workflow job.
1175 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1176 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1177 # attempt to choose a reasonable default.
1178 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1179 # must be a disk type appropriate to the project and zone in which
1180 # the workers will run. If unknown or unspecified, the service
1181 # will attempt to choose a reasonable default.
1182 #
1183 # For example, the standard persistent disk type is a resource name
1184 # typically ending in "pd-standard". If SSD persistent disks are
1185 # available, the resource name typically ends with "pd-ssd". The
1186             # actual valid values are defined by the Google Compute Engine API,
1187 # not by the Cloud Dataflow API; consult the Google Compute Engine
1188 # documentation for more information about determining the set of
1189 # available disk types for a particular project and zone.
1190 #
1191 # Google Compute Engine Disk types are local to a particular
1192 # project in a particular zone, and so the resource name will
1193 # typically look something like this:
1194 #
1195 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1196 },
1197 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001198 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
1199 # only be set in the Fn API path. For non-cross-language pipelines this
1200 # should have only one entry. Cross-language pipelines will have two or more
1201 # entries.
1202       { # Defines an SDK harness container for executing Dataflow pipelines.
1203         "containerImage": "A String", # A Docker container image that resides in Google Container Registry.
1204         "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
1205             # container instance with this image. If false (or unset), recommends using
1206             # more than one core per SDK container instance with this image for
1207             # efficiency. Note that the Dataflow service may choose to override this property
1208             # if needed.
1209 },
1210 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001211 },
1212 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001213 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1214 # unspecified, the service will attempt to choose a reasonable
1215 # default. This should be in the form of the API service name,
1216 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001217 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1218     # storage. The system will append the suffix "/temp-{JOBNAME}" to
1219 # this resource prefix, where {JOBNAME} is the value of the
1220 # job_name field. The resulting bucket and object prefix is used
1221 # as the prefix of the resources used to store temporary data
1222 # needed during the job execution. NOTE: This will override the
1223 # value in taskrunner_settings.
1224 # The supported resource type is:
1225 #
1226 # Google Cloud Storage:
1227 #
1228 # storage.googleapis.com/{bucket}/{object}
1229 # bucket.storage.googleapis.com/{object}
1230 },
1231 "location": "A String", # The [regional endpoint]
1232 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1233 # contains this job.
1234 "tempFiles": [ # A set of files the system should be aware of that are used
1235 # for temporary storage. These temporary files will be
1236 # removed on job completion.
1237 # No duplicates are allowed.
1238 # No file patterns are supported.
1239 #
1240 # The supported files are:
1241 #
1242 # Google Cloud Storage:
1243 #
1244 # storage.googleapis.com/{bucket}/{object}
1245 # bucket.storage.googleapis.com/{object}
1246 "A String",
1247 ],
1248 "type": "A String", # The type of Cloud Dataflow job.
1249 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1250 # If this field is set, the service will ensure its uniqueness.
1251 # The request to create a job will fail if the service has knowledge of a
1252 # previously submitted job with the same client's ID and job name.
1253 # The caller may use this field to ensure idempotence of job
1254 # creation across retried attempts to create a job.
1255 # By default, the field is empty and, in that case, the service ignores it.
1256 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1257 # snapshot.
1258 "stepsLocation": "A String", # The GCS location where the steps are stored.
1259 "currentStateTime": "A String", # The timestamp associated with the current state.
1260 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1261 # Flexible resource scheduling jobs are started with some delay after job
1262 # creation, so start_time is unset before start and is updated when the
1263 # job is started by the Cloud Dataflow service. For other jobs, start_time
1264   # always equals create_time and is immutable and set by the Cloud Dataflow
1265 # service.
1266 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1267 # Cloud Dataflow service.
1268 "requestedState": "A String", # The job's requested state.
1269 #
1270 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1271 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1272 # also be used to directly set a job's requested state to
1273 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1274 # job if it has not already reached a terminal state.
1275 "name": "A String", # The user-specified Cloud Dataflow job name.
1276 #
1277 # Only one Job with a given name may exist in a project at any
1278 # given time. If a caller attempts to create a Job with the same
1279 # name as an already-existing Job, the attempt returns the
1280 # existing Job.
1281 #
1282 # The name must match the regular expression
1283 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
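     #   For example, the name "my-wordcount-job" matches this pattern.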
1284 "steps": [ # Exactly one of step or steps_location should be specified.
1285 #
1286 # The top-level steps that constitute the entire job.
1287 { # Defines a particular step within a Cloud Dataflow job.
1288 #
1289 # A job consists of multiple steps, each of which performs some
1290 # specific operation as part of the overall job. Data is typically
1291 # passed from one step to another as part of the job.
1292 #
1293 # Here's an example of a sequence of steps which together implement a
1294 # Map-Reduce job:
1295 #
1296 # * Read a collection of data from some source, parsing the
1297 # collection's elements.
1298 #
1299 # * Validate the elements.
1300 #
1301 # * Apply a user-defined function to map each element to some value
1302 # and extract an element-specific key value.
1303 #
1304 # * Group elements with the same key into a single element with
1305 # that key, transforming a multiply-keyed collection into a
1306 # uniquely-keyed collection.
1307 #
1308 # * Write the elements out to some data sink.
1309 #
1310 # Note that the Cloud Dataflow service may be used to run many different
1311 # types of jobs, not just Map-Reduce.
1312 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001313 "name": "A String", # The name that identifies the step. This must be unique for each
1314 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001315 "properties": { # Named properties associated with the step. Each kind of
1316 # predefined step has its own required set of properties.
1317 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1318 "a_key": "", # Properties of the object.
1319 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001320 },
1321 ],
1322 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1323 # of the job it replaced.
1324 #
1325 # When sending a `CreateJobRequest`, you can update a job by specifying it
1326 # here. The job named here is stopped, and its intermediate state is
1327 # transferred to this job.
1328 "currentState": "A String", # The current state of the job.
1329 #
1330 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1331 # specified.
1332 #
1333 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1334 # terminal state. After a job has reached a terminal state, no
1335 # further state updates may be made.
1336 #
1337 # This field may be mutated by the Cloud Dataflow service;
1338 # callers cannot mutate it.
1339 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1340 # isn't contained in the submitted job.
1341 "stages": { # A mapping from each stage to the information about that stage.
1342 "a_key": { # Contains information about how a particular
1343 # google.dataflow.v1beta3.Step will be executed.
1344 "stepName": [ # The steps associated with the execution stage.
1345 # Note that stages may have several steps, and that a given step
1346 # might be run by more than one stage.
1347 "A String",
1348 ],
1349 },
1350 },
1351 },
1352}
1353
1354 location: string, The [regional endpoint]
1355(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1356contains this job.
1357 x__xgafv: string, V1 error format.
1358 Allowed values
1359 1 - v1 error format
1360 2 - v2 error format
1361 replaceJobId: string, Deprecated. This field is now in the Job message.
1362 view: string, The level of information requested in response.
1363
1364Returns:
1365 An object of the form:
1366
1367 { # Defines a job to be run by the Cloud Dataflow service.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001368 "labels": { # User-defined labels for this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001369 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001370 # The labels map can contain no more than 64 entries. Entries of the labels
1371 # map are UTF8 strings that comply with the following restrictions:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001372 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001373 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1374 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07001375 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001376 # size.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07001377 "a_key": "A String",
1378 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001379 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1380 # by the metadata values provided here. Populated for ListJobs and all GetJob
1381 # views SUMMARY and higher.
1382 # ListJob response and Job SUMMARY view.
1383 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1384 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1385 "version": "A String", # The version of the SDK used to run the job.
1386 "sdkSupportStatus": "A String", # The support status for this SDK version.
1387 },
1388 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1389 { # Metadata for a PubSub connector used by the job.
1390 "topic": "A String", # Topic accessed in the connection.
1391 "subscription": "A String", # Subscription used in the connection.
1392 },
1393 ],
1394 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
1395 { # Metadata for a Datastore connector used by the job.
1396 "projectId": "A String", # ProjectId accessed in the connection.
1397 "namespace": "A String", # Namespace used in the connection.
1398 },
1399 ],
1400 "fileDetails": [ # Identification of a File source used in the Dataflow job.
1401 { # Metadata for a File connector used by the job.
1402 "filePattern": "A String", # File Pattern used to access files by the connector.
1403 },
1404 ],
1405 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
1406 { # Metadata for a Spanner connector used by the job.
1407 "instanceId": "A String", # InstanceId accessed in the connection.
1408 "projectId": "A String", # ProjectId accessed in the connection.
1409 "databaseId": "A String", # DatabaseId accessed in the connection.
1410 },
1411 ],
1412 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
1413 { # Metadata for a BigTable connector used by the job.
1414 "instanceId": "A String", # InstanceId accessed in the connection.
1415 "projectId": "A String", # ProjectId accessed in the connection.
1416 "tableId": "A String", # TableId accessed in the connection.
1417 },
1418 ],
1419 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
1420 { # Metadata for a BigQuery connector used by the job.
1421 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001422 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07001423 "table": "A String", # Table accessed in the connection.
1424 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001425 },
1426 ],
1427 },
1428 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1429 # A description of the user pipeline and stages through which it is executed.
1430 # Created by Cloud Dataflow service. Only retrieved with
1431 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1432 # form. This data is provided by the Dataflow service for ease of visualizing
1433 # the pipeline and interpreting Dataflow provided metrics.
1434 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
1435 { # Description of the type, names/ids, and input/outputs for a transform.
1436 "kind": "A String", # Type of transform.
1437 "name": "A String", # User provided name for this transform instance.
1438 "inputCollectionName": [ # User names for all collection inputs to this transform.
1439 "A String",
1440 ],
1441 "displayData": [ # Transform-specific display data.
1442 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07001443 "key": "A String", # The key identifying the display data.
1444 # This is intended to be used as a label for the display data
1445 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001446 "shortStrValue": "A String", # A possible additional shorter value to display.
1447 # For example a java_class_name_value of com.mypackage.MyDoFn
1448 # will be stored with MyDoFn as the short_str_value and
1449 # com.mypackage.MyDoFn as the java_class_name value.
1450 # short_str_value can be displayed and java_class_name_value
1451 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07001452 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001453 "url": "A String", # An optional full URL.
1454 "floatValue": 3.14, # Contains value if the data is of float type.
1455 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1456 # language namespace (i.e. python module) which defines the display data.
1457 # This allows a dax monitoring system to specially handle the data
1458 # and perform custom rendering.
1459 "javaClassValue": "A String", # Contains value if the data is of java class type.
1460 "label": "A String", # An optional label to display in a dax UI for the element.
1461 "boolValue": True or False, # Contains value if the data is of a boolean type.
1462 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07001463 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001464 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001465 },
1466 ],
1467 "outputCollectionName": [ # User names for all collection outputs to this transform.
1468 "A String",
1469 ],
1470 "id": "A String", # SDK generated id of this transform instance.
1471 },
1472 ],
1473 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
1474 { # Description of the composing transforms, names/ids, and input/outputs of a
1475 # stage of execution. Some composing transforms and sources may have been
1476 # generated by the Dataflow service during execution planning.
1477 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
1478 { # Description of an interstitial value between transforms in an execution
1479 # stage.
1480 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1481 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1482 # source is most closely associated.
1483 "name": "A String", # Dataflow service generated name for this source.
1484 },
1485 ],
1486       "kind": "A String", # Type of transform this stage is executing.
1487 "name": "A String", # Dataflow service generated name for this stage.
1488 "outputSource": [ # Output sources for this stage.
1489 { # Description of an input or output of an execution stage.
1490 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1491 "sizeBytes": "A String", # Size of the source, if measurable.
1492 "name": "A String", # Dataflow service generated name for this source.
1493 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1494 # source is most closely associated.
1495 },
1496 ],
1497 "inputSource": [ # Input sources for this stage.
1498 { # Description of an input or output of an execution stage.
1499 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1500 "sizeBytes": "A String", # Size of the source, if measurable.
1501 "name": "A String", # Dataflow service generated name for this source.
1502 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1503 # source is most closely associated.
1504 },
1505 ],
1506 "componentTransform": [ # Transforms that comprise this execution stage.
1507 { # Description of a transform executed as part of an execution stage.
1508 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1509 "originalTransform": "A String", # User name for the original user transform with which this transform is
1510 # most closely associated.
1511 "name": "A String", # Dataflow service generated name for this source.
1512 },
1513 ],
1514 "id": "A String", # Dataflow service generated id for this stage.
1515 },
1516 ],
1517 "displayData": [ # Pipeline level display data.
1518 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07001519 "key": "A String", # The key identifying the display data.
1520 # This is intended to be used as a label for the display data
1521 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001522 "shortStrValue": "A String", # A possible additional shorter value to display.
1523 # For example a java_class_name_value of com.mypackage.MyDoFn
1524 # will be stored with MyDoFn as the short_str_value and
1525 # com.mypackage.MyDoFn as the java_class_name value.
1526 # short_str_value can be displayed and java_class_name_value
1527 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07001528 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001529 "url": "A String", # An optional full URL.
1530 "floatValue": 3.14, # Contains value if the data is of float type.
1531 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1532 # language namespace (i.e. python module) which defines the display data.
1533 # This allows a dax monitoring system to specially handle the data
1534 # and perform custom rendering.
1535 "javaClassValue": "A String", # Contains value if the data is of java class type.
1536 "label": "A String", # An optional label to display in a dax UI for the element.
1537 "boolValue": True or False, # Contains value if the data is of a boolean type.
1538 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07001539 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001540 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001541 },
1542 ],
1543 },
1544 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
1545 # callers cannot mutate it.
1546 { # A message describing the state of a particular execution stage.
1547 "executionStageName": "A String", # The name of the execution stage.
1548       "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
1549 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
1550 },
1551 ],
1552 "id": "A String", # The unique ID of this job.
1553 #
1554 # This field is set by the Cloud Dataflow service when the Job is
1555 # created, and is immutable for the life of the job.
1556 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
1557 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1558 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001559 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
1560 # corresponding name prefixes of the new job.
Takashi Matsuo06694102015-09-11 13:55:40 -07001561 "a_key": "A String",
Nathaniel Manista4f877e52015-06-15 16:44:50 +00001562 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001563 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001564 "workerRegion": "A String", # The Compute Engine region
1565 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1566 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
1567 # with worker_zone. If neither worker_region nor worker_zone is specified,
1568 # default to the control plane's region.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001569 "version": { # A structure describing which components and their versions of the service
1570 # are required in order to run the job.
Takashi Matsuo06694102015-09-11 13:55:40 -07001571 "a_key": "", # Properties of the object.
1572 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001573 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
1574 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
1575 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001576 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001577 # Format:
1578 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
Takashi Matsuo06694102015-09-11 13:55:40 -07001579 "internalExperiments": { # Experimental settings.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07001580 "a_key": "", # Properties of the object. Contains field @type with type URL.
Takashi Matsuo06694102015-09-11 13:55:40 -07001581 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001582 "dataset": "A String", # The dataset for the current project where various workflow
1583 # related tables are stored.
1584 #
1585 # The supported resource type is:
1586 #
1587 # Google BigQuery:
1588 # bigquery.googleapis.com/{dataset}
Takashi Matsuo06694102015-09-11 13:55:40 -07001589 "experiments": [ # The list of experiments to enable.
1590 "A String",
1591 ],
Sai Cheemalapatiea3a5e12016-10-12 14:05:53 -07001592 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001593 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1594 # options are passed through the service and are used to recreate the
1595 # SDK pipeline options on the worker in a language agnostic and platform
1596 # independent way.
Takashi Matsuo06694102015-09-11 13:55:40 -07001597 "a_key": "", # Properties of the object.
1598 },
1599 "userAgent": { # A description of the process that generated the request.
1600 "a_key": "", # Properties of the object.
1601 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001602 "workerZone": "A String", # The Compute Engine zone
1603 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1604 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
1605 # with worker_region. If neither worker_region nor worker_zone is specified,
1606 # a zone in the control plane's region is chosen based on available capacity.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001607 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1608 # specified in order for the job to have workers.
1609 { # Describes one particular pool of Cloud Dataflow workers to be
1610 # instantiated by the Cloud Dataflow service in order to perform the
1611 # computations required by a job. Note that a workflow job may use
1612 # multiple pools, in order to match the various computational
1613 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001614 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1615 # harness, residing in Google Container Registry.
1616 #
1617 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1618 "ipConfiguration": "A String", # Configuration for VM IPs.
1619 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1620 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1621 "algorithm": "A String", # The algorithm to use for autoscaling.
1622 },
Takashi Matsuo06694102015-09-11 13:55:40 -07001623 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07001624 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1625 # the service will use the network "default".
1626 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1627 # will attempt to choose a reasonable default.
1628 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1629 "a_key": "A String",
1630 },
1631 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1632 # service will attempt to choose a reasonable default.
1633 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1634 # Compute Engine API.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001635 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1636 # using the standard Dataflow task runner. Users should ignore
1637 # this field.
1638 "workflowFileName": "A String", # The file to store the workflow in.
1639 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1640 # will not be uploaded.
1641 #
1642 # The supported resource type is:
1643 #
1644 # Google Cloud Storage:
1645 # storage.googleapis.com/{bucket}/{object}
1646 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001647 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07001648 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1649 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1650 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1651 "vmId": "A String", # The ID string of the VM.
1652 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1653 # taskrunner; e.g. "wheel".
1654 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1655 # taskrunner; e.g. "root".
1656 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1657 # access the Cloud Dataflow API.
1658 "A String",
1659 ],
1660 "languageHint": "A String", # The suggested backend language.
1661 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1662 # console.
1663 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1664 "logDir": "A String", # The directory on the VM to store logs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001665 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1666 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1667 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1668 # "shuffle/v1beta1".
1669 "workerId": "A String", # The ID of the worker running this pipeline.
1670 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1671 #
1672 # When workers access Google Cloud APIs, they logically do so via
1673 # relative URLs. If this field is specified, it supplies the base
1674 # URL to use for resolving these relative URLs. The normative
1675 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1676 # Locators".
1677 #
1678 # If not specified, the default value is "http://www.googleapis.com/"
1679 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1680 # "dataflow/v1b3/projects".
1681 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1682 # storage.
1683 #
1684 # The supported resource type is:
1685 #
1686 # Google Cloud Storage:
1687 #
1688 # storage.googleapis.com/{bucket}/{object}
1689 # bucket.storage.googleapis.com/{object}
1690 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001691       "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001692 "harnessCommand": "A String", # The command to launch the worker harness.
1693 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1694 # temporary storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001695 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001696 # The supported resource type is:
1697 #
1698 # Google Cloud Storage:
1699 # storage.googleapis.com/{bucket}/{object}
1700 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07001701 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1702 #
1703 # When workers access Google Cloud APIs, they logically do so via
1704 # relative URLs. If this field is specified, it supplies the base
1705 # URL to use for resolving these relative URLs. The normative
1706 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1707 # Locators".
1708 #
1709 # If not specified, the default value is "http://www.googleapis.com/"
Takashi Matsuo06694102015-09-11 13:55:40 -07001710 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001711 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1712 # service will choose a number of threads (according to the number of cores
1713 # on the selected machine type for batch, or 1 by convention for streaming).
1714 "poolArgs": { # Extra arguments for this worker pool.
1715 "a_key": "", # Properties of the object. Contains field @type with type URL.
1716 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001717 "packages": [ # Packages to be installed on workers.
1718 { # The packages that must be installed in order for a worker to run the
1719 # steps of the Cloud Dataflow job that will be assigned to its worker
1720 # pool.
1721 #
1722 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1723 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1724 # might use this to install jars containing the user's code and all of the
1725 # various dependencies (libraries, data files, etc.) required in order
1726 # for that code to run.
1727 "location": "A String", # The resource to read the package from. The supported resource type is:
1728 #
1729 # Google Cloud Storage:
1730 #
1731 # storage.googleapis.com/{bucket}
1732 # bucket.storage.googleapis.com/
1733 "name": "A String", # The name of the package.
1734 },
1735 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001736 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1737 # select a default set of packages which are useful to worker
1738 # harnesses written in a particular language.
1739 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1740 # are supported.
1741 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001742 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001743       "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1744 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1745 # `TEARDOWN_NEVER`.
1746 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1747 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1748 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1749 # down.
1750 #
1751 # If the workers are not torn down by the service, they will
1752 # continue to run and use Google Compute Engine VM resources in the
1753 # user's project until they are explicitly terminated by the user.
1754 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1755 # policy except for small, manually supervised test jobs.
1756 #
1757 # If unknown or unspecified, the service will attempt to choose a reasonable
1758 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07001759 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1760 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001761 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1762 # execute the job. If zero or unspecified, the service will
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001763 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001764 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1765 # the form "regions/REGION/subnetworks/SUBNETWORK".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001766 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1767 { # Describes the data disk used by a workflow job.
1768 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1769 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1770 # attempt to choose a reasonable default.
1771 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1772 # must be a disk type appropriate to the project and zone in which
1773 # the workers will run. If unknown or unspecified, the service
1774 # will attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001775 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001776 # For example, the standard persistent disk type is a resource name
1777 # typically ending in "pd-standard". If SSD persistent disks are
1778 # available, the resource name typically ends with "pd-ssd". The
1779             # actual valid values are defined by the Google Compute Engine API,
1780 # not by the Cloud Dataflow API; consult the Google Compute Engine
1781 # documentation for more information about determining the set of
1782 # available disk types for a particular project and zone.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001783 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001784 # Google Compute Engine Disk types are local to a particular
1785 # project in a particular zone, and so the resource name will
1786 # typically look something like this:
1787 #
1788 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001789 },
1790 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001791 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
1792 # only be set in the Fn API path. For non-cross-language pipelines this
1793 # should have only one entry. Cross-language pipelines will have two or more
1794 # entries.
1795       { # Defines an SDK harness container for executing Dataflow pipelines.
1796         "containerImage": "A String", # A Docker container image that resides in Google Container Registry.
1797         "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
1798             # container instance with this image. If false (or unset), recommends using
1799             # more than one core per SDK container instance with this image for
1800             # efficiency. Note that the Dataflow service may choose to override this property
1801             # if needed.
1802 },
1803 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07001804 },
1805 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001806 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1807 # unspecified, the service will attempt to choose a reasonable
1808 # default. This should be in the form of the API service name,
1809 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001810 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1811   # storage. The system will append the suffix "/temp-{JOBNAME}" to
1812 # this resource prefix, where {JOBNAME} is the value of the
1813 # job_name field. The resulting bucket and object prefix is used
1814 # as the prefix of the resources used to store temporary data
1815 # needed during the job execution. NOTE: This will override the
1816 # value in taskrunner_settings.
1817 # The supported resource type is:
1818 #
1819 # Google Cloud Storage:
1820 #
1821 # storage.googleapis.com/{bucket}/{object}
1822 # bucket.storage.googleapis.com/{object}
1823 },
1824 "location": "A String", # The [regional endpoint]
1825 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1826 # contains this job.
1827 "tempFiles": [ # A set of files the system should be aware of that are used
1828 # for temporary storage. These temporary files will be
1829 # removed on job completion.
1830 # No duplicates are allowed.
1831 # No file patterns are supported.
1832 #
1833 # The supported files are:
1834 #
1835 # Google Cloud Storage:
1836 #
1837 # storage.googleapis.com/{bucket}/{object}
1838 # bucket.storage.googleapis.com/{object}
1839 "A String",
1840 ],
1841 "type": "A String", # The type of Cloud Dataflow job.
1842 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1843 # If this field is set, the service will ensure its uniqueness.
1844 # The request to create a job will fail if the service has knowledge of a
1845 # previously submitted job with the same client's ID and job name.
1846 # The caller may use this field to ensure idempotence of job
1847 # creation across retried attempts to create a job.
1848 # By default, the field is empty and, in that case, the service ignores it.
1849 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1850 # snapshot.
1851 "stepsLocation": "A String", # The GCS location where the steps are stored.
1852 "currentStateTime": "A String", # The timestamp associated with the current state.
1853 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1854 # Flexible resource scheduling jobs are started with some delay after job
1855 # creation, so start_time is unset before start and is updated when the
1856 # job is started by the Cloud Dataflow service. For other jobs, start_time
1857   # always equals create_time and is immutable and set by the Cloud Dataflow
1858 # service.
1859 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1860 # Cloud Dataflow service.
1861 "requestedState": "A String", # The job's requested state.
1862 #
1863 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1864 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1865 # also be used to directly set a job's requested state to
1866 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1867 # job if it has not already reached a terminal state.
1868 "name": "A String", # The user-specified Cloud Dataflow job name.
1869 #
1870 # Only one Job with a given name may exist in a project at any
1871 # given time. If a caller attempts to create a Job with the same
1872 # name as an already-existing Job, the attempt returns the
1873 # existing Job.
1874 #
1875 # The name must match the regular expression
1876 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
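     #   For example, the name "my-wordcount-job" matches this pattern.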
1877 "steps": [ # Exactly one of step or steps_location should be specified.
1878 #
1879 # The top-level steps that constitute the entire job.
1880 { # Defines a particular step within a Cloud Dataflow job.
1881 #
1882 # A job consists of multiple steps, each of which performs some
1883 # specific operation as part of the overall job. Data is typically
1884 # passed from one step to another as part of the job.
1885 #
1886 # Here's an example of a sequence of steps which together implement a
1887 # Map-Reduce job:
1888 #
1889 # * Read a collection of data from some source, parsing the
1890 # collection's elements.
1891 #
1892 # * Validate the elements.
1893 #
1894 # * Apply a user-defined function to map each element to some value
1895 # and extract an element-specific key value.
1896 #
1897 # * Group elements with the same key into a single element with
1898 # that key, transforming a multiply-keyed collection into a
1899 # uniquely-keyed collection.
1900 #
1901 # * Write the elements out to some data sink.
1902 #
1903 # Note that the Cloud Dataflow service may be used to run many different
1904 # types of jobs, not just Map-Reduce.
1905 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001906 "name": "A String", # The name that identifies the step. This must be unique for each
1907 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001908 "properties": { # Named properties associated with the step. Each kind of
1909 # predefined step has its own required set of properties.
1910 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1911 "a_key": "", # Properties of the object.
1912 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001913 },
1914 ],
1915 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1916 # of the job it replaced.
1917 #
1918 # When sending a `CreateJobRequest`, you can update a job by specifying it
1919 # here. The job named here is stopped, and its intermediate state is
1920 # transferred to this job.
1921 "currentState": "A String", # The current state of the job.
1922 #
1923 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1924 # specified.
1925 #
1926 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1927 # terminal state. After a job has reached a terminal state, no
1928 # further state updates may be made.
1929 #
1930 # This field may be mutated by the Cloud Dataflow service;
1931 # callers cannot mutate it.
1932 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1933 # isn't contained in the submitted job.
1934 "stages": { # A mapping from each stage to the information about that stage.
1935 "a_key": { # Contains information about how a particular
1936 # google.dataflow.v1beta3.Step will be executed.
1937 "stepName": [ # The steps associated with the execution stage.
1938 # Note that stages may have several steps, and that a given step
1939 # might be run by more than one stage.
1940 "A String",
1941 ],
1942 },
1943 },
1944 },
1945 }</pre>
1946</div>
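<p>A minimal usage sketch of the create call documented above, using the
google-api-python-client discovery interface. The project ID, bucket, job name, and
region below are placeholder values, and Application Default Credentials are assumed
to be available in the environment.</p>
<pre>
from googleapiclient.discovery import build

# Build a Dataflow v1b3 client; authentication falls back to
# Application Default Credentials.
dataflow = build("dataflow", "v1b3")

# A small batch Job body; "name" must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
# and tempStoragePrefix uses the storage.googleapis.com/{bucket}/{object} form.
job_body = {
    "name": "my-wordcount-job",
    "type": "JOB_TYPE_BATCH",
    "environment": {
        "tempStoragePrefix": "storage.googleapis.com/my-bucket/temp",
    },
}

request = dataflow.projects().jobs().create(
    projectId="my-project",
    location="us-central1",   # regional endpoint that will contain the job
    body=job_body,
)
response = request.execute()
print(response["id"], response.get("currentState"))
</pre>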
1947
1948<div class="method">
1949 <code class="details" id="get">get(projectId, jobId, location=None, x__xgafv=None, view=None)</code>
1950 <pre>Gets the state of the specified Cloud Dataflow job.
1951
1952To get the state of a job, we recommend using `projects.locations.jobs.get`
1953with a [regional endpoint]
1954(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1955`projects.jobs.get` is not recommended, as you can only get the state of
1956jobs that are running in `us-central1`.
1957
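A minimal usage sketch (the project ID, job ID, and region below are placeholders,
and Application Default Credentials are assumed):

  from googleapiclient.discovery import build

  dataflow = build("dataflow", "v1b3")
  job = dataflow.projects().jobs().get(
      projectId="my-project",
      jobId="2020-05-01_12_34_56-1234567890123456789",
      location="us-central1",
      view="JOB_VIEW_SUMMARY",
  ).execute()
  print(job["currentState"])
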
1958Args:
1959 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1960 jobId: string, The job ID. (required)
1961 location: string, The [regional endpoint]
1962(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1963contains this job.
1964 x__xgafv: string, V1 error format.
1965 Allowed values
1966 1 - v1 error format
1967 2 - v2 error format
1968 view: string, The level of information requested in response.
1969
1970Returns:
1971 An object of the form:
1972
1973 { # Defines a job to be run by the Cloud Dataflow service.
1974 "labels": { # User-defined labels for this job.
1975 #
1976 # The labels map can contain no more than 64 entries. Entries of the labels
1977 # map are UTF8 strings that comply with the following restrictions:
1978 #
1979 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1980 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07001981 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001982 # size.
1983 "a_key": "A String",
1984 },
1985 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1986 # by the metadata values provided here. Populated for ListJobs and all GetJob
1987 # views SUMMARY and higher.
1988 # ListJob response and Job SUMMARY view.
1989 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1990 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1991 "version": "A String", # The version of the SDK used to run the job.
1992 "sdkSupportStatus": "A String", # The support status for this SDK version.
1993 },
1994 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1995 { # Metadata for a PubSub connector used by the job.
1996 "topic": "A String", # Topic accessed in the connection.
1997 "subscription": "A String", # Subscription used in the connection.
1998 },
1999 ],
2000 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2001 { # Metadata for a Datastore connector used by the job.
2002 "projectId": "A String", # ProjectId accessed in the connection.
2003 "namespace": "A String", # Namespace used in the connection.
2004 },
2005 ],
2006 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2007 { # Metadata for a File connector used by the job.
2008 "filePattern": "A String", # File Pattern used to access files by the connector.
2009 },
2010 ],
2011 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2012 { # Metadata for a Spanner connector used by the job.
2013 "instanceId": "A String", # InstanceId accessed in the connection.
2014 "projectId": "A String", # ProjectId accessed in the connection.
2015 "databaseId": "A String", # DatabaseId accessed in the connection.
2016 },
2017 ],
2018 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2019 { # Metadata for a BigTable connector used by the job.
2020 "instanceId": "A String", # InstanceId accessed in the connection.
2021 "projectId": "A String", # ProjectId accessed in the connection.
2022 "tableId": "A String", # TableId accessed in the connection.
2023 },
2024 ],
2025 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2026 { # Metadata for a BigQuery connector used by the job.
2027 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002028 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07002029 "table": "A String", # Table accessed in the connection.
2030 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002031 },
2032 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002033 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002034 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2035 # A description of the user pipeline and stages through which it is executed.
2036 # Created by Cloud Dataflow service. Only retrieved with
2037 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2038 # form. This data is provided by the Dataflow service for ease of visualizing
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002039 # the pipeline and interpreting Dataflow provided metrics.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002040 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2041 { # Description of the type, names/ids, and input/outputs for a transform.
2042 "kind": "A String", # Type of transform.
2043 "name": "A String", # User provided name for this transform instance.
2044 "inputCollectionName": [ # User names for all collection inputs to this transform.
2045 "A String",
2046 ],
2047 "displayData": [ # Transform-specific display data.
2048 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002049 "key": "A String", # The key identifying the display data.
2050 # This is intended to be used as a label for the display data
2051 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002052 "shortStrValue": "A String", # A possible additional shorter value to display.
2053 # For example a java_class_name_value of com.mypackage.MyDoFn
2054 # will be stored with MyDoFn as the short_str_value and
2055 # com.mypackage.MyDoFn as the java_class_name value.
2056 # short_str_value can be displayed and java_class_name_value
2057 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002058 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002059 "url": "A String", # An optional full URL.
2060 "floatValue": 3.14, # Contains value if the data is of float type.
2061 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2062 # language namespace (i.e. python module) which defines the display data.
2063 # This allows a dax monitoring system to specially handle the data
2064 # and perform custom rendering.
2065 "javaClassValue": "A String", # Contains value if the data is of java class type.
2066 "label": "A String", # An optional label to display in a dax UI for the element.
2067 "boolValue": True or False, # Contains value if the data is of a boolean type.
2068 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002069 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002070 "int64Value": "A String", # Contains value if the data is of int64 type.
2071 },
2072 ],
2073 "outputCollectionName": [ # User names for all collection outputs to this transform.
2074 "A String",
2075 ],
2076 "id": "A String", # SDK generated id of this transform instance.
2077 },
2078 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002079 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2080 { # Description of the composing transforms, names/ids, and input/outputs of a
2081 # stage of execution. Some composing transforms and sources may have been
2082 # generated by the Dataflow service during execution planning.
2083 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2084 { # Description of an interstitial value between transforms in an execution
2085 # stage.
2086 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2087 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2088 # source is most closely associated.
2089 "name": "A String", # Dataflow service generated name for this source.
2090 },
2091 ],
2092 "kind": "A String", # Type of tranform this stage is executing.
2093 "name": "A String", # Dataflow service generated name for this stage.
2094 "outputSource": [ # Output sources for this stage.
2095 { # Description of an input or output of an execution stage.
2096 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2097 "sizeBytes": "A String", # Size of the source, if measurable.
2098 "name": "A String", # Dataflow service generated name for this source.
2099 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2100 # source is most closely associated.
2101 },
2102 ],
2103 "inputSource": [ # Input sources for this stage.
2104 { # Description of an input or output of an execution stage.
2105 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2106 "sizeBytes": "A String", # Size of the source, if measurable.
2107 "name": "A String", # Dataflow service generated name for this source.
2108 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2109 # source is most closely associated.
2110 },
2111 ],
2112 "componentTransform": [ # Transforms that comprise this execution stage.
2113 { # Description of a transform executed as part of an execution stage.
2114 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2115 "originalTransform": "A String", # User name for the original user transform with which this transform is
2116 # most closely associated.
2117 "name": "A String", # Dataflow service generated name for this source.
2118 },
2119 ],
2120 "id": "A String", # Dataflow service generated id for this stage.
2121 },
2122 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002123 "displayData": [ # Pipeline level display data.
2124 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002125 "key": "A String", # The key identifying the display data.
2126 # This is intended to be used as a label for the display data
2127 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002128 "shortStrValue": "A String", # A possible additional shorter value to display.
2129 # For example a java_class_name_value of com.mypackage.MyDoFn
2130 # will be stored with MyDoFn as the short_str_value and
2131 # com.mypackage.MyDoFn as the java_class_name value.
2132 # short_str_value can be displayed and java_class_name_value
2133 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002134 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002135 "url": "A String", # An optional full URL.
2136 "floatValue": 3.14, # Contains value if the data is of float type.
2137 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2138 # language namespace (i.e. python module) which defines the display data.
2139 # This allows a dax monitoring system to specially handle the data
2140 # and perform custom rendering.
2141 "javaClassValue": "A String", # Contains value if the data is of java class type.
2142 "label": "A String", # An optional label to display in a dax UI for the element.
2143 "boolValue": True or False, # Contains value if the data is of a boolean type.
2144 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002145 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002146 "int64Value": "A String", # Contains value if the data is of int64 type.
2147 },
2148 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002149 },
2150 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2151 # callers cannot mutate it.
2152 { # A message describing the state of a particular execution stage.
2153 "executionStageName": "A String", # The name of the execution stage.
2154 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
2155 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
2156 },
2157 ],
2158 "id": "A String", # The unique ID of this job.
2159 #
2160 # This field is set by the Cloud Dataflow service when the Job is
2161 # created, and is immutable for the life of the job.
2162 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2163 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2164 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2165 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2166 # corresponding name prefixes of the new job.
2167 "a_key": "A String",
2168 },
2169 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002170 "workerRegion": "A String", # The Compute Engine region
2171 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2172 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
2173 # with worker_zone. If neither worker_region nor worker_zone is specified,
2174 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002175 "version": { # A structure describing which components and their versions of the service
2176 # are required in order to run the job.
2177 "a_key": "", # Properties of the object.
2178 },
2179 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2180 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2181 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2182 #
2183 # Format:
2184 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2185 "internalExperiments": { # Experimental settings.
2186 "a_key": "", # Properties of the object. Contains field @type with type URL.
2187 },
2188 "dataset": "A String", # The dataset for the current project where various workflow
2189 # related tables are stored.
2190 #
2191 # The supported resource type is:
2192 #
2193 # Google BigQuery:
2194 # bigquery.googleapis.com/{dataset}
2195 "experiments": [ # The list of experiments to enable.
2196 "A String",
2197 ],
2198 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2199 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2200 # options are passed through the service and are used to recreate the
2201 # SDK pipeline options on the worker in a language agnostic and platform
2202 # independent way.
2203 "a_key": "", # Properties of the object.
2204 },
2205 "userAgent": { # A description of the process that generated the request.
2206 "a_key": "", # Properties of the object.
2207 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002208 "workerZone": "A String", # The Compute Engine zone
2209 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2210 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
2211 # with worker_region. If neither worker_region nor worker_zone is specified,
2212 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002213 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2214 # specified in order for the job to have workers.
2215 { # Describes one particular pool of Cloud Dataflow workers to be
2216 # instantiated by the Cloud Dataflow service in order to perform the
2217 # computations required by a job. Note that a workflow job may use
2218 # multiple pools, in order to match the various computational
2219 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002220 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2221 # harness, residing in Google Container Registry.
2222 #
2223 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2224 "ipConfiguration": "A String", # Configuration for VM IPs.
2225 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2226 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2227 "algorithm": "A String", # The algorithm to use for autoscaling.
2228 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002229 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07002230 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2231 # the service will use the network "default".
2232 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2233 # will attempt to choose a reasonable default.
2234 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2235 "a_key": "A String",
2236 },
2237 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2238 # service will attempt to choose a reasonable default.
2239 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2240 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002241 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2242 # using the standard Dataflow task runner. Users should ignore
2243 # this field.
2244 "workflowFileName": "A String", # The file to store the workflow in.
2245 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2246 # will not be uploaded.
2247 #
2248 # The supported resource type is:
2249 #
2250 # Google Cloud Storage:
2251 # storage.googleapis.com/{bucket}/{object}
2252 # bucket.storage.googleapis.com/{object}
2253 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07002254 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2255 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2256 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2257 "vmId": "A String", # The ID string of the VM.
2258 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2259 # taskrunner; e.g. "wheel".
2260 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2261 # taskrunner; e.g. "root".
2262 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2263 # access the Cloud Dataflow API.
2264 "A String",
2265 ],
2266 "languageHint": "A String", # The suggested backend language.
2267 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2268 # console.
2269 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2270 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002271 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2272 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2273 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2274 # "shuffle/v1beta1".
2275 "workerId": "A String", # The ID of the worker running this pipeline.
2276 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
2277 #
2278 # When workers access Google Cloud APIs, they logically do so via
2279 # relative URLs. If this field is specified, it supplies the base
2280 # URL to use for resolving these relative URLs. The normative
2281 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2282 # Locators".
2283 #
2284 # If not specified, the default value is "http://www.googleapis.com/"
2285 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2286 # "dataflow/v1b3/projects".
2287 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2288 # storage.
2289 #
2290 # The supported resource type is:
2291 #
2292 # Google Cloud Storage:
2293 #
2294 # storage.googleapis.com/{bucket}/{object}
2295 # bucket.storage.googleapis.com/{object}
2296 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002297 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2298 "harnessCommand": "A String", # The command to launch the worker harness.
2299 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2300 # temporary storage.
2301 #
2302 # The supported resource type is:
2303 #
2304 # Google Cloud Storage:
2305 # storage.googleapis.com/{bucket}/{object}
2306 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07002307 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2308 #
2309 # When workers access Google Cloud APIs, they logically do so via
2310 # relative URLs. If this field is specified, it supplies the base
2311 # URL to use for resolving these relative URLs. The normative
2312 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2313 # Locators".
2314 #
2315 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002316 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002317 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2318 # service will choose a number of threads (according to the number of cores
2319 # on the selected machine type for batch, or 1 by convention for streaming).
2320 "poolArgs": { # Extra arguments for this worker pool.
2321 "a_key": "", # Properties of the object. Contains field @type with type URL.
2322 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002323 "packages": [ # Packages to be installed on workers.
2324 { # The packages that must be installed in order for a worker to run the
2325 # steps of the Cloud Dataflow job that will be assigned to its worker
2326 # pool.
2327 #
2328 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2329 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2330 # might use this to install jars containing the user's code and all of the
2331 # various dependencies (libraries, data files, etc.) required in order
2332 # for that code to run.
2333 "location": "A String", # The resource to read the package from. The supported resource type is:
2334 #
2335 # Google Cloud Storage:
2336 #
2337 # storage.googleapis.com/{bucket}
2338 # bucket.storage.googleapis.com/
2339 "name": "A String", # The name of the package.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002340 },
2341 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002342 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
2343 # select a default set of packages which are useful to worker
2344 # harnesses written in a particular language.
2345 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2346 # are supported.
2347 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002348 # attempt to choose a reasonable default.
2349 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
2350 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2351 # `TEARDOWN_NEVER`.
2352 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2353 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2354 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2355 # down.
2356 #
2357 # If the workers are not torn down by the service, they will
2358 # continue to run and use Google Compute Engine VM resources in the
2359 # user's project until they are explicitly terminated by the user.
2360 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2361 # policy except for small, manually supervised test jobs.
2362 #
2363 # If unknown or unspecified, the service will attempt to choose a reasonable
2364 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07002365 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2366 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002367 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2368 # execute the job. If zero or unspecified, the service will
2369 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002370 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2371 # the form "regions/REGION/subnetworks/SUBNETWORK".
2372 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2373 { # Describes the data disk used by a workflow job.
2374 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2375 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2376 # attempt to choose a reasonable default.
2377 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2378 # must be a disk type appropriate to the project and zone in which
2379 # the workers will run. If unknown or unspecified, the service
2380 # will attempt to choose a reasonable default.
2381 #
2382 # For example, the standard persistent disk type is a resource name
2383 # typically ending in "pd-standard". If SSD persistent disks are
2384 # available, the resource name typically ends with "pd-ssd". The
2385            # actual valid values are defined by the Google Compute Engine API,
2386 # not by the Cloud Dataflow API; consult the Google Compute Engine
2387 # documentation for more information about determining the set of
2388 # available disk types for a particular project and zone.
2389 #
2390 # Google Compute Engine Disk types are local to a particular
2391 # project in a particular zone, and so the resource name will
2392 # typically look something like this:
2393 #
2394 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002395 },
2396 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002397 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
2398 # only be set in the Fn API path. For non-cross-language pipelines this
2399 # should have only one entry. Cross-language pipelines will have two or more
2400 # entries.
2401          { # Defines an SDK harness container for executing Dataflow pipelines.
2402 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
2403 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
2404 # container instance with this image. If false (or unset) recommends using
2405 # more than one core per SDK container instance with this image for
2406 # efficiency. Note that Dataflow service may choose to override this property
2407 # if needed.
2408 },
2409 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002410 },
2411 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002412 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2413 # unspecified, the service will attempt to choose a reasonable
2414 # default. This should be in the form of the API service name,
2415 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002416 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2417        # storage. The system will append the suffix "/temp-{JOBNAME}" to
2418 # this resource prefix, where {JOBNAME} is the value of the
2419 # job_name field. The resulting bucket and object prefix is used
2420 # as the prefix of the resources used to store temporary data
2421 # needed during the job execution. NOTE: This will override the
2422 # value in taskrunner_settings.
2423 # The supported resource type is:
2424 #
2425 # Google Cloud Storage:
2426 #
2427 # storage.googleapis.com/{bucket}/{object}
2428 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002429 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002430 "location": "A String", # The [regional endpoint]
2431 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2432 # contains this job.
2433 "tempFiles": [ # A set of files the system should be aware of that are used
2434 # for temporary storage. These temporary files will be
2435 # removed on job completion.
2436 # No duplicates are allowed.
2437 # No file patterns are supported.
2438 #
2439 # The supported files are:
2440 #
2441 # Google Cloud Storage:
2442 #
2443 # storage.googleapis.com/{bucket}/{object}
2444 # bucket.storage.googleapis.com/{object}
2445 "A String",
2446 ],
2447 "type": "A String", # The type of Cloud Dataflow job.
2448 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
2449 # If this field is set, the service will ensure its uniqueness.
2450 # The request to create a job will fail if the service has knowledge of a
2451 # previously submitted job with the same client's ID and job name.
2452 # The caller may use this field to ensure idempotence of job
2453 # creation across retried attempts to create a job.
2454 # By default, the field is empty and, in that case, the service ignores it.
2455 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
2456 # snapshot.
2457 "stepsLocation": "A String", # The GCS location where the steps are stored.
2458 "currentStateTime": "A String", # The timestamp associated with the current state.
2459 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2460 # Flexible resource scheduling jobs are started with some delay after job
2461 # creation, so start_time is unset before start and is updated when the
2462 # job is started by the Cloud Dataflow service. For other jobs, start_time
2463        # always equals create_time and is immutable and set by the Cloud Dataflow
2464 # service.
2465 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
2466 # Cloud Dataflow service.
2467 "requestedState": "A String", # The job's requested state.
2468 #
2469 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2470 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2471 # also be used to directly set a job's requested state to
2472 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2473 # job if it has not already reached a terminal state.
2474 "name": "A String", # The user-specified Cloud Dataflow job name.
2475 #
2476 # Only one Job with a given name may exist in a project at any
2477 # given time. If a caller attempts to create a Job with the same
2478 # name as an already-existing Job, the attempt returns the
2479 # existing Job.
2480 #
2481 # The name must match the regular expression
2482 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
2483 "steps": [ # Exactly one of step or steps_location should be specified.
2484 #
2485 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002486 { # Defines a particular step within a Cloud Dataflow job.
2487 #
2488 # A job consists of multiple steps, each of which performs some
2489 # specific operation as part of the overall job. Data is typically
2490 # passed from one step to another as part of the job.
2491 #
2492 # Here's an example of a sequence of steps which together implement a
2493 # Map-Reduce job:
2494 #
2495 # * Read a collection of data from some source, parsing the
2496 # collection's elements.
2497 #
2498 # * Validate the elements.
2499 #
2500 # * Apply a user-defined function to map each element to some value
2501 # and extract an element-specific key value.
2502 #
2503 # * Group elements with the same key into a single element with
2504 # that key, transforming a multiply-keyed collection into a
2505 # uniquely-keyed collection.
2506 #
2507 # * Write the elements out to some data sink.
2508 #
2509 # Note that the Cloud Dataflow service may be used to run many different
2510 # types of jobs, not just Map-Reduce.
2511 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002512 "name": "A String", # The name that identifies the step. This must be unique for each
2513 # step with respect to all other steps in the Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002514 "properties": { # Named properties associated with the step. Each kind of
2515 # predefined step has its own required set of properties.
2516 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Takashi Matsuo06694102015-09-11 13:55:40 -07002517 "a_key": "", # Properties of the object.
2518 },
2519 },
2520 ],
Thomas Coffee2f245372017-03-27 10:39:26 -07002521 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
2522 # of the job it replaced.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002523 #
Thomas Coffee2f245372017-03-27 10:39:26 -07002524 # When sending a `CreateJobRequest`, you can update a job by specifying it
2525 # here. The job named here is stopped, and its intermediate state is
2526 # transferred to this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002527 "currentState": "A String", # The current state of the job.
2528 #
2529 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2530 # specified.
2531 #
2532 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2533 # terminal state. After a job has reached a terminal state, no
2534 # further state updates may be made.
2535 #
2536 # This field may be mutated by the Cloud Dataflow service;
2537 # callers cannot mutate it.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002538 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
2539 # isn't contained in the submitted job.
Takashi Matsuo06694102015-09-11 13:55:40 -07002540 "stages": { # A mapping from each stage to the information about that stage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002541 "a_key": { # Contains information about how a particular
2542 # google.dataflow.v1beta3.Step will be executed.
2543 "stepName": [ # The steps associated with the execution stage.
2544 # Note that stages may have several steps, and that a given step
2545 # might be run by more than one stage.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002546 "A String",
2547 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002548 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002549 },
2550 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002551 }</pre>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002552</div>
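<p>As a rough usage sketch, not part of the generated reference, the <code>get</code> call above might be issued with the google-api-python-client library as follows. The project and job IDs are placeholders, and Application Default Credentials are assumed.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

job = dataflow.projects().jobs().get(
    projectId='my-project-id',                     # hypothetical project ID
    jobId='2020-05-01_12_00_00-1234567890123456',  # hypothetical job ID
    view='JOB_VIEW_SUMMARY',
).execute()

# currentState is one of the JOB_STATE_* values described above.
print(job.get('name'), job.get('currentState'))
</pre>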
2553
2554<div class="method">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002555 <code class="details" id="getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</code>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002556 <pre>Request the job status.
2557
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002558To request the status of a job, we recommend using
2559`projects.locations.jobs.getMetrics` with a [regional endpoint]
2560(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2561`projects.jobs.getMetrics` is not recommended, as you can only request the
2562status of jobs that are running in `us-central1`.
2563
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002564Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002565 projectId: string, A project id. (required)
2566 jobId: string, The job to get messages for. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002567 startTime: string, Return only metric data that has changed since this time.
2568Default is to return all information about all metrics for the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002569 location: string, The [regional endpoint]
2570(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2571contains the job specified by job_id.
Takashi Matsuo06694102015-09-11 13:55:40 -07002572 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002573 Allowed values
2574 1 - v1 error format
2575 2 - v2 error format
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002576
2577Returns:
2578 An object of the form:
2579
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002580 { # JobMetrics contains a collection of metrics describing the detailed progress
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002581 # of a Dataflow job. Metrics correspond to user-defined and system-defined
2582 # metrics in the job.
2583 #
2584 # This resource captures only the most recent values of each metric;
2585 # time-series data can be queried for them (under the same metric names)
2586 # from Cloud Monitoring.
Takashi Matsuo06694102015-09-11 13:55:40 -07002587 "metrics": [ # All metrics for this job.
2588 { # Describes the state of a metric.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002589 "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
2590 # This holds the count of the aggregated values and is used in combination
2591 # with mean_sum above to obtain the actual mean aggregate value.
2592 # The only possible value type is Long.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002593 "kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
2594 # "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
2595 # The specified aggregation kind is case-insensitive.
2596 #
2597 # If omitted, this is not an aggregated value but instead
2598 # a single metric sample value.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002599 "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
2600 # possible value type is a list of Values whose type can be Long, Double,
2601 # or String, according to the metric's type. All Values in the list must
2602 # be of the same type.
2603 "name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
2604 # metric.
2605 "origin": "A String", # Origin (namespace) of metric name. May be blank for user-define metrics;
2606 # will be "dataflow" for metrics defined by the Dataflow service or SDK.
Takashi Matsuo06694102015-09-11 13:55:40 -07002607 "name": "A String", # Worker-defined metric name.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002608 "context": { # Zero or more labeled fields which identify the part of the job this
2609 # metric is associated with, such as the name of a step or collection.
2610 #
2611 # For example, built-in counters associated with steps will have
Dan O'Mearadd494642020-05-01 07:42:23 -07002612 # context['step'] = &lt;step-name&gt;. Counters associated with PCollections
2613 # in the SDK will have context['pcollection'] = &lt;pcollection-name&gt;.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002614 "a_key": "A String",
2615 },
2616 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002617 "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
2618 # This holds the sum of the aggregated values and is used in combination
2619 # with mean_count below to obtain the actual mean aggregate value.
2620 # The only possible value types are Long and Double.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002621 "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
2622 # value accumulated since the worker started working on this WorkItem.
2623 # By default this is false, indicating that this metric is reported
2624 # as a delta that is not associated with any WorkItem.
2625 "updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
2626 # reporting work progress; it will be filled in responses from the
2627 # metrics API.
2628 "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
2629 # "And", and "Or". The possible value types are Long, Double, and Boolean.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002630 "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
2631 # service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002632 "gauge": "", # A struct value describing properties of a Gauge.
2633          # Metrics of gauge type show the value of a metric across time, and are
2634 # aggregated based on the newest value.
2635 "distribution": "", # A struct value describing properties of a distribution of numeric values.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002636 },
2637 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002638 "metricTime": "A String", # Timestamp as of which metric values are current.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002639 }</pre>
2640</div>
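<p>A short illustrative sketch, not part of the generated reference, of calling <code>getMetrics</code> with the google-api-python-client library. The IDs and timestamp are placeholders and Application Default Credentials are assumed; passing <code>startTime</code> limits the response to metrics that changed since that time.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

metrics = dataflow.projects().jobs().getMetrics(
    projectId='my-project-id',                     # hypothetical project ID
    jobId='2020-05-01_12_00_00-1234567890123456',  # hypothetical job ID
    startTime='2020-05-01T12:00:00Z',              # only metrics changed since this time
).execute()

for metric in metrics.get('metrics', []):
    name = metric.get('name', {})
    print(name.get('origin'), name.get('name'), metric.get('scalar'))
</pre>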
2641
2642<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002643 <code class="details" id="list">list(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002644 <pre>List the jobs of a project.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002645
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002646To list the jobs of a project in a region, we recommend using
2647`projects.locations.jobs.list` with a [regional endpoint]
2648(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
2649list all jobs across all regions, use `projects.jobs.aggregated`. Using
2650`projects.jobs.list` is not recommended, as you can only get the list of
2651jobs that are running in `us-central1`.
2652
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002653Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002654 projectId: string, The project which owns the jobs. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002655 pageSize: integer, If there are many jobs, limit response to at most this many.
2656The actual number of jobs returned will be the lesser of max_responses
2657and an unspecified server-defined limit.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002658 pageToken: string, Set this to the 'next_page_token' field of a previous response
2659to request additional results in a long list.
Takashi Matsuo06694102015-09-11 13:55:40 -07002660 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002661 Allowed values
2662 1 - v1 error format
2663 2 - v2 error format
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002664 location: string, The [regional endpoint]
2665(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2666contains this job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002667 filter: string, The kind of filter to use.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002668 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002669
2670Returns:
2671 An object of the form:
2672
Dan O'Mearadd494642020-05-01 07:42:23 -07002673 { # Response to a request to list Cloud Dataflow jobs in a project. This might
2674 # be a partial response, depending on the page size in the ListJobsRequest.
2675 # However, if the project does not have any jobs, an instance of
2676    # ListJobsResponse is not returned and the request's response
2677 # body is empty {}.
Takashi Matsuo06694102015-09-11 13:55:40 -07002678 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002679 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
2680 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2681 # failed to respond.
2682 { # Indicates which [regional endpoint]
2683 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
2684 # to respond to a request for data.
2685 "name": "A String", # The name of the [regional endpoint]
2686 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2687 # failed to respond.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002688 },
2689 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002690 "jobs": [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002691 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002692 "labels": { # User-defined labels for this job.
2693 #
2694 # The labels map can contain no more than 64 entries. Entries of the labels
2695 # map are UTF8 strings that comply with the following restrictions:
2696 #
2697 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2698 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07002699 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002700 # size.
2701 "a_key": "A String",
2702 },
2703 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2704 # by the metadata values provided here. Populated for ListJobs and all GetJob
2705 # views SUMMARY and higher.
2706 # ListJob response and Job SUMMARY view.
2707 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
2708 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
2709 "version": "A String", # The version of the SDK used to run the job.
2710 "sdkSupportStatus": "A String", # The support status for this SDK version.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07002711 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002712 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
2713 { # Metadata for a PubSub connector used by the job.
2714 "topic": "A String", # Topic accessed in the connection.
2715 "subscription": "A String", # Subscription used in the connection.
2716 },
2717 ],
2718 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2719 { # Metadata for a Datastore connector used by the job.
2720 "projectId": "A String", # ProjectId accessed in the connection.
2721 "namespace": "A String", # Namespace used in the connection.
2722 },
2723 ],
2724 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2725 { # Metadata for a File connector used by the job.
2726 "filePattern": "A String", # File Pattern used to access files by the connector.
2727 },
2728 ],
2729 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2730 { # Metadata for a Spanner connector used by the job.
2731 "instanceId": "A String", # InstanceId accessed in the connection.
2732 "projectId": "A String", # ProjectId accessed in the connection.
2733 "databaseId": "A String", # DatabaseId accessed in the connection.
2734 },
2735 ],
2736 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2737 { # Metadata for a BigTable connector used by the job.
2738 "instanceId": "A String", # InstanceId accessed in the connection.
2739 "projectId": "A String", # ProjectId accessed in the connection.
2740 "tableId": "A String", # TableId accessed in the connection.
2741 },
2742 ],
2743 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2744 { # Metadata for a BigQuery connector used by the job.
2745 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002746 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07002747 "table": "A String", # Table accessed in the connection.
2748 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002749 },
2750 ],
2751 },
2752 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2753 # A description of the user pipeline and stages through which it is executed.
2754 # Created by Cloud Dataflow service. Only retrieved with
2755 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2756 # form. This data is provided by the Dataflow service for ease of visualizing
2757 # the pipeline and interpreting Dataflow provided metrics.
2758 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2759 { # Description of the type, names/ids, and input/outputs for a transform.
2760 "kind": "A String", # Type of transform.
2761 "name": "A String", # User provided name for this transform instance.
2762 "inputCollectionName": [ # User names for all collection inputs to this transform.
2763 "A String",
2764 ],
2765 "displayData": [ # Transform-specific display data.
2766 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002767 "key": "A String", # The key identifying the display data.
2768 # This is intended to be used as a label for the display data
2769 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002770 "shortStrValue": "A String", # A possible additional shorter value to display.
2771 # For example a java_class_name_value of com.mypackage.MyDoFn
2772 # will be stored with MyDoFn as the short_str_value and
2773 # com.mypackage.MyDoFn as the java_class_name value.
2774 # short_str_value can be displayed and java_class_name_value
2775 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002776 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002777 "url": "A String", # An optional full URL.
2778 "floatValue": 3.14, # Contains value if the data is of float type.
2779 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2780 # language namespace (i.e. python module) which defines the display data.
2781 # This allows a dax monitoring system to specially handle the data
2782 # and perform custom rendering.
2783 "javaClassValue": "A String", # Contains value if the data is of java class type.
2784 "label": "A String", # An optional label to display in a dax UI for the element.
2785 "boolValue": True or False, # Contains value if the data is of a boolean type.
2786 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002787 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002788 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002789 },
2790 ],
2791 "outputCollectionName": [ # User names for all collection outputs to this transform.
2792 "A String",
2793 ],
2794 "id": "A String", # SDK generated id of this transform instance.
2795 },
2796 ],
2797 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2798 { # Description of the composing transforms, names/ids, and input/outputs of a
2799 # stage of execution. Some composing transforms and sources may have been
2800 # generated by the Dataflow service during execution planning.
2801 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2802 { # Description of an interstitial value between transforms in an execution
2803 # stage.
2804 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2805 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2806 # source is most closely associated.
2807 "name": "A String", # Dataflow service generated name for this source.
2808 },
2809 ],
2810 "kind": "A String", # Type of tranform this stage is executing.
2811 "name": "A String", # Dataflow service generated name for this stage.
2812 "outputSource": [ # Output sources for this stage.
2813 { # Description of an input or output of an execution stage.
2814 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2815 "sizeBytes": "A String", # Size of the source, if measurable.
2816 "name": "A String", # Dataflow service generated name for this source.
2817 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2818 # source is most closely associated.
2819 },
2820 ],
2821 "inputSource": [ # Input sources for this stage.
2822 { # Description of an input or output of an execution stage.
2823 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2824 "sizeBytes": "A String", # Size of the source, if measurable.
2825 "name": "A String", # Dataflow service generated name for this source.
2826 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2827 # source is most closely associated.
2828 },
2829 ],
2830 "componentTransform": [ # Transforms that comprise this execution stage.
2831 { # Description of a transform executed as part of an execution stage.
2832 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2833 "originalTransform": "A String", # User name for the original user transform with which this transform is
2834 # most closely associated.
2835 "name": "A String", # Dataflow service generated name for this source.
2836 },
2837 ],
2838 "id": "A String", # Dataflow service generated id for this stage.
2839 },
2840 ],
2841 "displayData": [ # Pipeline level display data.
2842 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002843 "key": "A String", # The key identifying the display data.
2844 # This is intended to be used as a label for the display data
2845 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002846 "shortStrValue": "A String", # A possible additional shorter value to display.
2847 # For example a java_class_name_value of com.mypackage.MyDoFn
2848 # will be stored with MyDoFn as the short_str_value and
2849 # com.mypackage.MyDoFn as the java_class_name value.
2850 # short_str_value can be displayed and java_class_name_value
2851 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002852 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002853 "url": "A String", # An optional full URL.
2854 "floatValue": 3.14, # Contains value if the data is of float type.
2855 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2856 # language namespace (i.e. python module) which defines the display data.
2857 # This allows a dax monitoring system to specially handle the data
2858 # and perform custom rendering.
2859 "javaClassValue": "A String", # Contains value if the data is of java class type.
2860 "label": "A String", # An optional label to display in a dax UI for the element.
2861 "boolValue": True or False, # Contains value if the data is of a boolean type.
2862 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002863 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002864 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002865 },
2866 ],
2867 },
2868 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2869 # callers cannot mutate it.
2870 { # A message describing the state of a particular execution stage.
2871 "executionStageName": "A String", # The name of the execution stage.
2872       "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
2873 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002874 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002875 ],
2876 "id": "A String", # The unique ID of this job.
2877 #
2878 # This field is set by the Cloud Dataflow service when the Job is
2879 # created, and is immutable for the life of the job.
2880 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2881 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2882 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2883 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2884 # corresponding name prefixes of the new job.
2885 "a_key": "A String",
2886 },
2887 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002888 "workerRegion": "A String", # The Compute Engine region
2889 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2890 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
2891 # with worker_zone. If neither worker_region nor worker_zone is specified,
2892 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002893 "version": { # A structure describing which components and their versions of the service
2894 # are required in order to run the job.
2895 "a_key": "", # Properties of the object.
2896 },
2897 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2898 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2899 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002900 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002901 # Format:
2902 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2903 "internalExperiments": { # Experimental settings.
2904 "a_key": "", # Properties of the object. Contains field @type with type URL.
2905 },
2906 "dataset": "A String", # The dataset for the current project where various workflow
2907 # related tables are stored.
2908 #
2909 # The supported resource type is:
2910 #
2911 # Google BigQuery:
2912 # bigquery.googleapis.com/{dataset}
2913 "experiments": [ # The list of experiments to enable.
2914 "A String",
2915 ],
2916 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2917 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2918 # options are passed through the service and are used to recreate the
2919 # SDK pipeline options on the worker in a language agnostic and platform
2920 # independent way.
2921 "a_key": "", # Properties of the object.
2922 },
2923 "userAgent": { # A description of the process that generated the request.
2924 "a_key": "", # Properties of the object.
2925 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002926 "workerZone": "A String", # The Compute Engine zone
2927 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2928 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
2929 # with worker_region. If neither worker_region nor worker_zone is specified,
2930 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002931 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2932 # specified in order for the job to have workers.
2933 { # Describes one particular pool of Cloud Dataflow workers to be
2934 # instantiated by the Cloud Dataflow service in order to perform the
2935 # computations required by a job. Note that a workflow job may use
2936 # multiple pools, in order to match the various computational
2937 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002938 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2939 # harness, residing in Google Container Registry.
2940 #
2941 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2942 "ipConfiguration": "A String", # Configuration for VM IPs.
2943 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2944 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2945 "algorithm": "A String", # The algorithm to use for autoscaling.
2946 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002947 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07002948 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2949 # the service will use the network "default".
2950 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2951 # will attempt to choose a reasonable default.
2952 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2953 "a_key": "A String",
2954 },
2955 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2956 # service will attempt to choose a reasonable default.
2957 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2958 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002959 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2960 # using the standard Dataflow task runner. Users should ignore
2961 # this field.
2962 "workflowFileName": "A String", # The file to store the workflow in.
2963 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2964 # will not be uploaded.
2965 #
2966 # The supported resource type is:
2967 #
2968 # Google Cloud Storage:
2969 # storage.googleapis.com/{bucket}/{object}
2970 # bucket.storage.googleapis.com/{object}
2971 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07002972 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2973 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2974 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2975 "vmId": "A String", # The ID string of the VM.
2976 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2977 # taskrunner; e.g. "wheel".
2978 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2979 # taskrunner; e.g. "root".
2980 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2981 # access the Cloud Dataflow API.
2982 "A String",
2983 ],
2984 "languageHint": "A String", # The suggested backend language.
2985 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2986 # console.
2987 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2988 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002989 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2990 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2991 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2992 # "shuffle/v1beta1".
2993 "workerId": "A String", # The ID of the worker running this pipeline.
2994 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002995 #
2996 # When workers access Google Cloud APIs, they logically do so via
2997 # relative URLs. If this field is specified, it supplies the base
2998 # URL to use for resolving these relative URLs. The normative
2999 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3000 # Locators".
3001 #
3002 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003003 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3004 # "dataflow/v1b3/projects".
3005 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3006 # storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003007 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07003008 # The supported resource type is:
3009 #
3010 # Google Cloud Storage:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003011 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07003012 # storage.googleapis.com/{bucket}/{object}
3013 # bucket.storage.googleapis.com/{object}
Takashi Matsuo06694102015-09-11 13:55:40 -07003014 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003015         "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
3016 "harnessCommand": "A String", # The command to launch the worker harness.
3017 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3018 # temporary storage.
3019 #
3020 # The supported resource type is:
3021 #
3022 # Google Cloud Storage:
3023 # storage.googleapis.com/{bucket}/{object}
3024 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07003025 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3026 #
3027 # When workers access Google Cloud APIs, they logically do so via
3028 # relative URLs. If this field is specified, it supplies the base
3029 # URL to use for resolving these relative URLs. The normative
3030 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3031 # Locators".
3032 #
3033 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003034 },
Dan O'Mearadd494642020-05-01 07:42:23 -07003035 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3036 # service will choose a number of threads (according to the number of cores
3037 # on the selected machine type for batch, or 1 by convention for streaming).
3038 "poolArgs": { # Extra arguments for this worker pool.
3039 "a_key": "", # Properties of the object. Contains field @type with type URL.
3040 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003041 "packages": [ # Packages to be installed on workers.
3042 { # The packages that must be installed in order for a worker to run the
3043 # steps of the Cloud Dataflow job that will be assigned to its worker
3044 # pool.
3045 #
3046 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3047 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3048 # might use this to install jars containing the user's code and all of the
3049 # various dependencies (libraries, data files, etc.) required in order
3050 # for that code to run.
3051 "location": "A String", # The resource to read the package from. The supported resource type is:
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04003052 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003053 # Google Cloud Storage:
3054 #
3055 # storage.googleapis.com/{bucket}
3056 # bucket.storage.googleapis.com/
3057 "name": "A String", # The name of the package.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04003058 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003059 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003060 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3061 # select a default set of packages which are useful to worker
3062 # harnesses written in a particular language.
3063 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3064 # are supported.
3065 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003066 # attempt to choose a reasonable default.
3067         "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
3068 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3069 # `TEARDOWN_NEVER`.
3070 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3071 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3072 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3073 # down.
3074 #
3075 # If the workers are not torn down by the service, they will
3076 # continue to run and use Google Compute Engine VM resources in the
3077 # user's project until they are explicitly terminated by the user.
3078 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3079 # policy except for small, manually supervised test jobs.
3080 #
3081 # If unknown or unspecified, the service will attempt to choose a reasonable
3082 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07003083 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3084 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003085 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3086 # execute the job. If zero or unspecified, the service will
3087 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003088 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3089 # the form "regions/REGION/subnetworks/SUBNETWORK".
3090 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3091 { # Describes the data disk used by a workflow job.
3092 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3093 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3094 # attempt to choose a reasonable default.
3095 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3096 # must be a disk type appropriate to the project and zone in which
3097 # the workers will run. If unknown or unspecified, the service
3098 # will attempt to choose a reasonable default.
3099 #
3100 # For example, the standard persistent disk type is a resource name
3101 # typically ending in "pd-standard". If SSD persistent disks are
3102 # available, the resource name typically ends with "pd-ssd". The
3103             # actual valid values are defined by the Google Compute Engine API,
3104 # not by the Cloud Dataflow API; consult the Google Compute Engine
3105 # documentation for more information about determining the set of
3106 # available disk types for a particular project and zone.
3107 #
3108 # Google Compute Engine Disk types are local to a particular
3109 # project in a particular zone, and so the resource name will
3110 # typically look something like this:
3111 #
3112 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04003113 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003114 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003115 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
3116 # only be set in the Fn API path. For non-cross-language pipelines this
3117 # should have only one entry. Cross-language pipelines will have two or more
3118 # entries.
3119           { # Defines an SDK harness container for executing Dataflow pipelines.
3120 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
3121             "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
3122                 # container instance with this image. If false (or unset), recommends using
3123                 # more than one core per SDK container instance with this image for
3124                 # efficiency. Note that the Dataflow service may choose to override this
3125                 # property if needed.
3126 },
3127 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07003128 },
3129 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003130 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3131 # unspecified, the service will attempt to choose a reasonable
3132 # default. This should be in the form of the API service name,
3133 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003134 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3135         # storage. The system will append the suffix "/temp-{JOBNAME}" to
3136 # this resource prefix, where {JOBNAME} is the value of the
3137 # job_name field. The resulting bucket and object prefix is used
3138 # as the prefix of the resources used to store temporary data
3139 # needed during the job execution. NOTE: This will override the
3140 # value in taskrunner_settings.
3141 # The supported resource type is:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003142 #
3143 # Google Cloud Storage:
3144 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003145 # storage.googleapis.com/{bucket}/{object}
3146 # bucket.storage.googleapis.com/{object}
3147 },
3148 "location": "A String", # The [regional endpoint]
3149 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3150 # contains this job.
3151 "tempFiles": [ # A set of files the system should be aware of that are used
3152 # for temporary storage. These temporary files will be
3153 # removed on job completion.
3154 # No duplicates are allowed.
3155 # No file patterns are supported.
3156 #
3157 # The supported files are:
3158 #
3159 # Google Cloud Storage:
3160 #
3161 # storage.googleapis.com/{bucket}/{object}
3162 # bucket.storage.googleapis.com/{object}
3163 "A String",
3164 ],
3165 "type": "A String", # The type of Cloud Dataflow job.
3166 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3167 # If this field is set, the service will ensure its uniqueness.
3168 # The request to create a job will fail if the service has knowledge of a
3169 # previously submitted job with the same client's ID and job name.
3170 # The caller may use this field to ensure idempotence of job
3171 # creation across retried attempts to create a job.
3172 # By default, the field is empty and, in that case, the service ignores it.
3173 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3174 # snapshot.
3175 "stepsLocation": "A String", # The GCS location where the steps are stored.
3176 "currentStateTime": "A String", # The timestamp associated with the current state.
3177 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3178 # Flexible resource scheduling jobs are started with some delay after job
3179 # creation, so start_time is unset before start and is updated when the
3180 # job is started by the Cloud Dataflow service. For other jobs, start_time
3181       # always equals create_time and is immutable and set by the Cloud Dataflow
3182 # service.
3183 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3184 # Cloud Dataflow service.
3185 "requestedState": "A String", # The job's requested state.
3186 #
3187 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3188 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3189 # also be used to directly set a job's requested state to
3190 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3191 # job if it has not already reached a terminal state.
3192 "name": "A String", # The user-specified Cloud Dataflow job name.
3193 #
3194 # Only one Job with a given name may exist in a project at any
3195 # given time. If a caller attempts to create a Job with the same
3196 # name as an already-existing Job, the attempt returns the
3197 # existing Job.
3198 #
3199 # The name must match the regular expression
3200 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3201 "steps": [ # Exactly one of step or steps_location should be specified.
3202 #
3203 # The top-level steps that constitute the entire job.
3204 { # Defines a particular step within a Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003205 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003206 # A job consists of multiple steps, each of which performs some
3207 # specific operation as part of the overall job. Data is typically
3208 # passed from one step to another as part of the job.
3209 #
3210 # Here's an example of a sequence of steps which together implement a
3211 # Map-Reduce job:
3212 #
3213 # * Read a collection of data from some source, parsing the
3214 # collection's elements.
3215 #
3216 # * Validate the elements.
3217 #
3218 # * Apply a user-defined function to map each element to some value
3219 # and extract an element-specific key value.
3220 #
3221 # * Group elements with the same key into a single element with
3222 # that key, transforming a multiply-keyed collection into a
3223 # uniquely-keyed collection.
3224 #
3225 # * Write the elements out to some data sink.
3226 #
3227 # Note that the Cloud Dataflow service may be used to run many different
3228 # types of jobs, not just Map-Reduce.
3229 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07003230 "name": "A String", # The name that identifies the step. This must be unique for each
3231 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003232 "properties": { # Named properties associated with the step. Each kind of
3233 # predefined step has its own required set of properties.
3234 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3235 "a_key": "", # Properties of the object.
3236 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003237 },
3238 ],
3239 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3240 # of the job it replaced.
3241 #
3242 # When sending a `CreateJobRequest`, you can update a job by specifying it
3243 # here. The job named here is stopped, and its intermediate state is
3244 # transferred to this job.
3245 "currentState": "A String", # The current state of the job.
3246 #
3247 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3248 # specified.
3249 #
3250 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3251 # terminal state. After a job has reached a terminal state, no
3252 # further state updates may be made.
3253 #
3254 # This field may be mutated by the Cloud Dataflow service;
3255 # callers cannot mutate it.
3256     "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
3257         # executed that isn't contained in the submitted job.
3258 "stages": { # A mapping from each stage to the information about that stage.
3259 "a_key": { # Contains information about how a particular
3260 # google.dataflow.v1beta3.Step will be executed.
3261 "stepName": [ # The steps associated with the execution stage.
3262 # Note that stages may have several steps, and that a given step
3263 # might be run by more than one stage.
3264 "A String",
3265 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003266 },
3267 },
3268 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003269 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003270 ],
3271 }</pre>
3272</div>
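<p>A minimal sketch of reading a few of the Job fields described above, assuming an authorized <code>service</code> object built with <code>googleapiclient.discovery.build('dataflow', 'v1b3')</code>, a placeholder project ID, and that the response carries its jobs under a <code>jobs</code> key:</p>
<pre>
response = service.projects().jobs().list(projectId='my-project').execute()
for job in response.get('jobs', []):
    # Each entry follows the Job schema shown above.
    print(job.get('id'), job.get('name'), job.get('currentState'))
    # environment.workerPools is only populated for the fuller job views.
    for pool in job.get('environment', {}).get('workerPools', []):
        print('  pool:', pool.get('machineType'), pool.get('numWorkers'))
</pre>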
3273
3274<div class="method">
3275 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
3276 <pre>Retrieves the next page of results.
3277
3278Args:
3279 previous_request: The request for the previous page. (required)
3280 previous_response: The response from the request for the previous page. (required)
3281
3282Returns:
3283 A request object that you can call 'execute()' on to request the next
3284 page. Returns None if there are no more items in the collection.
3285 </pre>
3286</div>
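<p>Because job listings are paginated, <code>list_next</code> can be chained with the previous request and response to walk every page. A minimal sketch, assuming the same authorized <code>service</code> object and a placeholder project ID:</p>
<pre>
request = service.projects().jobs().list(projectId='my-project')
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job.get('id'), job.get('currentState'))
    # list_next returns None once there are no more pages.
    request = service.projects().jobs().list_next(
        previous_request=request, previous_response=response)
</pre>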
3287
3288<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07003289 <code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003290 <pre>Snapshot the state of a streaming job.
3291
3292Args:
3293 projectId: string, The project which owns the job to be snapshotted. (required)
3294 jobId: string, The job to be snapshotted. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07003295 body: object, The request body.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003296 The object takes the form of:
3297
3298{ # Request to create a snapshot of a job.
3299 "location": "A String", # The location that contains this job.
3300 "ttl": "A String", # TTL for the snapshot.
Dan O'Mearadd494642020-05-01 07:42:23 -07003301     "description": "A String", # User-specified description of the snapshot. May be empty.
3302 "snapshotSources": True or False, # If true, perform snapshots for sources which support this.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003303 }
3304
3305 x__xgafv: string, V1 error format.
3306 Allowed values
3307 1 - v1 error format
3308 2 - v2 error format
3309
3310Returns:
3311 An object of the form:
3312
3313 { # Represents a snapshot of a job.
3314 "sourceJobId": "A String", # The job this snapshot was created from.
Dan O'Mearadd494642020-05-01 07:42:23 -07003315 "diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
3316 # state.
3317     "description": "A String", # User-specified description of the snapshot. May be empty.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003318 "projectId": "A String", # The project this snapshot belongs to.
3319 "creationTime": "A String", # The time this snapshot was created.
3320 "state": "A String", # State of the snapshot.
3321 "ttl": "A String", # The time after which this snapshot will be automatically deleted.
Dan O'Mearadd494642020-05-01 07:42:23 -07003322 "pubsubMetadata": [ # PubSub snapshot metadata.
3323 { # Represents a Pubsub snapshot.
3324 "expireTime": "A String", # The expire time of the Pubsub snapshot.
3325 "snapshotName": "A String", # The name of the Pubsub snapshot.
3326 "topicName": "A String", # The name of the Pubsub topic.
3327 },
3328 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003329 "id": "A String", # The unique ID of this snapshot.
3330 }</pre>
3331</div>
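<p>A minimal sketch of requesting a snapshot of a streaming job, assuming the same authorized <code>service</code> object; the project ID, job ID, region, and TTL below are placeholders:</p>
<pre>
body = {
    'location': 'us-central1',          # The location that contains the job.
    'ttl': '604800s',                   # TTL for the snapshot (seven days).
    'description': 'example snapshot',  # May be empty.
    'snapshotSources': True,            # Snapshot sources that support it.
}
snapshot = service.projects().jobs().snapshot(
    projectId='my-project', jobId='my-job-id', body=body).execute()
print(snapshot.get('id'), snapshot.get('state'), snapshot.get('creationTime'))
</pre>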
3332
3333<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07003334 <code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003335 <pre>Updates the state of an existing Cloud Dataflow job.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003336
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003337To update the state of an existing job, we recommend using
3338`projects.locations.jobs.update` with a [regional endpoint]
3339(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
3340`projects.jobs.update` is not recommended, as you can only update the state
3341of jobs that are running in `us-central1`.
3342
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003343Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003344 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
3345 jobId: string, The job ID. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07003346 body: object, The request body.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003347 The object takes the form of:
3348
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003349{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003350 "labels": { # User-defined labels for this job.
3351 #
3352 # The labels map can contain no more than 64 entries. Entries of the labels
3353 # map are UTF8 strings that comply with the following restrictions:
3354 #
3355 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3356 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07003357 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003358 # size.
3359 "a_key": "A String",
3360 },
3361   "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
3362       # ListJob response and Job SUMMARY view. This field is populated by the
3363       # Dataflow service to support filtering jobs by the metadata values provided
3364       # here. Populated for ListJobs and all GetJob views SUMMARY and higher.
3365 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3366 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3367 "version": "A String", # The version of the SDK used to run the job.
3368 "sdkSupportStatus": "A String", # The support status for this SDK version.
3369 },
3370 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3371 { # Metadata for a PubSub connector used by the job.
3372 "topic": "A String", # Topic accessed in the connection.
3373 "subscription": "A String", # Subscription used in the connection.
3374 },
3375 ],
3376 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3377 { # Metadata for a Datastore connector used by the job.
3378 "projectId": "A String", # ProjectId accessed in the connection.
3379 "namespace": "A String", # Namespace used in the connection.
3380 },
3381 ],
3382 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3383 { # Metadata for a File connector used by the job.
3384 "filePattern": "A String", # File Pattern used to access files by the connector.
3385 },
3386 ],
3387 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3388 { # Metadata for a Spanner connector used by the job.
3389 "instanceId": "A String", # InstanceId accessed in the connection.
3390 "projectId": "A String", # ProjectId accessed in the connection.
3391 "databaseId": "A String", # DatabaseId accessed in the connection.
3392 },
3393 ],
3394 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3395 { # Metadata for a BigTable connector used by the job.
3396 "instanceId": "A String", # InstanceId accessed in the connection.
3397 "projectId": "A String", # ProjectId accessed in the connection.
3398 "tableId": "A String", # TableId accessed in the connection.
3399 },
3400 ],
3401 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3402 { # Metadata for a BigQuery connector used by the job.
3403 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003404 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07003405 "table": "A String", # Table accessed in the connection.
3406 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003407 },
3408 ],
3409 },
3410   "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
3411       # form. This data is provided by the Dataflow service for ease of visualizing
3412       # the pipeline and interpreting Dataflow provided metrics. Preliminary field:
3413       # The format of this data may change at any time. A description of the user
3414       # pipeline and stages through which it is executed. Created by Cloud Dataflow
3415       # service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3416 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
3417 { # Description of the type, names/ids, and input/outputs for a transform.
3418 "kind": "A String", # Type of transform.
3419 "name": "A String", # User provided name for this transform instance.
3420 "inputCollectionName": [ # User names for all collection inputs to this transform.
3421 "A String",
3422 ],
3423 "displayData": [ # Transform-specific display data.
3424 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07003425 "key": "A String", # The key identifying the display data.
3426 # This is intended to be used as a label for the display data
3427 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003428 "shortStrValue": "A String", # A possible additional shorter value to display.
3429 # For example a java_class_name_value of com.mypackage.MyDoFn
3430 # will be stored with MyDoFn as the short_str_value and
3431 # com.mypackage.MyDoFn as the java_class_name value.
3432 # short_str_value can be displayed and java_class_name_value
3433 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07003434 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003435 "url": "A String", # An optional full URL.
3436 "floatValue": 3.14, # Contains value if the data is of float type.
3437 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3438 # language namespace (i.e. python module) which defines the display data.
3439 # This allows a dax monitoring system to specially handle the data
3440 # and perform custom rendering.
3441 "javaClassValue": "A String", # Contains value if the data is of java class type.
3442 "label": "A String", # An optional label to display in a dax UI for the element.
3443 "boolValue": True or False, # Contains value if the data is of a boolean type.
3444 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07003445 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003446 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003447 },
3448 ],
3449 "outputCollectionName": [ # User names for all collection outputs to this transform.
3450 "A String",
3451 ],
3452 "id": "A String", # SDK generated id of this transform instance.
3453 },
3454 ],
3455 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
3456 { # Description of the composing transforms, names/ids, and input/outputs of a
3457 # stage of execution. Some composing transforms and sources may have been
3458 # generated by the Dataflow service during execution planning.
3459 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
3460 { # Description of an interstitial value between transforms in an execution
3461 # stage.
3462 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3463 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3464 # source is most closely associated.
3465 "name": "A String", # Dataflow service generated name for this source.
3466 },
3467 ],
3468         "kind": "A String", # Type of transform this stage is executing.
3469 "name": "A String", # Dataflow service generated name for this stage.
3470 "outputSource": [ # Output sources for this stage.
3471 { # Description of an input or output of an execution stage.
3472 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3473 "sizeBytes": "A String", # Size of the source, if measurable.
3474 "name": "A String", # Dataflow service generated name for this source.
3475 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3476 # source is most closely associated.
3477 },
3478 ],
3479 "inputSource": [ # Input sources for this stage.
3480 { # Description of an input or output of an execution stage.
3481 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3482 "sizeBytes": "A String", # Size of the source, if measurable.
3483 "name": "A String", # Dataflow service generated name for this source.
3484 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3485 # source is most closely associated.
3486 },
3487 ],
3488 "componentTransform": [ # Transforms that comprise this execution stage.
3489 { # Description of a transform executed as part of an execution stage.
3490 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3491 "originalTransform": "A String", # User name for the original user transform with which this transform is
3492 # most closely associated.
3493             "name": "A String", # Dataflow service generated name for this transform.
3494 },
3495 ],
3496 "id": "A String", # Dataflow service generated id for this stage.
3497 },
3498 ],
3499 "displayData": [ # Pipeline level display data.
3500 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07003501 "key": "A String", # The key identifying the display data.
3502 # This is intended to be used as a label for the display data
3503 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003504 "shortStrValue": "A String", # A possible additional shorter value to display.
3505 # For example a java_class_name_value of com.mypackage.MyDoFn
3506 # will be stored with MyDoFn as the short_str_value and
3507 # com.mypackage.MyDoFn as the java_class_name value.
3508 # short_str_value can be displayed and java_class_name_value
3509 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07003510 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003511 "url": "A String", # An optional full URL.
3512 "floatValue": 3.14, # Contains value if the data is of float type.
3513 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3514 # language namespace (i.e. python module) which defines the display data.
3515 # This allows a dax monitoring system to specially handle the data
3516 # and perform custom rendering.
3517 "javaClassValue": "A String", # Contains value if the data is of java class type.
3518 "label": "A String", # An optional label to display in a dax UI for the element.
3519 "boolValue": True or False, # Contains value if the data is of a boolean type.
3520 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07003521 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003522 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003523 },
3524 ],
3525 },
3526 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
3527 # callers cannot mutate it.
3528 { # A message describing the state of a particular execution stage.
3529 "executionStageName": "A String", # The name of the execution stage.
3530       "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
3531 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
3532 },
3533 ],
3534 "id": "A String", # The unique ID of this job.
3535 #
3536 # This field is set by the Cloud Dataflow service when the Job is
3537 # created, and is immutable for the life of the job.
3538 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
3539 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3540 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
3541 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
3542 # corresponding name prefixes of the new job.
3543 "a_key": "A String",
3544 },
3545 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07003546 "workerRegion": "A String", # The Compute Engine region
3547 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3548 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
3549 # with worker_zone. If neither worker_region nor worker_zone is specified,
3550 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003551 "version": { # A structure describing which components and their versions of the service
3552 # are required in order to run the job.
3553 "a_key": "", # Properties of the object.
3554 },
3555 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
3556 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
3557 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3558 #
3559 # Format:
3560 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3561 "internalExperiments": { # Experimental settings.
3562 "a_key": "", # Properties of the object. Contains field @type with type URL.
3563 },
3564 "dataset": "A String", # The dataset for the current project where various workflow
3565 # related tables are stored.
3566 #
3567 # The supported resource type is:
3568 #
3569 # Google BigQuery:
3570 # bigquery.googleapis.com/{dataset}
3571 "experiments": [ # The list of experiments to enable.
3572 "A String",
3573 ],
3574 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
3575 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
3576 # options are passed through the service and are used to recreate the
3577 # SDK pipeline options on the worker in a language agnostic and platform
3578 # independent way.
3579 "a_key": "", # Properties of the object.
3580 },
3581 "userAgent": { # A description of the process that generated the request.
3582 "a_key": "", # Properties of the object.
3583 },
Dan O'Mearadd494642020-05-01 07:42:23 -07003584 "workerZone": "A String", # The Compute Engine zone
3585 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3586 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
3587 # with worker_region. If neither worker_region nor worker_zone is specified,
3588 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003589 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
3590 # specified in order for the job to have workers.
3591 { # Describes one particular pool of Cloud Dataflow workers to be
3592 # instantiated by the Cloud Dataflow service in order to perform the
3593 # computations required by a job. Note that a workflow job may use
3594 # multiple pools, in order to match the various computational
3595 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07003596 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
3597 # harness, residing in Google Container Registry.
3598 #
3599 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
3600 "ipConfiguration": "A String", # Configuration for VM IPs.
3601 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3602 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
3603 "algorithm": "A String", # The algorithm to use for autoscaling.
3604 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003605 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07003606 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
3607 # the service will use the network "default".
3608 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
3609 # will attempt to choose a reasonable default.
3610 "metadata": { # Metadata to set on the Google Compute Engine VMs.
3611 "a_key": "A String",
3612 },
3613 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
3614 # service will attempt to choose a reasonable default.
3615 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
3616 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003617 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
3618 # using the standard Dataflow task runner. Users should ignore
3619 # this field.
3620 "workflowFileName": "A String", # The file to store the workflow in.
3621 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
3622 # will not be uploaded.
3623 #
3624 # The supported resource type is:
3625 #
3626 # Google Cloud Storage:
3627 # storage.googleapis.com/{bucket}/{object}
3628 # bucket.storage.googleapis.com/{object}
3629 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07003630 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
3631 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
3632 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
3633 "vmId": "A String", # The ID string of the VM.
3634 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
3635 # taskrunner; e.g. "wheel".
3636 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
3637 # taskrunner; e.g. "root".
3638 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
3639 # access the Cloud Dataflow API.
3640 "A String",
3641 ],
3642 "languageHint": "A String", # The suggested backend language.
3643 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3644 # console.
3645 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
3646 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003647 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3648 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
3649 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
3650 # "shuffle/v1beta1".
3651 "workerId": "A String", # The ID of the worker running this pipeline.
3652 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
3653 #
3654 # When workers access Google Cloud APIs, they logically do so via
3655 # relative URLs. If this field is specified, it supplies the base
3656 # URL to use for resolving these relative URLs. The normative
3657 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3658 # Locators".
3659 #
3660 # If not specified, the default value is "http://www.googleapis.com/"
3661 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3662 # "dataflow/v1b3/projects".
3663 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3664 # storage.
3665 #
3666 # The supported resource type is:
3667 #
3668 # Google Cloud Storage:
3669 #
3670 # storage.googleapis.com/{bucket}/{object}
3671 # bucket.storage.googleapis.com/{object}
3672 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003673         "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
3674 "harnessCommand": "A String", # The command to launch the worker harness.
3675 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3676 # temporary storage.
3677 #
3678 # The supported resource type is:
3679 #
3680 # Google Cloud Storage:
3681 # storage.googleapis.com/{bucket}/{object}
3682 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07003683 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3684 #
3685 # When workers access Google Cloud APIs, they logically do so via
3686 # relative URLs. If this field is specified, it supplies the base
3687 # URL to use for resolving these relative URLs. The normative
3688 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3689 # Locators".
3690 #
3691 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003692 },
Dan O'Mearadd494642020-05-01 07:42:23 -07003693 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3694 # service will choose a number of threads (according to the number of cores
3695 # on the selected machine type for batch, or 1 by convention for streaming).
3696 "poolArgs": { # Extra arguments for this worker pool.
3697 "a_key": "", # Properties of the object. Contains field @type with type URL.
3698 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003699 "packages": [ # Packages to be installed on workers.
3700 { # The packages that must be installed in order for a worker to run the
3701 # steps of the Cloud Dataflow job that will be assigned to its worker
3702 # pool.
3703 #
3704 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3705 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3706 # might use this to install jars containing the user's code and all of the
3707 # various dependencies (libraries, data files, etc.) required in order
3708 # for that code to run.
3709 "location": "A String", # The resource to read the package from. The supported resource type is:
3710 #
3711 # Google Cloud Storage:
3712 #
3713 # storage.googleapis.com/{bucket}
3714 # bucket.storage.googleapis.com/
3715 "name": "A String", # The name of the package.
3716 },
3717 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003718 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3719 # select a default set of packages which are useful to worker
3720 # harnesses written in a particular language.
3721 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3722 # are supported.
3723 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003724 # attempt to choose a reasonable default.
3725         "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
3726 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3727 # `TEARDOWN_NEVER`.
3728 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3729 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3730 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3731 # down.
3732 #
3733 # If the workers are not torn down by the service, they will
3734 # continue to run and use Google Compute Engine VM resources in the
3735 # user's project until they are explicitly terminated by the user.
3736 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3737 # policy except for small, manually supervised test jobs.
3738 #
3739 # If unknown or unspecified, the service will attempt to choose a reasonable
3740 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07003741 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3742 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003743 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3744 # execute the job. If zero or unspecified, the service will
3745 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003746 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3747 # the form "regions/REGION/subnetworks/SUBNETWORK".
3748 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3749 { # Describes the data disk used by a workflow job.
3750 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3751 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3752 # attempt to choose a reasonable default.
3753 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3754 # must be a disk type appropriate to the project and zone in which
3755 # the workers will run. If unknown or unspecified, the service
3756 # will attempt to choose a reasonable default.
3757 #
3758 # For example, the standard persistent disk type is a resource name
3759 # typically ending in "pd-standard". If SSD persistent disks are
3760 # available, the resource name typically ends with "pd-ssd". The
3761 # actual valid values are defined the Google Compute Engine API,
3762             # actual valid values are defined by the Google Compute Engine API,
3763 # documentation for more information about determining the set of
3764 # available disk types for a particular project and zone.
3765 #
3766 # Google Compute Engine Disk types are local to a particular
3767 # project in a particular zone, and so the resource name will
3768 # typically look something like this:
3769 #
3770 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3771 },
3772 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003773 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
3774 # only be set in the Fn API path. For non-cross-language pipelines this
3775 # should have only one entry. Cross-language pipelines will have two or more
3776 # entries.
3777           { # Defines an SDK harness container for executing Dataflow pipelines.
3778 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
3779             "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
3780                 # container instance with this image. If false (or unset), recommends using
3781                 # more than one core per SDK container instance with this image for
3782                 # efficiency. Note that the Dataflow service may choose to override this
3783                 # property if needed.
3784 },
3785 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003786 },
3787 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003788 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3789 # unspecified, the service will attempt to choose a reasonable
3790 # default. This should be in the form of the API service name,
3791 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003792 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3793         # storage. The system will append the suffix "/temp-{JOBNAME}" to
3794 # this resource prefix, where {JOBNAME} is the value of the
3795 # job_name field. The resulting bucket and object prefix is used
3796 # as the prefix of the resources used to store temporary data
3797 # needed during the job execution. NOTE: This will override the
3798 # value in taskrunner_settings.
3799 # The supported resource type is:
3800 #
3801 # Google Cloud Storage:
3802 #
3803 # storage.googleapis.com/{bucket}/{object}
3804 # bucket.storage.googleapis.com/{object}
3805 },
3806 "location": "A String", # The [regional endpoint]
3807 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3808 # contains this job.
3809 "tempFiles": [ # A set of files the system should be aware of that are used
3810 # for temporary storage. These temporary files will be
3811 # removed on job completion.
3812 # No duplicates are allowed.
3813 # No file patterns are supported.
3814 #
3815 # The supported files are:
3816 #
3817 # Google Cloud Storage:
3818 #
3819 # storage.googleapis.com/{bucket}/{object}
3820 # bucket.storage.googleapis.com/{object}
3821 "A String",
3822 ],
3823 "type": "A String", # The type of Cloud Dataflow job.
3824 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3825 # If this field is set, the service will ensure its uniqueness.
3826 # The request to create a job will fail if the service has knowledge of a
3827 # previously submitted job with the same client's ID and job name.
3828 # The caller may use this field to ensure idempotence of job
3829 # creation across retried attempts to create a job.
3830 # By default, the field is empty and, in that case, the service ignores it.
3831 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3832 # snapshot.
3833 "stepsLocation": "A String", # The GCS location where the steps are stored.
3834 "currentStateTime": "A String", # The timestamp associated with the current state.
3835 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3836 # Flexible resource scheduling jobs are started with some delay after job
3837 # creation, so start_time is unset before start and is updated when the
3838 # job is started by the Cloud Dataflow service. For other jobs, start_time
3839 # always equals create_time and is immutable and set by the Cloud Dataflow
3840 # service.
3841 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3842 # Cloud Dataflow service.
3843 "requestedState": "A String", # The job's requested state.
3844 #
3845 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3846 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3847 # also be used to directly set a job's requested state to
3848 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3849 # job if it has not already reached a terminal state.
3850 "name": "A String", # The user-specified Cloud Dataflow job name.
3851 #
3852 # Only one Job with a given name may exist in a project at any
3853 # given time. If a caller attempts to create a Job with the same
3854 # name as an already-existing Job, the attempt returns the
3855 # existing Job.
3856 #
3857 # The name must match the regular expression
3858 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
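      # A hypothetical name matching this pattern would be
      # "wordcount-example-01".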
3859 "steps": [ # Exactly one of step or steps_location should be specified.
3860 #
3861 # The top-level steps that constitute the entire job.
3862 { # Defines a particular step within a Cloud Dataflow job.
3863 #
3864 # A job consists of multiple steps, each of which performs some
3865 # specific operation as part of the overall job. Data is typically
3866 # passed from one step to another as part of the job.
3867 #
3868 # Here's an example of a sequence of steps which together implement a
3869 # Map-Reduce job:
3870 #
3871 # * Read a collection of data from some source, parsing the
3872 # collection's elements.
3873 #
3874 # * Validate the elements.
3875 #
3876 # * Apply a user-defined function to map each element to some value
3877 # and extract an element-specific key value.
3878 #
3879 # * Group elements with the same key into a single element with
3880 # that key, transforming a multiply-keyed collection into a
3881 # uniquely-keyed collection.
3882 #
3883 # * Write the elements out to some data sink.
3884 #
3885 # Note that the Cloud Dataflow service may be used to run many different
3886 # types of jobs, not just Map-Reduce.
3887 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3888 "name": "A String", # The name that identifies the step. This must be unique for each
3889 # step with respect to all other steps in the Cloud Dataflow job.
3890 "properties": { # Named properties associated with the step. Each kind of
3891 # predefined step has its own required set of properties.
3892 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3893 "a_key": "", # Properties of the object.
3894 },
3895 },
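      # A purely hypothetical entry might look like
      #   { "kind": "ParallelDo", "name": "s2", "properties": { ... } }
      # where the valid kinds and per-kind properties are defined by the
      # service and the SDK rather than enumerated in this reference.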
3896 ],
3897 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3898 # of the job it replaced.
3899 #
3900 # When sending a `CreateJobRequest`, you can update a job by specifying it
3901 # here. The job named here is stopped, and its intermediate state is
3902 # transferred to this job.
3903 "currentState": "A String", # The current state of the job.
3904 #
3905 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3906 # specified.
3907 #
3908 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3909 # terminal state. After a job has reached a terminal state, no
3910 # further state updates may be made.
3911 #
3912 # This field may be mutated by the Cloud Dataflow service;
3913 # callers cannot mutate it.
3914 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
3915 # isn't contained in the submitted job.
3916 "stages": { # A mapping from each stage to the information about that stage.
3917 "a_key": { # Contains information about how a particular
3918 # google.dataflow.v1beta3.Step will be executed.
3919 "stepName": [ # The steps associated with the execution stage.
3920 # Note that stages may have several steps, and that a given step
3921 # might be run by more than one stage.
3922 "A String",
3923 ],
3924 },
3925 },
3926 },
3927}
3928
3929 location: string, The [regional endpoint]
3930(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3931contains this job.
3932 x__xgafv: string, V1 error format.
3933 Allowed values
3934 1 - v1 error format
3935 2 - v2 error format
3936
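  A minimal, purely illustrative sketch of calling this method with the
  google-api-python-client; the method name (update), project ID, job ID and
  body field shown here are assumptions for the example, not values defined
  by this reference:

    from googleapiclient.discovery import build

    dataflow = build('dataflow', 'v1b3')
    body = {
        'requestedState': 'JOB_STATE_RUNNING',  # hypothetical requested state
    }
    job = dataflow.projects().jobs().update(
        projectId='my-project',                    # hypothetical project ID
        jobId='2020-05-01_12_00_00-1234567890',    # hypothetical job ID
        body=body,
    ).execute()
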
3937Returns:
3938 An object of the form:
3939
3940 { # Defines a job to be run by the Cloud Dataflow service.
3941 "labels": { # User-defined labels for this job.
3942 #
3943 # The labels map can contain no more than 64 entries. Entries of the labels
3944 # map are UTF8 strings that comply with the following restrictions:
3945 #
3946 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3947 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3948 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
3949 # size.
3950 "a_key": "A String",
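          # For example (hypothetical values): "team": "analytics", "env": "test"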
3951 },
3952 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
3953 # by the metadata values provided here. Populated for ListJobs and all GetJob
3954 # views SUMMARY and higher.
3955 # ListJob response and Job SUMMARY view.
3956 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3957 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3958 "version": "A String", # The version of the SDK used to run the job.
3959 "sdkSupportStatus": "A String", # The support status for this SDK version.
3960 },
3961 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3962 { # Metadata for a PubSub connector used by the job.
3963 "topic": "A String", # Topic accessed in the connection.
3964 "subscription": "A String", # Subscription used in the connection.
3965 },
3966 ],
3967 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3968 { # Metadata for a Datastore connector used by the job.
3969 "projectId": "A String", # ProjectId accessed in the connection.
3970 "namespace": "A String", # Namespace used in the connection.
3971 },
3972 ],
3973 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3974 { # Metadata for a File connector used by the job.
3975 "filePattern": "A String", # File Pattern used to access files by the connector.
3976 },
3977 ],
3978 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3979 { # Metadata for a Spanner connector used by the job.
3980 "instanceId": "A String", # InstanceId accessed in the connection.
3981 "projectId": "A String", # ProjectId accessed in the connection.
3982 "databaseId": "A String", # DatabaseId accessed in the connection.
3983 },
3984 ],
3985 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3986 { # Metadata for a BigTable connector used by the job.
3987 "instanceId": "A String", # InstanceId accessed in the connection.
3988 "projectId": "A String", # ProjectId accessed in the connection.
3989 "tableId": "A String", # TableId accessed in the connection.
3990 },
3991 ],
3992 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3993 { # Metadata for a BigQuery connector used by the job.
3994 "projectId": "A String", # Project accessed in the connection.
3995 "query": "A String", # Query used to access data in the connection.
3996 "table": "A String", # Table accessed in the connection.
3997 "dataset": "A String", # Dataset accessed in the connection.
3998 },
3999 ],
4000 },
4001 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
4002 # A description of the user pipeline and stages through which it is executed.
4003 # Created by Cloud Dataflow service. Only retrieved with
4004 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
4005 # form. This data is provided by the Dataflow service for ease of visualizing
4006 # the pipeline and interpreting Dataflow provided metrics.
4007 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
4008 { # Description of the type, names/ids, and input/outputs for a transform.
4009 "kind": "A String", # Type of transform.
4010 "name": "A String", # User provided name for this transform instance.
4011 "inputCollectionName": [ # User names for all collection inputs to this transform.
4012 "A String",
4013 ],
4014 "displayData": [ # Transform-specific display data.
4015 { # Data provided with a pipeline or transform to provide descriptive info.
4016 "key": "A String", # The key identifying the display data.
4017 # This is intended to be used as a label for the display data
4018 # when viewed in a dax monitoring system.
4019 "shortStrValue": "A String", # A possible additional shorter value to display.
4020 # For example a java_class_name_value of com.mypackage.MyDoFn
4021 # will be stored with MyDoFn as the short_str_value and
4022 # com.mypackage.MyDoFn as the java_class_name value.
4023 # short_str_value can be displayed and java_class_name_value
4024 # will be displayed as a tooltip.
4025 "timestampValue": "A String", # Contains value if the data is of timestamp type.
4026 "url": "A String", # An optional full URL.
4027 "floatValue": 3.14, # Contains value if the data is of float type.
4028 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
4029 # language namespace (i.e. python module) which defines the display data.
4030 # This allows a dax monitoring system to specially handle the data
4031 # and perform custom rendering.
4032 "javaClassValue": "A String", # Contains value if the data is of java class type.
4033 "label": "A String", # An optional label to display in a dax UI for the element.
4034 "boolValue": True or False, # Contains value if the data is of a boolean type.
4035 "strValue": "A String", # Contains value if the data is of string type.
4036 "durationValue": "A String", # Contains value if the data is of duration type.
4037 "int64Value": "A String", # Contains value if the data is of int64 type.
4038 },
4039 ],
4040 "outputCollectionName": [ # User names for all collection outputs to this transform.
4041 "A String",
4042 ],
4043 "id": "A String", # SDK generated id of this transform instance.
4044 },
4045 ],
4046 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
4047 { # Description of the composing transforms, names/ids, and input/outputs of a
4048 # stage of execution. Some composing transforms and sources may have been
4049 # generated by the Dataflow service during execution planning.
4050 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
4051 { # Description of an interstitial value between transforms in an execution
4052 # stage.
4053 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
4054 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
4055 # source is most closely associated.
4056 "name": "A String", # Dataflow service generated name for this source.
4057 },
4058 ],
4059 "kind": "A String", # Type of tranform this stage is executing.
4060 "name": "A String", # Dataflow service generated name for this stage.
4061 "outputSource": [ # Output sources for this stage.
4062 { # Description of an input or output of an execution stage.
4063 "userName": "A String", # Human-readable name for this source; may be user or system generated.
4064 "sizeBytes": "A String", # Size of the source, if measurable.
4065 "name": "A String", # Dataflow service generated name for this source.
4066 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
4067 # source is most closely associated.
4068 },
4069 ],
4070 "inputSource": [ # Input sources for this stage.
4071 { # Description of an input or output of an execution stage.
4072 "userName": "A String", # Human-readable name for this source; may be user or system generated.
4073 "sizeBytes": "A String", # Size of the source, if measurable.
4074 "name": "A String", # Dataflow service generated name for this source.
4075 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
4076 # source is most closely associated.
4077 },
4078 ],
4079 "componentTransform": [ # Transforms that comprise this execution stage.
4080 { # Description of a transform executed as part of an execution stage.
4081 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
4082 "originalTransform": "A String", # User name for the original user transform with which this transform is
4083 # most closely associated.
4084 "name": "A String", # Dataflow service generated name for this source.
4085 },
4086 ],
4087 "id": "A String", # Dataflow service generated id for this stage.
4088 },
4089 ],
4090 "displayData": [ # Pipeline level display data.
4091 { # Data provided with a pipeline or transform to provide descriptive info.
4092 "key": "A String", # The key identifying the display data.
4093 # This is intended to be used as a label for the display data
4094 # when viewed in a dax monitoring system.
4095 "shortStrValue": "A String", # A possible additional shorter value to display.
4096 # For example a java_class_name_value of com.mypackage.MyDoFn
4097 # will be stored with MyDoFn as the short_str_value and
4098 # com.mypackage.MyDoFn as the java_class_name value.
4099 # short_str_value can be displayed and java_class_name_value
4100 # will be displayed as a tooltip.
4101 "timestampValue": "A String", # Contains value if the data is of timestamp type.
4102 "url": "A String", # An optional full URL.
4103 "floatValue": 3.14, # Contains value if the data is of float type.
4104 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
4105 # language namespace (i.e. python module) which defines the display data.
4106 # This allows a dax monitoring system to specially handle the data
4107 # and perform custom rendering.
4108 "javaClassValue": "A String", # Contains value if the data is of java class type.
4109 "label": "A String", # An optional label to display in a dax UI for the element.
4110 "boolValue": True or False, # Contains value if the data is of a boolean type.
4111 "strValue": "A String", # Contains value if the data is of string type.
4112 "durationValue": "A String", # Contains value if the data is of duration type.
4113 "int64Value": "A String", # Contains value if the data is of int64 type.
4114 },
4115 ],
4116 },
4117 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
4118 # callers cannot mutate it.
4119 { # A message describing the state of a particular execution stage.
4120 "executionStageName": "A String", # The name of the execution stage.
4121 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
4122 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
4123 },
4124 ],
4125 "id": "A String", # The unique ID of this job.
4126 #
4127 # This field is set by the Cloud Dataflow service when the Job is
4128 # created, and is immutable for the life of the job.
4129 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
4130 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
4131 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
4132 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
4133 # corresponding name prefixes of the new job.
4134 "a_key": "A String",
4135 },
4136 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
4137 "workerRegion": "A String", # The Compute Engine region
4138 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
4139 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
4140 # with worker_zone. If neither worker_region nor worker_zone is specified,
4141 # default to the control plane's region.
4142 "version": { # A structure describing which components and their versions of the service
4143 # are required in order to run the job.
4144 "a_key": "", # Properties of the object.
4145 },
4146 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
4147 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
4148 # at rest, AKA a Customer Managed Encryption Key (CMEK).
4149 #
4150 # Format:
4151 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
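      # e.g. (hypothetical):
      # projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key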
4152 "internalExperiments": { # Experimental settings.
4153 "a_key": "", # Properties of the object. Contains field @type with type URL.
4154 },
4155 "dataset": "A String", # The dataset for the current project where various workflow
4156 # related tables are stored.
4157 #
4158 # The supported resource type is:
4159 #
4160 # Google BigQuery:
4161 # bigquery.googleapis.com/{dataset}
4162 "experiments": [ # The list of experiments to enable.
4163 "A String",
4164 ],
4165 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
4166 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
4167 # options are passed through the service and are used to recreate the
4168 # SDK pipeline options on the worker in a language agnostic and platform
4169 # independent way.
4170 "a_key": "", # Properties of the object.
4171 },
4172 "userAgent": { # A description of the process that generated the request.
4173 "a_key": "", # Properties of the object.
4174 },
4175 "workerZone": "A String", # The Compute Engine zone
4176 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
4177 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
4178 # with worker_region. If neither worker_region nor worker_zone is specified,
4179 # a zone in the control plane's region is chosen based on available capacity.
4180 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
4181 # specified in order for the job to have workers.
4182 { # Describes one particular pool of Cloud Dataflow workers to be
4183 # instantiated by the Cloud Dataflow service in order to perform the
4184 # computations required by a job. Note that a workflow job may use
4185 # multiple pools, in order to match the various computational
4186 # requirements of the various stages of the job.
4187 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
4188 # harness, residing in Google Container Registry.
4189 #
4190 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
4191 "ipConfiguration": "A String", # Configuration for VM IPs.
4192 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
4193 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
4194 "algorithm": "A String", # The algorithm to use for autoscaling.
4195 },
4196 "diskSourceImage": "A String", # Fully qualified source image for disks.
4197 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
4198 # the service will use the network "default".
4199 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
4200 # will attempt to choose a reasonable default.
4201 "metadata": { # Metadata to set on the Google Compute Engine VMs.
4202 "a_key": "A String",
4203 },
4204 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
4205 # service will attempt to choose a reasonable default.
4206 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
4207 # Compute Engine API.
4208 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
4209 # using the standard Dataflow task runner. Users should ignore
4210 # this field.
4211 "workflowFileName": "A String", # The file to store the workflow in.
4212 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
4213 # will not be uploaded.
4214 #
4215 # The supported resource type is:
4216 #
4217 # Google Cloud Storage:
4218 # storage.googleapis.com/{bucket}/{object}
4219 # bucket.storage.googleapis.com/{object}
4220 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
4221 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
4222 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
4223 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
4224 "vmId": "A String", # The ID string of the VM.
4225 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
4226 # taskrunner; e.g. "wheel".
4227 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
4228 # taskrunner; e.g. "root".
4229 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
4230 # access the Cloud Dataflow API.
4231 "A String",
4232 ],
4233 "languageHint": "A String", # The suggested backend language.
4234 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
4235 # console.
4236 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
4237 "logDir": "A String", # The directory on the VM to store logs.
4238 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
4239 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
4240 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
4241 # "shuffle/v1beta1".
4242 "workerId": "A String", # The ID of the worker running this pipeline.
4243 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
4244 #
4245 # When workers access Google Cloud APIs, they logically do so via
4246 # relative URLs. If this field is specified, it supplies the base
4247 # URL to use for resolving these relative URLs. The normative
4248 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
4249 # Locators".
4250 #
4251 # If not specified, the default value is "http://www.googleapis.com/"
4252 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
4253 # "dataflow/v1b3/projects".
4254 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
4255 # storage.
4256 #
4257 # The supported resource type is:
4258 #
4259 # Google Cloud Storage:
4260 #
4261 # storage.googleapis.com/{bucket}/{object}
4262 # bucket.storage.googleapis.com/{object}
4263 },
4264 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
4265 "harnessCommand": "A String", # The command to launch the worker harness.
4266 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
4267 # temporary storage.
4268 #
4269 # The supported resource type is:
4270 #
4271 # Google Cloud Storage:
4272 # storage.googleapis.com/{bucket}/{object}
4273 # bucket.storage.googleapis.com/{object}
4274 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
4275 #
4276 # When workers access Google Cloud APIs, they logically do so via
4277 # relative URLs. If this field is specified, it supplies the base
4278 # URL to use for resolving these relative URLs. The normative
4279 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
4280 # Locators".
4281 #
4282 # If not specified, the default value is "http://www.googleapis.com/"
4283 },
4284 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
4285 # service will choose a number of threads (according to the number of cores
4286 # on the selected machine type for batch, or 1 by convention for streaming).
4287 "poolArgs": { # Extra arguments for this worker pool.
4288 "a_key": "", # Properties of the object. Contains field @type with type URL.
4289 },
4290 "packages": [ # Packages to be installed on workers.
4291 { # The packages that must be installed in order for a worker to run the
4292 # steps of the Cloud Dataflow job that will be assigned to its worker
4293 # pool.
4294 #
4295 # This is the mechanism by which the Cloud Dataflow SDK causes code to
4296 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
4297 # might use this to install jars containing the user's code and all of the
4298 # various dependencies (libraries, data files, etc.) required in order
4299 # for that code to run.
4300 "location": "A String", # The resource to read the package from. The supported resource type is:
4301 #
4302 # Google Cloud Storage:
4303 #
4304 # storage.googleapis.com/{bucket}
4305 # bucket.storage.googleapis.com/
4306 "name": "A String", # The name of the package.
4307 },
4308 ],
4309 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
4310 # select a default set of packages which are useful to worker
4311 # harnesses written in a particular language.
4312 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
4313 # are supported.
4314 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004315 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004316 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
4317 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
4318 # `TEARDOWN_NEVER`.
4319 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
4320 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
4321 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
4322 # down.
4323 #
4324 # If the workers are not torn down by the service, they will
4325 # continue to run and use Google Compute Engine VM resources in the
4326 # user's project until they are explicitly terminated by the user.
4327 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
4328 # policy except for small, manually supervised test jobs.
4329 #
4330 # If unknown or unspecified, the service will attempt to choose a reasonable
4331 # default.
4332 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
4333 # attempt to choose a reasonable default.
4334 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
4335 # execute the job. If zero or unspecified, the service will
4336 # attempt to choose a reasonable default.
4337 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
4338 # the form "regions/REGION/subnetworks/SUBNETWORK".
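        # e.g. (hypothetical): "regions/us-central1/subnetworks/my-subnetwork"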
4339 "dataDisks": [ # Data disks that are used by a VM in this workflow.
4340 { # Describes the data disk used by a workflow job.
4341 "mountPoint": "A String", # Directory in a VM where disk is mounted.
4342 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
4343 # attempt to choose a reasonable default.
4344 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
4345 # must be a disk type appropriate to the project and zone in which
4346 # the workers will run. If unknown or unspecified, the service
4347 # will attempt to choose a reasonable default.
4348 #
4349 # For example, the standard persistent disk type is a resource name
4350 # typically ending in "pd-standard". If SSD persistent disks are
4351 # available, the resource name typically ends with "pd-ssd". The
4352 # actual valid values are defined by the Google Compute Engine API,
4353 # not by the Cloud Dataflow API; consult the Google Compute Engine
4354 # documentation for more information about determining the set of
4355 # available disk types for a particular project and zone.
4356 #
4357 # Google Compute Engine Disk types are local to a particular
4358 # project in a particular zone, and so the resource name will
4359 # typically look something like this:
4360 #
4361 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
4362 },
4363 ],
4364 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
4365 # only be set in the Fn API path. For non-cross-language pipelines this
4366 # should have only one entry. Cross-language pipelines will have two or more
4367 # entries.
4368 { # Defines an SDK harness container for executing Dataflow pipelines.
4369 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
4370 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
4371 # container instance with this image. If false (or unset) recommends using
4372 # more than one core per SDK container instance with this image for
4373 # efficiency. Note that Dataflow service may choose to override this property
4374 # if needed.
4375 },
4376 ],
4377 },
4378 ],
4379 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
4380 # unspecified, the service will attempt to choose a reasonable
4381 # default. This should be in the form of the API service name,
4382 # e.g. "compute.googleapis.com".
4383 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
4384 # storage. The system will append the suffix "/temp-{JOBNAME}" to
4385 # this resource prefix, where {JOBNAME} is the value of the
4386 # job_name field. The resulting bucket and object prefix is used
4387 # as the prefix of the resources used to store temporary data
4388 # needed during the job execution. NOTE: This will override the
4389 # value in taskrunner_settings.
4390 # The supported resource type is:
4391 #
4392 # Google Cloud Storage:
4393 #
4394 # storage.googleapis.com/{bucket}/{object}
4395 # bucket.storage.googleapis.com/{object}
4396 },
4397 "location": "A String", # The [regional endpoint]
4398 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
4399 # contains this job.
4400 "tempFiles": [ # A set of files the system should be aware of that are used
4401 # for temporary storage. These temporary files will be
4402 # removed on job completion.
4403 # No duplicates are allowed.
4404 # No file patterns are supported.
4405 #
4406 # The supported files are:
4407 #
4408 # Google Cloud Storage:
4409 #
4410 # storage.googleapis.com/{bucket}/{object}
4411 # bucket.storage.googleapis.com/{object}
4412 "A String",
4413 ],
4414 "type": "A String", # The type of Cloud Dataflow job.
4415 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
4416 # If this field is set, the service will ensure its uniqueness.
4417 # The request to create a job will fail if the service has knowledge of a
4418 # previously submitted job with the same client's ID and job name.
4419 # The caller may use this field to ensure idempotence of job
4420 # creation across retried attempts to create a job.
4421 # By default, the field is empty and, in that case, the service ignores it.
4422 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
4423 # snapshot.
4424 "stepsLocation": "A String", # The GCS location where the steps are stored.
4425 "currentStateTime": "A String", # The timestamp associated with the current state.
4426 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
4427 # Flexible resource scheduling jobs are started with some delay after job
4428 # creation, so start_time is unset before start and is updated when the
4429 # job is started by the Cloud Dataflow service. For other jobs, start_time
4430 # always equals create_time and is immutable and set by the Cloud Dataflow
4431 # service.
4432 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
4433 # Cloud Dataflow service.
4434 "requestedState": "A String", # The job's requested state.
4435 #
4436 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
4437 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
4438 # also be used to directly set a job's requested state to
4439 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
4440 # job if it has not already reached a terminal state.
4441 "name": "A String", # The user-specified Cloud Dataflow job name.
4442 #
4443 # Only one Job with a given name may exist in a project at any
4444 # given time. If a caller attempts to create a Job with the same
4445 # name as an already-existing Job, the attempt returns the
4446 # existing Job.
4447 #
4448 # The name must match the regular expression
4449 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
4450 "steps": [ # Exactly one of step or steps_location should be specified.
4451 #
4452 # The top-level steps that constitute the entire job.
4453 { # Defines a particular step within a Cloud Dataflow job.
4454 #
4455 # A job consists of multiple steps, each of which performs some
4456 # specific operation as part of the overall job. Data is typically
4457 # passed from one step to another as part of the job.
4458 #
4459 # Here's an example of a sequence of steps which together implement a
4460 # Map-Reduce job:
4461 #
4462 # * Read a collection of data from some source, parsing the
4463 # collection's elements.
4464 #
4465 # * Validate the elements.
4466 #
4467 # * Apply a user-defined function to map each element to some value
4468 # and extract an element-specific key value.
4469 #
4470 # * Group elements with the same key into a single element with
4471 # that key, transforming a multiply-keyed collection into a
4472 # uniquely-keyed collection.
4473 #
4474 # * Write the elements out to some data sink.
4475 #
4476 # Note that the Cloud Dataflow service may be used to run many different
4477 # types of jobs, not just Map-Reduce.
4478 "kind": "A String", # The kind of step in the Cloud Dataflow job.
4479 "name": "A String", # The name that identifies the step. This must be unique for each
4480 # step with respect to all other steps in the Cloud Dataflow job.
4481 "properties": { # Named properties associated with the step. Each kind of
4482 # predefined step has its own required set of properties.
4483 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
4484 "a_key": "", # Properties of the object.
4485 },
4486 },
4487 ],
4488 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
4489 # of the job it replaced.
4490 #
4491 # When sending a `CreateJobRequest`, you can update a job by specifying it
4492 # here. The job named here is stopped, and its intermediate state is
4493 # transferred to this job.
4494 "currentState": "A String", # The current state of the job.
4495 #
4496 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
4497 # specified.
4498 #
4499 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
4500 # terminal state. After a job has reached a terminal state, no
4501 # further state updates may be made.
4502 #
4503 # This field may be mutated by the Cloud Dataflow service;
4504 # callers cannot mutate it.
4505 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
4506 # isn't contained in the submitted job.
4507 "stages": { # A mapping from each stage to the information about that stage.
4508 "a_key": { # Contains information about how a particular
4509 # google.dataflow.v1beta3.Step will be executed.
4510 "stepName": [ # The steps associated with the execution stage.
4511 # Note that stages may have several steps, and that a given step
4512 # might be run by more than one stage.
4513 "A String",
4514 ],
4515 },
4516 },
4517 },
4518 }</pre>
4519</div>
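
<p>A purely illustrative sketch (not part of the generated reference) of reading
a few of the documented fields from a returned Job object; the field values
below are hypothetical:</p>
<pre>
  # Hypothetical response, trimmed to a few of the fields documented above.
  job = {
      "name": "wordcount-example-01",
      "currentState": "JOB_STATE_RUNNING",
      "createTime": "2020-05-01T12:00:00Z",
      "labels": {"team": "analytics"},
      "stageStates": [
          {"executionStageName": "F12", "executionStageState": "JOB_STATE_RUNNING"},
      ],
  }

  print(job.get("name"), job.get("currentState"), job.get("createTime"))
  for stage in job.get("stageStates", []):
      print(stage.get("executionStageName"), stage.get("executionStageState"))
  for key, value in job.get("labels", {}).items():
      print(key, value)
</pre>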
4520
4521</body></html>