Blame - docs/dyn/dataflow_v1b3.projects.locations.jobs.html - platform/external/python/google-api-python-client

<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.snapshots.html">snapshots()</a></code>
</p>
<p class="firstline">Returns the snapshots Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#create">create(projectId, location, body=None, x__xgafv=None, replaceJobId=None, view=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, jobId, x__xgafv=None, view=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, location, pageSize=None, pageToken=None, x__xgafv=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
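<p>The <code>list</code> and <code>list_next</code> methods are meant to be used together when paging through jobs. The following is a minimal sketch, not part of this library: it assumes <code>jobs</code> is the resource returned by <code>service.projects().locations().jobs()</code>, and the generator name <code>iter_jobs</code> is hypothetical.</p>

```python
def iter_jobs(jobs, project_id, location, **list_kwargs):
    """Hypothetical helper: yield every Job dict across all pages of jobs.list().

    `jobs` is assumed to be service.projects().locations().jobs(), or any
    object exposing compatible list() and list_next() methods.
    """
    request = jobs.list(projectId=project_id, location=location, **list_kwargs)
    while request is not None:
        response = request.execute()
        # Each page carries its jobs under "jobs"; the key may be missing on an empty page.
        for job in response.get("jobs", []):
            yield job
        # list_next() returns None once the response has no nextPageToken.
        request = jobs.list_next(previous_request=request, previous_response=response)
```

<p>For example, with <code>service = googleapiclient.discovery.build("dataflow", "v1b3")</code> and default credentials available, iterating over <code>iter_jobs(service.projects().locations().jobs(), "my-project", "us-central1")</code> visits every job in that region, one page request at a time. Extra keyword arguments such as <code>filter="ACTIVE"</code> are passed straight through to <code>list()</code>.</p>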

<h3>Method Details</h3>
<code class="details" id="create">create(projectId, location, body=None, x__xgafv=None, replaceJobId=None, view=None)</code>
<pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint](https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.
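
As a minimal sketch of what this means in practice, the hypothetical helpers below (not part of this library) build the URL that `projects.locations.jobs.create` posts to, assuming the public root `https://dataflow.googleapis.com/v1b3`, and pre-check a `labels` map against the restrictions documented for the `labels` field of the request body. The label check assumes the key pattern is `[\p{Ll}\p{Lo}][\p{Ll}\p{Lo}\p{N}_-]{0,62}`, i.e. a lowercase or other letter followed by up to 62 letters, numbers, underscores or hyphens; the service remains the authority on what it accepts.

```python
import unicodedata
from urllib.parse import quote, urlencode

# Assumption: public REST root of the v1b3 API (hypothetical constant name).
DATAFLOW_ROOT = "https://dataflow.googleapis.com/v1b3"


def regional_create_url(project_id, location, replace_job_id=None, view=None):
    """Hypothetical helper: the URL that projects.locations.jobs.create POSTs to."""
    # The regional endpoint is a path segment, so the job lands in `location`.
    path = "{}/projects/{}/locations/{}/jobs".format(
        DATAFLOW_ROOT, quote(project_id, safe=""), quote(location, safe=""))
    # Send only the optional query parameters that were actually provided.
    params = [(name, value)
              for name, value in (("replaceJobId", replace_job_id), ("view", view))
              if value is not None]
    return path + ("?" + urlencode(params) if params else "")


def _is_label_char(ch):
    # Assumed class [\p{Ll}\p{Lo}\p{N}_-]: lowercase or other letter, any number, "_" or "-".
    cat = unicodedata.category(ch)
    return cat in ("Ll", "Lo") or cat.startswith("N") or ch in "_-"


def label_errors(labels):
    """Hypothetical client-side check of a Job labels map; returns a list of problems."""
    errors = []
    if len(labels) > 64:
        errors.append("more than 64 entries")
    for key, value in labels.items():
        # Key: one Ll/Lo letter, then up to 62 label characters (1 to 63 in total).
        key_ok = (len(key) in range(1, 64)
                  and unicodedata.category(key[0]) in ("Ll", "Lo")
                  and all(_is_label_char(ch) for ch in key[1:]))
        if not key_ok:
            errors.append("invalid key: {!r}".format(key))
        # Value: zero to 63 label characters.
        if len(value) > 63 or not all(_is_label_char(ch) for ch in value):
            errors.append("invalid value for {!r}".format(key))
        # Both are also limited to 128 bytes once UTF-8 encoded.
        if len(key.encode("utf-8")) > 128 or len(value.encode("utf-8")) > 128:
            errors.append("{!r} exceeds 128 bytes".format(key))
    return errors


print(regional_create_url("my-project", "europe-west1"))
# https://dataflow.googleapis.com/v1b3/projects/my-project/locations/europe-west1/jobs
print(label_errors({"env": "prod", "Team": "data-eng"}))
# ["invalid key: 'Team'"]
```

In application code this library builds the URL for you: `service.projects().locations().jobs().create(projectId="my-project", location="europe-west1", body=job)` sends the same request, where `service` comes from `googleapiclient.discovery.build("dataflow", "v1b3")`.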

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint](https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
      contains this job. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.
  "labels": { # User-defined labels for this job.
      #
      # The labels map can contain no more than 64 entries. Entries of the labels
      # map are UTF8 strings that comply with the following restrictions:
      #
      # * Keys must conform to regexp: [\p{Ll}\p{Lo}][\p{Ll}\p{Lo}\p{N}_-]{0,62}
      # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
      # * Both keys and values are additionally constrained to be <= 128 bytes in
      # size.
    "a_key": "A String",
  },
  "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
      # ListJob response and Job SUMMARY view.
      # This field is populated by the Dataflow service to support filtering jobs
      # by the metadata values provided here. Populated for ListJobs and all GetJob
      # views SUMMARY and higher.
    "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
      "versionDisplayName": "A String", # A readable string describing the version of the SDK.
      "version": "A String", # The version of the SDK used to run the job.
      "sdkSupportStatus": "A String", # The support status for this SDK version.
    },
    "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
      { # Metadata for a PubSub connector used by the job.
        "topic": "A String", # Topic accessed in the connection.
        "subscription": "A String", # Subscription used in the connection.
      },
    ],
    "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
      { # Metadata for a Datastore connector used by the job.
        "projectId": "A String", # ProjectId accessed in the connection.
        "namespace": "A String", # Namespace used in the connection.
      },
    ],
    "fileDetails": [ # Identification of a File source used in the Dataflow job.
      { # Metadata for a File connector used by the job.
        "filePattern": "A String", # File Pattern used to access files by the connector.
      },
    ],
    "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
      { # Metadata for a Spanner connector used by the job.
        "instanceId": "A String", # InstanceId accessed in the connection.
        "projectId": "A String", # ProjectId accessed in the connection.
        "databaseId": "A String", # DatabaseId accessed in the connection.
      },
    ],
    "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
      { # Metadata for a BigTable connector used by the job.
        "instanceId": "A String", # InstanceId accessed in the connection.
        "projectId": "A String", # ProjectId accessed in the connection.
        "tableId": "A String", # TableId accessed in the connection.
      },
    ],
    "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
      { # Metadata for a BigQuery connector used by the job.
        "projectId": "A String", # Project accessed in the connection.
        "query": "A String", # Query used to access data in the connection.
        "table": "A String", # Table accessed in the connection.
        "dataset": "A String", # Dataset accessed in the connection.
      },
    ],
  },
  "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
      # form. This data is provided by the Dataflow service for ease of visualizing
      # the pipeline and interpreting Dataflow provided metrics.
      # Preliminary field: The format of this data may change at any time.
      # A description of the user pipeline and stages through which it is executed.
      # Created by Cloud Dataflow service. Only retrieved with
      # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
    "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
      { # Description of the type, names/ids, and input/outputs for a transform.
        "kind": "A String", # Type of transform.
        "name": "A String", # User provided name for this transform instance.
        "inputCollectionName": [ # User names for all collection inputs to this transform.
          "A String",
        ],
        "displayData": [ # Transform-specific display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            "key": "A String", # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            "shortStrValue": "A String", # A possible additional shorter value to display.
                # For example a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
            "timestampValue": "A String", # Contains value if the data is of timestamp type.
            "url": "A String", # An optional full URL.
            "floatValue": 3.14, # Contains value if the data is of float type.
            "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                # language namespace (i.e. python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            "javaClassValue": "A String", # Contains value if the data is of java class type.
            "label": "A String", # An optional label to display in a dax UI for the element.
            "boolValue": True or False, # Contains value if the data is of a boolean type.
            "strValue": "A String", # Contains value if the data is of string type.
            "durationValue": "A String", # Contains value if the data is of duration type.
            "int64Value": "A String", # Contains value if the data is of int64 type.
          },
        ],
        "outputCollectionName": [ # User names for all collection outputs to this transform.
          "A String",
        ],
        "id": "A String", # SDK generated id of this transform instance.
      },
    ],
    "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
      { # Description of the composing transforms, names/ids, and input/outputs of a
          # stage of execution. Some composing transforms and sources may have been
          # generated by the Dataflow service during execution planning.
        "componentSource": [ # Collections produced and consumed by component transforms of this stage.
          { # Description of an interstitial value between transforms in an execution
              # stage.
            "userName": "A String", # Human-readable name for this transform; may be user or system generated.
            "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                # source is most closely associated.
            "name": "A String", # Dataflow service generated name for this source.
          },
        ],
        "kind": "A String", # Type of transform this stage is executing.
        "name": "A String", # Dataflow service generated name for this stage.
        "outputSource": [ # Output sources for this stage.
          { # Description of an input or output of an execution stage.
            "userName": "A String", # Human-readable name for this source; may be user or system generated.
            "sizeBytes": "A String", # Size of the source, if measurable.
            "name": "A String", # Dataflow service generated name for this source.
            "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        "inputSource": [ # Input sources for this stage.
          { # Description of an input or output of an execution stage.
            "userName": "A String", # Human-readable name for this source; may be user or system generated.
            "sizeBytes": "A String", # Size of the source, if measurable.
            "name": "A String", # Dataflow service generated name for this source.
            "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        "componentTransform": [ # Transforms that comprise this execution stage.
          { # Description of a transform executed as part of an execution stage.
            "userName": "A String", # Human-readable name for this transform; may be user or system generated.
            "originalTransform": "A String", # User name for the original user transform with which this transform is
                # most closely associated.
            "name": "A String", # Dataflow service generated name for this source.
          },
        ],
        "id": "A String", # Dataflow service generated id for this stage.
      },
    ],
287

"displayData": [ # Pipeline level display data.

288

{ # Data provided with a pipeline or transform to provide descriptive info.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

289

"key": "A String", # The key identifying the display data.

290

# This is intended to be used as a label for the display data

291

# when viewed in a dax monitoring system.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

292

"shortStrValue": "A String", # A possible additional shorter value to display.

293

# For example a java_class_name_value of com.mypackage.MyDoFn

294

# will be stored with MyDoFn as the short_str_value and

295

# com.mypackage.MyDoFn as the java_class_name value.

296

# short_str_value can be displayed and java_class_name_value

297

# will be displayed as a tooltip.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

298

"timestampValue": "A String", # Contains value if the data is of timestamp type.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

299

"url": "A String", # An optional full URL.

300

"floatValue": 3.14, # Contains value if the data is of float type.

301

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

302

# language namespace (i.e. python module) which defines the display data.

303

# This allows a dax monitoring system to specially handle the data

304

# and perform custom rendering.

305

"javaClassValue": "A String", # Contains value if the data is of java class type.

306

"label": "A String", # An optional label to display in a dax UI for the element.

307

"boolValue": True or False, # Contains value if the data is of a boolean type.

308

"strValue": "A String", # Contains value if the data is of string type.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

309

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

310

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

},

],

},

  "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
    { # A message describing the state of a particular execution stage.
      "executionStageName": "A String", # The name of the execution stage.
      "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
      "currentStateTime": "A String", # The time at which the stage transitioned to this state.
    },
  ],
  "id": "A String", # The unique ID of this job.
      #
      # This field is set by the Cloud Dataflow service when the Job is
      # created, and is immutable for the life of the job.
  "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
      # `JOB_STATE_UPDATED`), this field contains the ID of that job.
  "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
  "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
      # corresponding name prefixes of the new job.
    "a_key": "A String",
  },
  "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
    "workerRegion": "A String", # The Compute Engine region
        # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
        # which worker processing should occur, e.g. "us-west1". Mutually exclusive
        # with worker_zone. If neither worker_region nor worker_zone is specified,
        # defaults to the control plane's region.
    "version": { # A structure describing which components of the service, and which versions
        # of them, are required in order to run the job.
      "a_key": "", # Properties of the object.
    },
    "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
    "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
        # at rest, AKA a Customer Managed Encryption Key (CMEK).
        #
        # Format:
        #   projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
    "internalExperiments": { # Experimental settings.
      "a_key": "", # Properties of the object. Contains field @type with type URL.
    },
    "dataset": "A String", # The dataset for the current project where various workflow
        # related tables are stored.
        #
        # The supported resource type is:
        #
        # Google BigQuery:
        #   bigquery.googleapis.com/{dataset}
    "experiments": [ # The list of experiments to enable.
      "A String",
    ],
    "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
    "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
        # options are passed through the service and are used to recreate the
        # SDK pipeline options on the worker in a language-agnostic and
        # platform-independent way.
      "a_key": "", # Properties of the object.
    },
    "userAgent": { # A description of the process that generated the request.
      "a_key": "", # Properties of the object.
    },
    "workerZone": "A String", # The Compute Engine zone
        # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
        # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
        # with worker_region. If neither worker_region nor worker_zone is specified,
        # a zone in the control plane's region is chosen based on available capacity.
    "workerPools": [ # The worker pools. At least one "harness" worker pool must be
        # specified in order for the job to have workers.
      { # Describes one particular pool of Cloud Dataflow workers to be
          # instantiated by the Cloud Dataflow service in order to perform the
          # computations required by a job. Note that a workflow job may use
          # multiple pools, in order to match the various computational
          # requirements of the various stages of the job.
        "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
            # harness, residing in Google Container Registry.
            #
            # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
        "ipConfiguration": "A String", # Configuration for VM IPs.
        "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
          "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
          "algorithm": "A String", # The algorithm to use for autoscaling.
        },
        "diskSourceImage": "A String", # Fully qualified source image for disks.
        "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
            # the service will use the network "default".
        "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
            # will attempt to choose a reasonable default.
        "metadata": { # Metadata to set on the Google Compute Engine VMs.
          "a_key": "A String",
        },
        "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
            # service will attempt to choose a reasonable default.
        "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
            # Compute Engine API.
        "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
            # using the standard Dataflow task runner. Users should ignore
            # this field.
          "workflowFileName": "A String", # The file to store the workflow in.
          "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
              # will not be uploaded.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
          "commandlinesFileName": "A String", # The file to store preprocessing commands in.
          "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
          "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
          "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
          "vmId": "A String", # The ID string of the VM.
          "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. "wheel".
          "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. "root".
          "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
              # access the Cloud Dataflow API.
            "A String",
          ],
          "languageHint": "A String", # The suggested backend language.
          "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
              # console.
          "streamingWorkerMainClass": "A String", # The streaming worker main class name.
          "logDir": "A String", # The directory on the VM to store logs.
          "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
            "reportingEnabled": True or False, # Whether to send work progress updates to the service.
            "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
                # "shuffle/v1beta1".
            "workerId": "A String", # The ID of the worker running this pipeline.
            "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
                #
                # When workers access Google Cloud APIs, they logically do so via
                # relative URLs. If this field is specified, it supplies the base
                # URL to use for resolving these relative URLs. The normative
                # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                # Locators".
                #
                # If not specified, the default value is "http://www.googleapis.com/"
            "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
                # "dataflow/v1b3/projects".
            "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                # storage.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
          },
          "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
          "harnessCommand": "A String", # The command to launch the worker harness.
          "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
              # temporary storage.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
          "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
              #
              # When workers access Google Cloud APIs, they logically do so via
              # relative URLs. If this field is specified, it supplies the base
              # URL to use for resolving these relative URLs. The normative
              # algorithm used is defined by RFC 1808, "Relative Uniform Resource
              # Locators".
              #
              # If not specified, the default value is "http://www.googleapis.com/"
        },
        "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
            # service will choose a number of threads (according to the number of cores
            # on the selected machine type for batch, or 1 by convention for streaming).
        "poolArgs": { # Extra arguments for this worker pool.
          "a_key": "", # Properties of the object. Contains field @type with type URL.
        },
        "packages": [ # Packages to be installed on workers.
          { # The packages that must be installed in order for a worker to run the
              # steps of the Cloud Dataflow job that will be assigned to its worker
              # pool.
              #
              # This is the mechanism by which the Cloud Dataflow SDK causes code to
              # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
              # might use this to install jars containing the user's code and all of the
              # various dependencies (libraries, data files, etc.) required in order
              # for that code to run.
            "location": "A String", # The resource to read the package from. The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}
                #   bucket.storage.googleapis.com/
            "name": "A String", # The name of the package.
          },
        ],
        "defaultPackageSet": "A String", # The default package set to install. This allows the service to
            # select a default set of packages which are useful to worker
            # harnesses written in a particular language.
        "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
            # are supported.
        "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
            # attempt to choose a reasonable default.
        "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
            # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
            # `TEARDOWN_NEVER`.
            # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
            # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
            # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
            # down.
            #
            # If the workers are not torn down by the service, they will
            # continue to run and use Google Compute Engine VM resources in the
            # user's project until they are explicitly terminated by the user.
            # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
            # policy except for small, manually supervised test jobs.
            #
            # If unknown or unspecified, the service will attempt to choose a reasonable
            # default.
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

530

# attempt to choose a reasonable default.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

531

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

532

# execute the job. If zero or unspecified, the service will

533

# attempt to choose a reasonable default.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

534

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

535

# the form "regions/REGION/subnetworks/SUBNETWORK".

536

"dataDisks": [ # Data disks that are used by a VM in this workflow.

537

{ # Describes the data disk used by a workflow job.

538

"mountPoint": "A String", # Directory in a VM where disk is mounted.

539

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

540

# attempt to choose a reasonable default.

541

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

542

# must be a disk type appropriate to the project and zone in which

543

# the workers will run. If unknown or unspecified, the service

544

# will attempt to choose a reasonable default.

545

#

546

# For example, the standard persistent disk type is a resource name

547

# typically ending in "pd-standard". If SSD persistent disks are

548

# available, the resource name typically ends with "pd-ssd". The

549

# actual valid values are defined the Google Compute Engine API,

550

# not by the Cloud Dataflow API; consult the Google Compute Engine

551

# documentation for more information about determining the set of

552

# available disk types for a particular project and zone.

553

#

554

# Google Compute Engine Disk types are local to a particular

555

# project in a particular zone, and so the resource name will

556

# typically look something like this:

557

#

558

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

559

},

560

],

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A Docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this property
# if needed.
},
],
},
],
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"type": "A String", # The type of Cloud Dataflow job.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"currentStateTime": "A String", # The timestamp associated with the current state.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
}
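Several of the request-body rules above can be checked on the client before calling `create`. The sketch below is a minimal illustration, not part of google-api-python-client: the helper names (`is_valid_job_name`, `has_exactly_one_step_source`, `expected_temp_prefix`) are hypothetical, the job-name pattern is copied from the `name` field, and the temp-prefix helper only applies the documented `"/temp-{JOBNAME}"` suffix without assuming any further normalization by the service.

```python
import re

# Job-name pattern copied from the "name" field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole name matches the documented pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


def has_exactly_one_step_source(body: dict) -> bool:
    """The "steps" field says exactly one of steps / stepsLocation should be set."""
    return bool(body.get("steps")) != bool(body.get("stepsLocation"))


def expected_temp_prefix(temp_storage_prefix: str, job_name: str) -> str:
    """Append the documented "/temp-{JOBNAME}" suffix to tempStoragePrefix.

    Hypothetical helper: it predicts the path from the field description
    and does not query the service.
    """
    return temp_storage_prefix + "/temp-" + job_name
```

The service remains the authority on validation. These checks only catch obvious mistakes (an upper-case name, a name longer than 40 characters, both or neither step sources) before a network round trip.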

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
replaceJobId: string, Deprecated. This field is now in the Job message.
view: string, The level of information requested in response.
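This is a minimal sketch of calling `create` through google-api-python-client while reusing one `clientRequestId` across retries, as the `clientRequestId` field description suggests. It assumes `jobs` is the resource object returned by `build("dataflow", "v1b3").projects().locations().jobs()` from `googleapiclient.discovery`. The helper name `create_job`, the choice to retry only on `OSError` (connection-level failures), and the default attempt count are illustrative assumptions, not documented behavior.

```python
import uuid


def create_job(jobs, project_id, location, job_body, max_attempts=3):
    """Submit job_body with jobs.create, reusing one clientRequestId on retries.

    `jobs` is assumed to be build("dataflow", "v1b3").projects().locations().jobs()
    (or any object with the same create(...).execute() call shape).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    body = dict(job_body)  # shallow copy so the caller's dict is not modified
    # Generate the identifier once, outside the loop, so every attempt sends it unchanged.
    body.setdefault("clientRequestId", uuid.uuid4().hex)
    for attempt in range(1, max_attempts + 1):
        try:
            request = jobs.create(
                projectId=project_id,
                location=location,
                body=body,
                view="JOB_VIEW_SUMMARY",
            )
            return request.execute()
        except OSError:
            # Assumption: connection-level errors are retryable; re-raise on the last try.
            if attempt == max_attempts:
                raise
```

The identifier is generated once. If a response is lost, the retry therefore carries the same `clientRequestId` and job name, and according to the field description the service rejects it rather than creating a second job. Because the helper only depends on the `create(...).execute()` shape, you can exercise it with a stand-in object and no network access.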

Returns:
An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"jobMetadata": { # This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
# Metadata available primarily for filtering jobs. Will be included in the
# ListJobs response and Job SUMMARY view.
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
},
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"topic": "A String", # Topic accessed in the connection.
"subscription": "A String", # Subscription used in the connection.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"projectId": "A String", # ProjectId accessed in the connection.
"namespace": "A String", # Namespace used in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
},
],
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
},
],
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"table": "A String", # Table accessed in the connection.
"dataset": "A String", # Dataset accessed in the connection.
},
],
},
"pipelineDescription": { # Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
# A descriptive representation of the submitted pipeline as well as its executed
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow-provided metrics.
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
795

{ # Description of the type, names/ids, and input/outputs for a transform.

796

"kind": "A String", # Type of transform.

797

"name": "A String", # User provided name for this transform instance.

798

"inputCollectionName": [ # User names for all collection inputs to this transform.

799

"A String",

800

],

801

"displayData": [ # Transform-specific display data.

802

{ # Data provided with a pipeline or transform to provide descriptive info.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

803

"key": "A String", # The key identifying the display data.

804

# This is intended to be used as a label for the display data

805

# when viewed in a dax monitoring system.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

806

"shortStrValue": "A String", # A possible additional shorter value to display.

807

# For example a java_class_name_value of com.mypackage.MyDoFn

808

# will be stored with MyDoFn as the short_str_value and

809

# com.mypackage.MyDoFn as the java_class_name value.

810

# short_str_value can be displayed and java_class_name_value

811

# will be displayed as a tooltip.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

812

"timestampValue": "A String", # Contains value if the data is of timestamp type.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

813

"url": "A String", # An optional full URL.

814

"floatValue": 3.14, # Contains value if the data is of float type.

815

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

816

# language namespace (i.e. python module) which defines the display data.

817

# This allows a dax monitoring system to specially handle the data

818

# and perform custom rendering.

819

"javaClassValue": "A String", # Contains value if the data is of java class type.

820

"label": "A String", # An optional label to display in a dax UI for the element.

821

"boolValue": True or False, # Contains value if the data is of a boolean type.

822

"strValue": "A String", # Contains value if the data is of string type.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

823

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

824

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

825

},

826

],

827

"outputCollectionName": [ # User names for all collection outputs to this transform.

828

"A String",

829

],

830

"id": "A String", # SDK generated id of this transform instance.

831

},

832

],

833

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

834

{ # Description of the composing transforms, names/ids, and input/outputs of a
# stage of execution. Some composing transforms and sources may have been
# generated by the Dataflow service during execution planning.
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
{ # Description of an interstitial value between transforms in an execution
# stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"kind": "A String", # Type of transform this stage is executing.
"name": "A String", # Dataflow service generated name for this stage.
"outputSource": [ # Output sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"inputSource": [ # Input sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"componentTransform": [ # Transforms that comprise this execution stage.
{ # Description of a transform executed as part of an execution stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransform": "A String", # User name for the original user transform with which this transform is
# most closely associated.
"name": "A String", # Dataflow service generated name for this transform.
},
],
"id": "A String", # Dataflow service generated id for this stage.
},
],
"displayData": [ # Pipeline level display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example, a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
},
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"executionStageName": "A String", # The name of the execution stage.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# defaults to the control plane's region.
"version": { # A structure describing which components and their versions of the service
# are required in order to run the job.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"dataset": "A String", # The dataset for the current project where various workflow-related
# tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"experiments": [ # The list of experiments to enable.
"A String",
],
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language-agnostic and
# platform-independent way.
"a_key": "", # Properties of the object.
},
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"ipConfiguration": "A String", # Configuration for VM IPs.
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
"algorithm": "A String", # The algorithm to use for autoscaling.
},
"diskSourceImage": "A String", # Fully qualified source image for disks.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"workflowFileName": "A String", # The file to store the workflow in.
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"vmId": "A String", # The ID string of the VM.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"languageHint": "A String", # The suggested backend language.
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"logDir": "A String", # The directory on the VM to store logs.
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".
"harnessCommand": "A String", # The command to launch the worker harness.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
},
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
"name": "A String", # The name of the package.
},
],
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"mountPoint": "A String", # Directory in a VM where the disk is mounted.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
},
],
"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A Docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this
# property if needed.
},
],
},
],
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"type": "A String", # The type of Cloud Dataflow job.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"currentStateTime": "A String", # The timestamp associated with the current state.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
# executed that isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
}</pre>
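Several fields in the Job body above carry constraints that a client can check before calling `create()`: the `name` regular expression, the limits on `labels` (at most 64 entries, keys and values at most 128 bytes), the rule that exactly one of `steps` or `stepsLocation` should be specified, and the three documented `teardownPolicy` values. The sketch below assumes the request body is built as a plain Python dict. `check_job_body` is a hypothetical helper, not part of google-api-python-client, and the Dataflow service remains the final authority on what it accepts.

```python
import re
from typing import Any, Dict, List

# Constraints restated from the Job schema comments on this page.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")
MAX_LABELS = 64
MAX_LABEL_BYTES = 128
TEARDOWN_POLICIES = {"TEARDOWN_ALWAYS", "TEARDOWN_ON_SUCCESS", "TEARDOWN_NEVER"}


def check_job_body(body: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check for a jobs().create() request body.

    Returns a list of human-readable problems. An empty list only means
    that none of these client-side checks failed.
    """
    problems: List[str] = []

    name = body.get("name", "")
    if not JOB_NAME_RE.fullmatch(name):
        problems.append(f"name {name!r} does not match {JOB_NAME_RE.pattern}")

    labels = body.get("labels", {})
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceed the limit of {MAX_LABELS}")
    for key, value in labels.items():
        # The byte limit is checked against the UTF-8 encoding.
        if len(key.encode("utf-8")) > MAX_LABEL_BYTES:
            problems.append(f"label key {key!r} is longer than {MAX_LABEL_BYTES} bytes")
        if len(value.encode("utf-8")) > MAX_LABEL_BYTES:
            problems.append(f"value of label {key!r} is longer than {MAX_LABEL_BYTES} bytes")

    # Exactly one of steps / stepsLocation should be specified.
    if bool(body.get("steps")) == bool(body.get("stepsLocation")):
        problems.append("specify exactly one of steps or stepsLocation")

    for pool in body.get("environment", {}).get("workerPools", []):
        policy = pool.get("teardownPolicy")
        if policy is not None and policy not in TEARDOWN_POLICIES:
            problems.append(f"unknown teardownPolicy {policy!r}")

    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass before sending the request.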

</div>
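The `currentState` notes above say that a running job may asynchronously enter a terminal state, after which no further state updates are made. A common pattern is therefore to call `get()` repeatedly until a terminal state appears. The sketch below assumes the terminal states are `JOB_STATE_DONE`, `JOB_STATE_FAILED`, `JOB_STATE_CANCELLED`, `JOB_STATE_UPDATED`, and `JOB_STATE_DRAINED`. Verify this set against the `JobState` enum in the Dataflow reference. `wait_for_terminal_state` and its `get_job` callable are hypothetical. In practice, `get_job` might be a lambda wrapping `service.projects().locations().jobs().get(projectId=PROJECT_ID, location=REGION, jobId=JOB_ID).execute()`, where the upper-case names are placeholders.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal JobState values; confirm against the JobState enum in the
# Dataflow v1b3 reference before relying on this set.
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})


def wait_for_terminal_state(
    get_job: Callable[[], Dict[str, Any]],
    poll_interval_s: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_job() until the returned Job has a terminal currentState.

    get_job is a hypothetical zero-argument callable returning a Job dict.
    Raises TimeoutError if no terminal state is seen within max_polls calls.
    """
    for _ in range(max_polls):
        job = get_job()
        if job.get("currentState") in TERMINAL_STATES:
            return job
        sleep(poll_interval_s)
    raise TimeoutError(f"no terminal state after {max_polls} polls")
```

Bounding the loop with `max_polls` makes a stuck job raise instead of blocking forever. Injecting `sleep` lets tests run the loop without real delays.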

<div class="method">
<code class="details" id="get">get(projectId, location, jobId, x__xgafv=None, view=None)</code>
<pre>Gets the state of the specified Cloud Dataflow job.

To get the state of a job, we recommend using `projects.locations.jobs.get`
with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.get` is not recommended, as you can only get the state of
jobs that are running in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  jobId: string, The job ID. (required)
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format
  view: string, The level of information requested in the response.

Returns:
  An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF-8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
# ListJob response and Job SUMMARY view.
# This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
},
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"topic": "A String", # Topic accessed in the connection.
"subscription": "A String", # Subscription used in the connection.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"projectId": "A String", # ProjectId accessed in the connection.
"namespace": "A String", # Namespace used in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File pattern used to access files by the connector.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
},
],
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
},
],
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"table": "A String", # Table accessed in the connection.
"dataset": "A String", # Dataset accessed in the connection.
},
],
},
"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow provided metrics.
# Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
{ # Description of the type, names/ids, and input/outputs for a transform.
"kind": "A String", # Type of transform.
"name": "A String", # User provided name for this transform instance.
"inputCollectionName": [ # User names for all collection inputs to this transform.
"A String",
],
"displayData": [ # Transform-specific display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example, a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name_value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
"outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
"id": "A String", # SDK generated id of this transform instance.
},
],
"executionPipelineStage": [ # Description of each stage of execution of the pipeline.
{ # Description of the composing transforms, names/ids, and input/outputs of a
# stage of execution. Some composing transforms and sources may have been
# generated by the Dataflow service during execution planning.
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
{ # Description of an interstitial value between transforms in an execution
# stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"kind": "A String", # Type of transform this stage is executing.
"name": "A String", # Dataflow service generated name for this stage.
"outputSource": [ # Output sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"inputSource": [ # Input sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"componentTransform": [ # Transforms that comprise this execution stage.
{ # Description of a transform executed as part of an execution stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransform": "A String", # User name for the original user transform with which this transform is
# most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"id": "A String", # Dataflow service generated id for this stage.
},
],
"displayData": [ # Pipeline level display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example, a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name_value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
},
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"executionStageName": "A String", # The name of the execution stage.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# it defaults to the control plane's region.
"version": { # A structure describing which components and their versions of the service
# are required in order to run the job.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"experiments": [ # The list of experiments to enable.
"A String",
],
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language-agnostic and
# platform-independent way.
"a_key": "", # Properties of the object.
},
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1573

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

1574

# specified in order for the job to have workers.

1575

{ # Describes one particular pool of Cloud Dataflow workers to be

1576

# instantiated by the Cloud Dataflow service in order to perform the

1577

# computations required by a job. Note that a workflow job may use

1578

# multiple pools, in order to match the various computational

1579

# requirements of the various stages of the job.

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"ipConfiguration": "A String", # Configuration for VM IPs.
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
"algorithm": "A String", # The algorithm to use for autoscaling.
},
"diskSourceImage": "A String", # Fully qualified source image for disks.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"workflowFileName": "A String", # The file to store the workflow in.
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"vmId": "A String", # The ID string of the VM.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"languageHint": "A String", # The suggested backend language.
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"logDir": "A String", # The directory on the VM to store logs.
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".
"harnessCommand": "A String", # The command to launch the worker harness.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
},
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
"name": "A String", # The name of the package.
},
],
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"mountPoint": "A String", # Directory in a VM where disk is mounted.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
},
],
"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A Docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this property
# if needed.
},
],
},
],
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"type": "A String", # The type of Cloud Dataflow job.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"currentStateTime": "A String", # The timestamp associated with the current state.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
}</pre>
</div>
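The Job schema above states that `name` must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal client-side check, assuming you want to reject bad names before calling `create`, can apply that pattern with Python's `re.fullmatch`. The helper name `is_valid_job_name` is hypothetical and not part of google-api-python-client; the service remains the authority on which names it accepts.

```python
import re

# Pattern copied from the `name` field description of the Job resource.
# fullmatch() requires the whole string to match, not just a prefix.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` satisfies the documented Dataflow job-name regex.

    Hypothetical helper for illustration; it does not contact the service.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["wordcount-2020", "WordCount", "job-", "a" * 41]:
        print(candidate, is_valid_job_name(candidate))
```

Because the pattern allows one leading lowercase letter, at most 38 middle characters and one final letter or digit, a valid name is between 1 and 40 characters long.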

<div class="method">
<code class="details" id="getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</code>
<pre>Request the job status.

To request the status of a job, we recommend using
`projects.locations.jobs.getMetrics` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.getMetrics` is not recommended, as you can only request the
status of jobs that are running in `us-central1`.

Args:
projectId: string, A project id. (required)
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains the job specified by jobId. (required)
jobId: string, The job to get metrics for. (required)
startTime: string, Return only metric data that has changed since this time.
Default is to return all information about all metrics for the job.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # JobMetrics contains a collection of metrics describing the detailed progress
# of a Dataflow job. Metrics correspond to user-defined and system-defined
# metrics in the job.
#
# This resource captures only the most recent values of each metric;
# time-series data can be queried for them (under the same metric names)
# from Cloud Monitoring.
"metrics": [ # All metrics for this job.
{ # Describes the state of a metric.
"meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
# This holds the count of the aggregated values and is used in combination
# with mean_sum to obtain the actual mean aggregate value.
# The only possible value type is Long.
"kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
# "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
# The specified aggregation kind is case-insensitive.
#
# If omitted, this is not an aggregated value but instead
# a single metric sample value.
"set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
# possible value type is a list of Values whose type can be Long, Double,
# or String, according to the metric's type. All Values in the list must
# be of the same type.
"name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
# metric.
"origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;
# will be "dataflow" for metrics defined by the Dataflow service or SDK.
"name": "A String", # Worker-defined metric name.
"context": { # Zero or more labeled fields which identify the part of the job this
# metric is associated with, such as the name of a step or collection.
#
# For example, built-in counters associated with steps will have
# context['step'] = <step-name>. Counters associated with PCollections
# in the SDK will have context['pcollection'] = <pcollection-name>.
"a_key": "A String",
},
},
"meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
# This holds the sum of the aggregated values and is used in combination
# with mean_count to obtain the actual mean aggregate value.
# The only possible value types are Long and Double.
"cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
# value accumulated since the worker started working on this WorkItem.
# By default this is false, indicating that this metric is reported
# as a delta that is not associated with any WorkItem.
"updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
# reporting work progress; it will be filled in responses from the
# metrics API.
"scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
# "And", and "Or". The possible value types are Long, Double, and Boolean.
"internal": "", # Worker-computed aggregate value for internal use by the Dataflow
# service.
"gauge": "", # A struct value describing properties of a Gauge.
# Metrics of gauge type show the value of a metric across time and are
# aggregated based on the newest value.
"distribution": "", # A struct value describing properties of a distribution of numeric values.
},
],
"metricTime": "A String", # Timestamp as of which metric values are current.
}</pre>
</div>
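The `meanSum` and `meanCount` fields of a "Mean" metric are only meaningful together. The sketch below, a hypothetical helper rather than a library function, divides one by the other for every "Mean" entry in a JobMetrics dict, such as the value returned by `dataflow.projects().locations().jobs().getMetrics(projectId=..., location=..., jobId=...).execute()`, where `dataflow` is assumed to come from `googleapiclient.discovery.build("dataflow", "v1b3")`. It also assumes both values arrive as JSON numbers or numeric strings.

```python
def mean_metrics(job_metrics):
    """Return (name, context, mean) tuples for every "Mean"-kind metric.

    `job_metrics` is assumed to be a decoded JobMetrics dict shaped as
    documented for getMetrics; this helper is illustrative only.
    """
    results = []
    for metric in job_metrics.get("metrics", []):
        # Aggregation kinds are documented as case-insensitive.
        if str(metric.get("kind", "")).lower() != "mean":
            continue
        total = metric.get("meanSum")
        count = metric.get("meanCount")
        if total is None or count is None:
            continue
        count = int(count)  # Long values may be serialized as strings.
        if count == 0:
            continue  # Nothing was aggregated; avoid dividing by zero.
        name_info = metric.get("name", {})
        results.append(
            (name_info.get("name", ""), name_info.get("context", {}),
             float(total) / count)
        )
    return results
```

Keeping the `context` map in each tuple matters because the same metric name can be reported once per step or PCollection.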

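The `list` method documented next returns at most one page of jobs per call, and its response carries `nextPageToken` when more results may exist. The generator below sketches the usual loop: it keeps passing the previous token back as `pageToken` until no token is returned. `iter_all_jobs` and the `list_page` callable are hypothetical names introduced for illustration.

```python
def iter_all_jobs(list_page):
    """Yield every job across all pages of a paginated list call.

    `list_page` is a hypothetical callable that takes a page token (None for
    the first page) and returns one decoded ListJobsResponse dict.
    """
    token = None
    while True:
        response = list_page(token)
        # An empty body ({}) is returned when the project has no jobs.
        for job in response.get("jobs", []):
            yield job
        token = response.get("nextPageToken")
        if not token:
            break
```

With google-api-python-client, `list_page` could wrap `dataflow.projects().locations().jobs().list(projectId=..., location=..., pageToken=token).execute()`; passing `pageToken` only when `token` is set avoids relying on how the client treats `None` arguments.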
<div class="method">
<code class="details" id="list">list(projectId, location, pageSize=None, pageToken=None, x__xgafv=None, filter=None, view=None)</code>
<pre>List the jobs of a project.

To list the jobs of a project in a region, we recommend using
`projects.locations.jobs.list` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
list all jobs across all regions, use `projects.jobs.aggregated`. Using
`projects.jobs.list` is not recommended, as you can only get the list of
jobs that are running in `us-central1`.

Args:
projectId: string, The project which owns the jobs. (required)
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains the jobs. (required)
pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of pageSize
and an unspecified server-defined limit.
pageToken: string, Set this to the 'nextPageToken' field of a previous response
to request additional results in a long list.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format
filter: string, The kind of filter to use.
view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

Returns:
An object of the form:

{ # Response to a request to list Cloud Dataflow jobs in a project. This might
# be a partial response, depending on the page size in the ListJobsRequest.
# However, if the project does not have any jobs, an instance of
# ListJobsResponse is not returned and the request's response
# body is empty {}.
"nextPageToken": "A String", # Set if there may be more results than fit in this response.
"failedLocation": [ # Zero or more messages describing the [regional endpoints]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# failed to respond.
{ # Indicates which [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
# to respond to a request for data.
"name": "A String", # The name of the [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# failed to respond.
},
],
"jobs": [ # A subset of the requested job information.
{ # Defines a job to be run by the Cloud Dataflow service.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
# ListJob response and Job SUMMARY view.
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
},
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"topic": "A String", # Topic accessed in the connection.
"subscription": "A String", # Subscription used in the connection.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"projectId": "A String", # ProjectId accessed in the connection.
"namespace": "A String", # Namespace used in the connection.
},
],
2084

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2085

{ # Metadata for a File connector used by the job.

2086

"filePattern": "A String", # File Pattern used to access files by the connector.

2087

},

2088

],

2089

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2090

{ # Metadata for a Spanner connector used by the job.

2091

"instanceId": "A String", # InstanceId accessed in the connection.

2092

"projectId": "A String", # ProjectId accessed in the connection.

2093

"databaseId": "A String", # DatabaseId accessed in the connection.

2094

},

2095

],

2096

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2097

{ # Metadata for a BigTable connector used by the job.

2098

"instanceId": "A String", # InstanceId accessed in the connection.

2099

"projectId": "A String", # ProjectId accessed in the connection.

2100

"tableId": "A String", # TableId accessed in the connection.

2101

},

2102

],

2103

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2104

{ # Metadata for a BigQuery connector used by the job.

2105

"projectId": "A String", # Project accessed in the connection.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2106

"query": "A String", # Query used to access data in the connection.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2107

"table": "A String", # Table accessed in the connection.

2108

"dataset": "A String", # Dataset accessed in the connection.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

},

],

},

        "pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed
            # form. This data is provided by the Dataflow service for ease of visualizing
            # the pipeline and interpreting Dataflow-provided metrics.
            # Preliminary field: The format of this data may change at any time.
            # A description of the user pipeline and stages through which it is executed.
            # Created by Cloud Dataflow service. Only retrieved with
            # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
          "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
            { # Description of the type, names/ids, and input/outputs for a transform.
              "kind": "A String", # Type of transform.
              "name": "A String", # User-provided name for this transform instance.
              "inputCollectionName": [ # User names for all collection inputs to this transform.
                "A String",
              ],
              "displayData": [ # Transform-specific display data.
                { # Data provided with a pipeline or transform to provide descriptive info.
                  "key": "A String", # The key identifying the display data.
                      # This is intended to be used as a label for the display data
                      # when viewed in a dax monitoring system.
                  "shortStrValue": "A String", # A possible additional shorter value to display.
                      # For example a java_class_name_value of com.mypackage.MyDoFn
                      # will be stored with MyDoFn as the short_str_value and
                      # com.mypackage.MyDoFn as the java_class_name value.
                      # short_str_value can be displayed and java_class_name_value
                      # will be displayed as a tooltip.
                  "timestampValue": "A String", # Contains value if the data is of timestamp type.
                  "url": "A String", # An optional full URL.
                  "floatValue": 3.14, # Contains value if the data is of float type.
                  "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                      # language namespace (e.g. a Python module) which defines the display data.
                      # This allows a dax monitoring system to specially handle the data
                      # and perform custom rendering.
                  "javaClassValue": "A String", # Contains value if the data is of java class type.
                  "label": "A String", # An optional label to display in a dax UI for the element.
                  "boolValue": True or False, # Contains value if the data is of a boolean type.
                  "strValue": "A String", # Contains value if the data is of string type.
                  "durationValue": "A String", # Contains value if the data is of duration type.
                  "int64Value": "A String", # Contains value if the data is of int64 type.
                },
              ],
              "outputCollectionName": [ # User names for all collection outputs to this transform.
                "A String",
              ],
              "id": "A String", # SDK generated id of this transform instance.
            },
          ],
          "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
            { # Description of the composing transforms, names/ids, and input/outputs of a
                # stage of execution. Some composing transforms and sources may have been
                # generated by the Dataflow service during execution planning.
              "componentSource": [ # Collections produced and consumed by component transforms of this stage.
                { # Description of an interstitial value between transforms in an execution
                    # stage.
                  "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                  "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                      # source is most closely associated.
                  "name": "A String", # Dataflow service generated name for this source.
                },
              ],
              "kind": "A String", # Type of transform this stage is executing.
              "name": "A String", # Dataflow service generated name for this stage.
              "outputSource": [ # Output sources for this stage.
                { # Description of an input or output of an execution stage.
                  "userName": "A String", # Human-readable name for this source; may be user or system generated.
                  "sizeBytes": "A String", # Size of the source, if measurable.
                  "name": "A String", # Dataflow service generated name for this source.
                  "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                      # source is most closely associated.
                },
              ],
              "inputSource": [ # Input sources for this stage.
                { # Description of an input or output of an execution stage.
                  "userName": "A String", # Human-readable name for this source; may be user or system generated.
                  "sizeBytes": "A String", # Size of the source, if measurable.
                  "name": "A String", # Dataflow service generated name for this source.
                  "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                      # source is most closely associated.
                },
              ],
              "componentTransform": [ # Transforms that comprise this execution stage.
                { # Description of a transform executed as part of an execution stage.
                  "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                  "originalTransform": "A String", # User name for the original user transform with which this transform is
                      # most closely associated.
                  "name": "A String", # Dataflow service generated name for this source.
                },
              ],
              "id": "A String", # Dataflow service generated id for this stage.
            },
          ],
          "displayData": [ # Pipeline level display data.
            { # Data provided with a pipeline or transform to provide descriptive info.
              "key": "A String", # The key identifying the display data.
                  # This is intended to be used as a label for the display data
                  # when viewed in a dax monitoring system.
              "shortStrValue": "A String", # A possible additional shorter value to display.
                  # For example a java_class_name_value of com.mypackage.MyDoFn
                  # will be stored with MyDoFn as the short_str_value and
                  # com.mypackage.MyDoFn as the java_class_name value.
                  # short_str_value can be displayed and java_class_name_value
                  # will be displayed as a tooltip.
              "timestampValue": "A String", # Contains value if the data is of timestamp type.
              "url": "A String", # An optional full URL.
              "floatValue": 3.14, # Contains value if the data is of float type.
              "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                  # language namespace (e.g. a Python module) which defines the display data.
                  # This allows a dax monitoring system to specially handle the data
                  # and perform custom rendering.
              "javaClassValue": "A String", # Contains value if the data is of java class type.
              "label": "A String", # An optional label to display in a dax UI for the element.
              "boolValue": True or False, # Contains value if the data is of a boolean type.
              "strValue": "A String", # Contains value if the data is of string type.
              "durationValue": "A String", # Contains value if the data is of duration type.
              "int64Value": "A String", # Contains value if the data is of int64 type.
            },
          ],
        },
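The relationship between `shortStrValue` and `javaClassValue` described for display data can be illustrated with a small sketch. `java_class_display_item` is a hypothetical helper (not an SDK or client-library function) that builds one display-data entry as a plain dict and derives the short name from the last dotted segment of the class name, so `com.mypackage.MyDoFn` becomes `MyDoFn`. In practice the SDK and the service populate these entries.

```python
def java_class_display_item(key: str, namespace: str, class_name: str, label: str = "") -> dict:
    """Build a display-data entry for a Java class value (illustrative only)."""
    item = {
        "key": key,              # label-like identifier for the entry
        "namespace": namespace,  # usually the defining class or module
        "javaClassValue": class_name,
        # Short form for display; the full class name can serve as the tooltip.
        "shortStrValue": class_name.rsplit(".", 1)[-1],
    }
    if label:
        item["label"] = label
    return item
```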

        "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
            # callers cannot mutate it.
          { # A message describing the state of a particular execution stage.
            "executionStageName": "A String", # The name of the execution stage.
            "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
            "currentStateTime": "A String", # The time at which the stage transitioned to this state.
          },
        ],
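Given a Job resource as the plain dict that google-api-python-client returns (for example from `service.projects().locations().jobs().get(projectId=..., location=..., jobId=...).execute()`), `stageStates` can be summarized per state. A minimal sketch; `stage_state_summary` is a hypothetical helper, and the `"UNKNOWN"` bucket is an assumption for entries that omit `executionStageState`.

```python
from collections import Counter


def stage_state_summary(job: dict) -> Counter:
    """Count the execution stages of a Job resource by executionStageState."""
    # stageStates may be missing from a response, so default to an empty list.
    stages = job.get("stageStates", [])
    # "UNKNOWN" is a placeholder bucket chosen for this sketch, not an API value.
    return Counter(stage.get("executionStageState", "UNKNOWN") for stage in stages)
```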

        "id": "A String", # The unique ID of this job.
            #
            # This field is set by the Cloud Dataflow service when the Job is
            # created, and is immutable for the life of the job.
        "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
            # `JOB_STATE_UPDATED`), this field contains the ID of that job.
        "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
        "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
            # corresponding name prefixes of the new job.
          "a_key": "A String",
        },
        "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
          "workerRegion": "A String", # The Compute Engine region
              # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
              # which worker processing should occur, e.g. "us-west1". Mutually exclusive
              # with worker_zone. If neither worker_region nor worker_zone is specified,
              # defaults to the control plane's region.
          "version": { # A structure describing which components and their versions of the service
              # are required in order to run the job.
            "a_key": "", # Properties of the object.
          },
          "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
          "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
              # at rest, AKA a Customer Managed Encryption Key (CMEK).
              #
              # Format:
              # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
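The `serviceKmsKeyName` format above can be assembled and sanity-checked before it is placed in `environment`. A minimal sketch; `kms_key_name` is a hypothetical helper, and the check only enforces the documented path shape (non-empty segments without `/`), not that the key exists or that the job may use it.

```python
import re

# Path shape from the serviceKmsKeyName description; each segment is non-empty
# and contains no "/".
_KMS_KEY_RE = re.compile(r"projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+")


def kms_key_name(project_id: str, location: str, key_ring: str, key: str) -> str:
    """Assemble a serviceKmsKeyName in the documented format."""
    name = f"projects/{project_id}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"
    if not _KMS_KEY_RE.fullmatch(name):
        raise ValueError(f"malformed KMS key name: {name!r}")
    return name
```

The result would be used as, for example, `{"serviceKmsKeyName": kms_key_name("my-project", "us-central1", "my-ring", "my-key")}`, where all four arguments are placeholders.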

          "internalExperiments": { # Experimental settings.
            "a_key": "", # Properties of the object. Contains field @type with type URL.
          },
          "dataset": "A String", # The dataset for the current project where various workflow
              # related tables are stored.
              #
              # The supported resource type is:
              #
              # Google BigQuery:
              # bigquery.googleapis.com/{dataset}
          "experiments": [ # The list of experiments to enable.
            "A String",
          ],
          "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
          "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
              # options are passed through the service and are used to recreate the
              # SDK pipeline options on the worker in a language agnostic and platform
              # independent way.
            "a_key": "", # Properties of the object.
          },
          "userAgent": { # A description of the process that generated the request.
            "a_key": "", # Properties of the object.
          },
          "workerZone": "A String", # The Compute Engine zone
              # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
              # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
              # with worker_region. If neither worker_region nor worker_zone is specified,
              # a zone in the control plane's region is chosen based on available capacity.
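Because `workerRegion` and `workerZone` are mutually exclusive, a client can reject an `environment` dict that sets both before submitting it. This sketch (a hypothetical helper) also spells out the documented fallback when neither field is set.

```python
def worker_placement(environment: dict) -> str:
    """Describe where workers will run, per the workerRegion/workerZone docs."""
    region = environment.get("workerRegion")
    zone = environment.get("workerZone")
    if region and zone:
        raise ValueError("workerRegion and workerZone are mutually exclusive")
    if zone:
        return f"zone {zone}"
    if region:
        return f"region {region}"
    # Neither set: the service picks a zone in the control plane's region.
    return "a zone in the control plane's region"
```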

          "workerPools": [ # The worker pools. At least one "harness" worker pool must be
              # specified in order for the job to have workers.
            { # Describes one particular pool of Cloud Dataflow workers to be
                # instantiated by the Cloud Dataflow service in order to perform the
                # computations required by a job. Note that a workflow job may use
                # multiple pools, in order to match the various computational
                # requirements of the various stages of the job.
              "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
                  # harness, residing in Google Container Registry.
                  #
                  # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
              "ipConfiguration": "A String", # Configuration for VM IPs.
              "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
                "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
                "algorithm": "A String", # The algorithm to use for autoscaling.
              },
              "diskSourceImage": "A String", # Fully qualified source image for disks.
              "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
                  # the service will use the network "default".
              "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
                  # will attempt to choose a reasonable default.
              "metadata": { # Metadata to set on the Google Compute Engine VMs.
                "a_key": "A String",
              },
              "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
                  # service will attempt to choose a reasonable default.
              "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
                  # Compute Engine API.
              "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                  # using the standard Dataflow task runner. Users should ignore
                  # this field.
                "workflowFileName": "A String", # The file to store the workflow in.
                "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
                    # will not be uploaded.
                    #
                    # The supported resource type is:
                    #
                    # Google Cloud Storage:
                    # storage.googleapis.com/{bucket}/{object}
                    # bucket.storage.googleapis.com/{object}
                "commandlinesFileName": "A String", # The file to store preprocessing commands in.
                "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
                "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
                "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
                "vmId": "A String", # The ID string of the VM.
                "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
                    # taskrunner; e.g. "wheel".
                "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
                    # taskrunner; e.g. "root".
                "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
                    # access the Cloud Dataflow API.
                  "A String",
                ],
                "languageHint": "A String", # The suggested backend language.
                "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                    # console.
                "streamingWorkerMainClass": "A String", # The streaming worker main class name.
                "logDir": "A String", # The directory on the VM to store logs.
                "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                  "reportingEnabled": True or False, # Whether to send work progress updates to the service.
                  "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
                      # "shuffle/v1beta1".
                  "workerId": "A String", # The ID of the worker running this pipeline.
                  "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
                      #
                      # When workers access Google Cloud APIs, they logically do so via
                      # relative URLs. If this field is specified, it supplies the base
                      # URL to use for resolving these relative URLs. The normative
                      # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                      # Locators".
                      #
                      # If not specified, the default value is "http://www.googleapis.com/"
                  "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
                      # "dataflow/v1b3/projects".
                  "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                      # storage.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      #
                      # storage.googleapis.com/{bucket}/{object}
                      # bucket.storage.googleapis.com/{object}
                },
                "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
                "harnessCommand": "A String", # The command to launch the worker harness.
                "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
                    # temporary storage.
                    #
                    # The supported resource type is:
                    #
                    # Google Cloud Storage:
                    # storage.googleapis.com/{bucket}/{object}
                    # bucket.storage.googleapis.com/{object}
                "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                    #
                    # When workers access Google Cloud APIs, they logically do so via
                    # relative URLs. If this field is specified, it supplies the base
                    # URL to use for resolving these relative URLs. The normative
                    # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                    # Locators".
                    #
                    # If not specified, the default value is "http://www.googleapis.com/"
              },
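The `baseUrl` fields describe resolving relative API paths (such as `servicePath` or `shuffleServicePath`) against a base URL. A sketch using Python's `urllib.parse.urljoin`, which follows RFC 3986, the successor to RFC 1808; for simple relative paths like these the two give the same result. `resolve_service_url` is a hypothetical helper, not the worker harness's actual code.

```python
from urllib.parse import urljoin

# Documented default when baseUrl is not specified.
DEFAULT_BASE_URL = "http://www.googleapis.com/"


def resolve_service_url(relative_path: str, base_url: str = "") -> str:
    """Resolve a relative API path against baseUrl, falling back to the default."""
    return urljoin(base_url or DEFAULT_BASE_URL, relative_path)
```

The trailing slash on the base matters: resolving `"shuffle/v1beta1"` against `"https://example.com/api"` replaces the last path segment and yields `"https://example.com/shuffle/v1beta1"`, while a base of `"https://example.com/api/"` yields `"https://example.com/api/shuffle/v1beta1"`.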

              "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
                  # service will choose a number of threads (according to the number of cores
                  # on the selected machine type for batch, or 1 by convention for streaming).
              "poolArgs": { # Extra arguments for this worker pool.
                "a_key": "", # Properties of the object. Contains field @type with type URL.
              },
              "packages": [ # Packages to be installed on workers.
                { # The packages that must be installed in order for a worker to run the
                    # steps of the Cloud Dataflow job that will be assigned to its worker
                    # pool.
                    #
                    # This is the mechanism by which the Cloud Dataflow SDK causes code to
                    # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                    # might use this to install jars containing the user's code and all of the
                    # various dependencies (libraries, data files, etc.) required in order
                    # for that code to run.
                  "location": "A String", # The resource to read the package from. The supported resource type is:
                      #
                      # Google Cloud Storage:
                      #
                      # storage.googleapis.com/{bucket}
                      # bucket.storage.googleapis.com/
                  "name": "A String", # The name of the package.
                },
              ],
              "defaultPackageSet": "A String", # The default package set to install. This allows the service to
                  # select a default set of packages which are useful to worker
                  # harnesses written in a particular language.
              "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
                  # are supported.
              "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
                  # attempt to choose a reasonable default.
              "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
                  # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                  # `TEARDOWN_NEVER`.
                  # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                  # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                  # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                  # down.
                  #
                  # If the workers are not torn down by the service, they will
                  # continue to run and use Google Compute Engine VM resources in the
                  # user's project until they are explicitly terminated by the user.
                  # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                  # policy except for small, manually supervised test jobs.
                  #
                  # If unknown or unspecified, the service will attempt to choose a reasonable
                  # default.
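The three `teardownPolicy` values reduce to a small decision table. A minimal sketch of that table as described above; `workers_torn_down` is a hypothetical helper, and the unspecified case raises because the service, not the client, chooses that default.

```python
def workers_torn_down(policy: str, job_succeeded: bool) -> bool:
    """Whether the service tears down a worker pool under a given teardownPolicy."""
    if policy == "TEARDOWN_ALWAYS":
        return True
    if policy == "TEARDOWN_ON_SUCCESS":
        return job_succeeded
    if policy == "TEARDOWN_NEVER":
        # Workers keep running and using Compute Engine resources until the
        # user terminates them explicitly.
        return False
    raise ValueError(f"unknown or unspecified teardownPolicy {policy!r}; the service chooses a default")
```

Following the recommendation above, a production worker pool entry would typically carry `{"kind": "harness", "teardownPolicy": "TEARDOWN_ALWAYS"}`.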

              "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                  # attempt to choose a reasonable default.
              "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
                  # execute the job. If zero or unspecified, the service will
                  # attempt to choose a reasonable default.
              "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                  # the form "regions/REGION/subnetworks/SUBNETWORK".
              "dataDisks": [ # Data disks that are used by a VM in this workflow.
                { # Describes the data disk used by a workflow job.
                  "mountPoint": "A String", # Directory in a VM where disk is mounted.
                  "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
                      # attempt to choose a reasonable default.
                  "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
                      # must be a disk type appropriate to the project and zone in which
                      # the workers will run. If unknown or unspecified, the service
                      # will attempt to choose a reasonable default.
                      #
                      # For example, the standard persistent disk type is a resource name
                      # typically ending in "pd-standard". If SSD persistent disks are
                      # available, the resource name typically ends with "pd-ssd". The
                      # actual valid values are defined by the Google Compute Engine API,
                      # not by the Cloud Dataflow API; consult the Google Compute Engine
                      # documentation for more information about determining the set of
                      # available disk types for a particular project and zone.
                      #
                      # Google Compute Engine Disk types are local to a particular
                      # project in a particular zone, and so the resource name will
                      # typically look something like this:
                      #
                      # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                },
              ],
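The disk type resource name above follows a fixed path shape. A sketch with two hypothetical helpers: one builds that name, the other builds a `dataDisks` entry around it. The project, zone, and mount point used below are placeholders, and which disk types are actually valid is determined by Compute Engine, not by this code.

```python
def disk_type_resource(project_id: str, zone: str, disk_type: str = "pd-standard") -> str:
    """Build a Compute Engine disk type resource name in the documented shape."""
    return f"compute.googleapis.com/projects/{project_id}/zones/{zone}/diskTypes/{disk_type}"


def data_disk(mount_point: str, size_gb: int, project_id: str, zone: str,
              disk_type: str = "pd-standard") -> dict:
    """Build a dataDisks entry; a sizeGb of 0 lets the service choose a size."""
    return {
        "mountPoint": mount_point,
        "sizeGb": size_gb,
        "diskType": disk_type_resource(project_id, zone, disk_type),
    }
```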

              "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
                  # only be set in the Fn API path. For non-cross-language pipelines this
                  # should have only one entry. Cross-language pipelines will have two or more
                  # entries.
                { # Defines an SDK harness container for executing Dataflow pipelines.
                  "containerImage": "A String", # A Docker container image that resides in Google Container Registry.
                  "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
                      # container instance with this image. If false (or unset), recommends using
                      # more than one core per SDK container instance with this image for
                      # efficiency. Note that the Dataflow service may choose to override this
                      # property if needed.
                },
              ],
            },
          ],
          "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
              # unspecified, the service will attempt to choose a reasonable
              # default. This should be in the form of the API service name,
              # e.g. "compute.googleapis.com".
          "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
              # storage. The system will append the suffix "/temp-{JOBNAME}" to
              # this resource prefix, where {JOBNAME} is the value of the
              # job_name field. The resulting bucket and object prefix is used
              # as the prefix of the resources used to store temporary data
              # needed during the job execution. NOTE: This will override the
              # value in taskrunner_settings.
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #
              # storage.googleapis.com/{bucket}/{object}
              # bucket.storage.googleapis.com/{object}
        },
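As documented for `environment.tempStoragePrefix`, the service appends `/temp-{JOBNAME}` to the configured prefix. A one-function sketch of that derivation; `temp_resource_prefix` is a hypothetical helper that appends the suffix literally, so a prefix that already ends in `/` would produce a doubled slash.

```python
def temp_resource_prefix(temp_storage_prefix: str, job_name: str) -> str:
    """Derive the prefix used for a job's temporary data, per the field docs."""
    # The suffix "/temp-{JOBNAME}" is appended verbatim to the configured prefix.
    return f"{temp_storage_prefix}/temp-{job_name}"
```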

        "location": "A String", # The [regional endpoint]
            # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
            # contains this job.
        "tempFiles": [ # A set of files the system should be aware of that are used
            # for temporary storage. These temporary files will be
            # removed on job completion.
            # No duplicates are allowed.
            # No file patterns are supported.
            #
            # The supported files are:
            #
            # Google Cloud Storage:
            #
            # storage.googleapis.com/{bucket}/{object}
            # bucket.storage.googleapis.com/{object}
          "A String",
        ],
        "type": "A String", # The type of Cloud Dataflow job.
        "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
            # If this field is set, the service will ensure its uniqueness.
            # The request to create a job will fail if the service has knowledge of a
            # previously submitted job with the same client's ID and job name.
            # The caller may use this field to ensure idempotence of job
            # creation across retried attempts to create a job.
            # By default, the field is empty and, in that case, the service ignores it.
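The idempotence described for `clientRequestId` depends on generating the identifier once and reusing it on every retry. A minimal sketch: `create_job_with_retries` is a hypothetical helper, `create_fn` is any callable that submits a request body, and retrying only on `ConnectionError` is an assumption; the error types worth retrying depend on the transport in use.

```python
import uuid


def create_job_with_retries(create_fn, job_body: dict, attempts: int = 3):
    """Submit a job via create_fn, reusing one clientRequestId across retries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    body = dict(job_body)
    # Generate the identifier once, before the first attempt. Every retry then
    # carries the same value, so the service rejects a duplicate create with the
    # same clientRequestId and job name instead of starting a second job.
    body.setdefault("clientRequestId", uuid.uuid4().hex)
    last_error = None
    for _ in range(attempts):
        try:
            return create_fn(body)
        except ConnectionError as err:  # assumption: only transport errors are retried
            last_error = err
    raise last_error
```

With google-api-python-client, `create_fn` could wrap the `create(projectId, location, body=...)` method listed on this page, e.g. `lambda body: service.projects().locations().jobs().create(projectId="my-project", location="us-central1", body=body).execute()`, where `service` comes from `googleapiclient.discovery.build("dataflow", "v1b3")` and the project and location are placeholders.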

        "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
            # snapshot.
        "stepsLocation": "A String", # The GCS location where the steps are stored.
        "currentStateTime": "A String", # The timestamp associated with the current state.
        "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
            # Flexible resource scheduling jobs are started with some delay after job
            # creation, so start_time is unset before start and is updated when the
            # job is started by the Cloud Dataflow service. For other jobs, start_time
            # always equals create_time and is immutable and set by the Cloud Dataflow
            # service.
        "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
            # Cloud Dataflow service.
        "requestedState": "A String", # The job's requested state.
            #
            # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
            # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
            # also be used to directly set a job's requested state to
            # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
            # job if it has not already reached a terminal state.
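An `UpdateJob` call that changes only `requestedState` needs a body containing just that field. A minimal sketch; `requested_state_body` is a hypothetical helper, and its allow-list is restricted to the four states named above, although the service may accept others.

```python
# The four requested states named in the requestedState description above.
# This list is deliberately conservative; the service may accept others.
DOCUMENTED_REQUESTED_STATES = {
    "JOB_STATE_STOPPED",
    "JOB_STATE_RUNNING",
    "JOB_STATE_CANCELLED",  # irrevocable
    "JOB_STATE_DONE",       # irrevocable
}


def requested_state_body(state: str) -> dict:
    """Build an UpdateJob request body that changes only requestedState."""
    if state not in DOCUMENTED_REQUESTED_STATES:
        raise ValueError(f"{state!r} is not one of the documented requested states")
    return {"requestedState": state}
```

Assuming the generated client's `update` method takes `projectId`, `location`, `jobId`, and `body`, cancelling a job would look like `service.projects().locations().jobs().update(projectId="my-project", location="us-central1", jobId=job_id, body=requested_state_body("JOB_STATE_CANCELLED")).execute()`.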

        "name": "A String", # The user-specified Cloud Dataflow job name.
            #
            # Only one Job with a given name may exist in a project at any
            # given time. If a caller attempts to create a Job with the same
            # name as an already-existing Job, the attempt returns the
            # existing Job.
            #
            # The name must match the regular expression
            # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

2561

"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
#   collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
#   and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
#   that key, transforming a multiply-keyed collection into a
#   uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
},
],
}</pre>
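<p>The <code>requestedState</code> and <code>currentState</code> fields described above suggest a simple way to cancel a job through <code>update</code>. This is a minimal sketch. It assumes <code>service</code> was built with <code>googleapiclient.discovery.build("dataflow", "v1b3")</code>. <code>cancel_job</code> and <code>TERMINAL_STATES</code> are hypothetical names, and the list of terminal states comes from the Dataflow <code>JobState</code> documentation rather than from this page.</p>

```python
# Terminal job states, taken from the Dataflow JobState documentation
# (an assumption of this sketch; this page does not list them).
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})


def cancel_job(service, project_id, location, job_id):
    """Ask the service to cancel a job; return the updated Job, or None.

    Hypothetical helper. `service` is assumed to come from
    googleapiclient.discovery.build("dataflow", "v1b3").
    """
    jobs = service.projects().locations().jobs()
    job = jobs.get(projectId=project_id, location=location, jobId=job_id).execute()
    if job.get("currentState") in TERMINAL_STATES:
        # A terminal job accepts no further state updates, so skip the call.
        return None
    # Setting requestedState to JOB_STATE_CANCELLED irrevocably terminates the job.
    return jobs.update(
        projectId=project_id,
        location=location,
        jobId=job_id,
        body={"requestedState": "JOB_STATE_CANCELLED"},
    ).execute()
```

<p>The preliminary <code>get</code> call is only a convenience. A job can still reach a terminal state between the two requests, and in that case the service, not this check, decides the outcome.</p>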

</div>
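<p>The <code>name</code> and <code>clientRequestId</code> rules described in the Job resource can be checked on the client before calling <code>create</code>. This is a minimal sketch. <code>is_valid_job_name</code> and <code>make_create_body</code> are hypothetical helpers, and the body they return is only a fragment: a real create request also needs the job's steps (or <code>stepsLocation</code>) and its other settings.</p>

```python
import re
import uuid

# Job-name pattern quoted in the Job resource description.
_JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if the whole of `name` matches the documented pattern."""
    return _JOB_NAME_RE.fullmatch(name) is not None


def make_create_body(name, client_request_id=None):
    """Return a partial Job body carrying `name` and `clientRequestId`.

    Hypothetical helper: pass the same client_request_id on every retry of
    one logical submission; a new UUID is generated only when none is given.
    """
    if not is_valid_job_name(name):
        raise ValueError(f"invalid Dataflow job name: {name!r}")
    if client_request_id is None:
        client_request_id = str(uuid.uuid4())
    return {"name": name, "clientRequestId": client_request_id}
```

<p>Generate the request ID once per logical submission and reuse the same body for every retry. A fresh ID on each attempt would defeat the duplicate detection that <code>clientRequestId</code> provides.</p>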

<div class="method">
<code class="details" id="list_next">list_next(previous_request, previous_response)</code>
<pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.
</pre>

</div>
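<p>A common way to combine <code>list</code> and <code>list_next</code> is to loop until <code>list_next</code> returns <code>None</code>. This sketch assumes <code>service</code> was built with <code>googleapiclient.discovery.build("dataflow", "v1b3")</code> and that each page stores its jobs under a <code>"jobs"</code> key, as in the <code>list</code> response. <code>iter_jobs</code> is a hypothetical helper.</p>

```python
def iter_jobs(service, project_id, location):
    """Yield every Job returned by projects.locations.jobs.list, page by page.

    Hypothetical helper. `service` is assumed to come from
    googleapiclient.discovery.build("dataflow", "v1b3").
    """
    jobs = service.projects().locations().jobs()
    request = jobs.list(projectId=project_id, location=location)
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        # list_next returns None once there are no more pages.
        request = jobs.list_next(previous_request=request, previous_response=response)
```

<p>Because this is a generator, only the pages that are actually consumed are fetched.</p>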

<div class="method">
<code class="details" id="snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</code>
<pre>Snapshot the state of a streaming job.

Args:
  projectId: string, The project which owns the job to be snapshotted. (required)
  location: string, The location that contains this job. (required)
  jobId: string, The job to be snapshotted. (required)
  body: object, The request body.
    The object takes the form of:

{ # Request to create a snapshot of a job.
"location": "A String", # The location that contains this job.
"ttl": "A String", # TTL for the snapshot.
"description": "A String", # User-specified description of the snapshot. May be empty.
"snapshotSources": True or False, # If true, perform snapshots for sources which support this.
}

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

{ # Represents a snapshot of a job.
"sourceJobId": "A String", # The job this snapshot was created from.
"diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
# state.
"description": "A String", # User-specified description of the snapshot. May be empty.
"projectId": "A String", # The project this snapshot belongs to.
"creationTime": "A String", # The time this snapshot was created.
"state": "A String", # State of the snapshot.
"ttl": "A String", # The time after which this snapshot will be automatically deleted.
"pubsubMetadata": [ # PubSub snapshot metadata.
{ # Represents a Pubsub snapshot.
"expireTime": "A String", # The expire time of the Pubsub snapshot.
"snapshotName": "A String", # The name of the Pubsub snapshot.
"topicName": "A String", # The name of the Pubsub topic.
},
],
"id": "A String", # The unique ID of this snapshot.
}</pre>

</div>
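<p>A call to <code>snapshot</code> for a streaming job might look like the following sketch. It assumes <code>service</code> was built with <code>googleapiclient.discovery.build("dataflow", "v1b3")</code>. It also assumes that <code>ttl</code> uses the JSON encoding of a protobuf <code>Duration</code>: whole seconds followed by <code>s</code>, for example <code>"604800s"</code> for seven days. <code>snapshot_job</code> is a hypothetical helper.</p>

```python
def snapshot_job(service, project_id, location, job_id, ttl_seconds,
                 description="", snapshot_sources=False):
    """Request a snapshot of a streaming job and return the Snapshot dict.

    Hypothetical helper. `service` is assumed to come from
    googleapiclient.discovery.build("dataflow", "v1b3").
    """
    body = {
        "location": location,
        # Assumed Duration encoding: seconds with an "s" suffix.
        "ttl": f"{int(ttl_seconds)}s",
        "description": description,
        "snapshotSources": snapshot_sources,
    }
    request = service.projects().locations().jobs().snapshot(
        projectId=project_id, location=location, jobId=job_id, body=body
    )
    return request.execute()
```

<p>According to the response schema, <code>diskSizeBytes</code> is only populated once the snapshot reaches the READY state, so a freshly returned snapshot may not carry it yet.</p>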

<div class="method">
<code class="details" id="update">update(projectId, location, jobId, body=None, x__xgafv=None)</code>
<pre>Updates the state of an existing Cloud Dataflow job.

To update the state of an existing job, we recommend using
`projects.locations.jobs.update` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.update` is not recommended, as you can only update the state
of jobs that are running in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  jobId: string, The job ID. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
# ListJob response and Job SUMMARY view.
# This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
},
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"topic": "A String", # Topic accessed in the connection.
"subscription": "A String", # Subscription used in the connection.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"projectId": "A String", # ProjectId accessed in the connection.
"namespace": "A String", # Namespace used in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
},
],
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
},
],
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"table": "A String", # Table accessed in the connection.
"dataset": "A String", # Dataset accessed in the connection.
},
],
},
"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow provided metrics.
# Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
{ # Description of the type, names/ids, and input/outputs for a transform.
"kind": "A String", # Type of transform.
"name": "A String", # User provided name for this transform instance.
"inputCollectionName": [ # User names for all collection inputs to this transform.
"A String",
],
"displayData": [ # Transform-specific display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
"outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
"id": "A String", # SDK generated id of this transform instance.
},
],
"executionPipelineStage": [ # Description of each stage of execution of the pipeline.
{ # Description of the composing transforms, names/ids, and input/outputs of a
# stage of execution. Some composing transforms and sources may have been
# generated by the Dataflow service during execution planning.
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
{ # Description of an interstitial value between transforms in an execution
# stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"kind": "A String", # Type of transform this stage is executing.
"name": "A String", # Dataflow service generated name for this stage.
"outputSource": [ # Output sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"inputSource": [ # Input sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"componentTransform": [ # Transforms that comprise this execution stage.
{ # Description of a transform executed as part of an execution stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransform": "A String", # User name for the original user transform with which this transform is
# most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"id": "A String", # Dataflow service generated id for this stage.
},
],
"displayData": [ # Pipeline level display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
},
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"executionStageName": "A String", # The name of the execution stage.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# defaults to the control plane's region.
"version": { # A structure describing which components and their versions of the service
# are required in order to run the job.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"experiments": [ # The list of experiments to enable.
"A String",
],
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language-agnostic and
# platform-independent way.
"a_key": "", # Properties of the object.
},
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"ipConfiguration": "A String", # Configuration for VM IPs.
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
"algorithm": "A String", # The algorithm to use for autoscaling.
},
"diskSourceImage": "A String", # Fully qualified source image for disks.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"workflowFileName": "A String", # The file to store the workflow in.
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"vmId": "A String", # The ID string of the VM.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"languageHint": "A String", # The suggested backend language.
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"logDir": "A String", # The directory on the VM to store logs.
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
"harnessCommand": "A String", # The command to launch the worker harness.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
},
Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3057

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

3058

# service will choose a number of threads (according to the number of cores

3059

# on the selected machine type for batch, or 1 by convention for streaming).

3060

"poolArgs": { # Extra arguments for this worker pool.

3061

"a_key": "", # Properties of the object. Contains field @type with type URL.

3062

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3063

"packages": [ # Packages to be installed on workers.

3064

{ # The packages that must be installed in order for a worker to run the

3065

# steps of the Cloud Dataflow job that will be assigned to its worker

3066

# pool.

3067

#

3068

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3069

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

3070

# might use this to install jars containing the user's code and all of the

3071

# various dependencies (libraries, data files, etc.) required in order

3072

# for that code to run.

3073

"location": "A String", # The resource to read the package from. The supported resource type is:

3074

#

3075

# Google Cloud Storage:

3076

#

3077

# storage.googleapis.com/{bucket}

3078

# bucket.storage.googleapis.com/

3079

"name": "A String", # The name of the package.

3080

},

3081

],

"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"mountPoint": "A String", # Directory in a VM where disk is mounted.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
},
],
"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A Docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this property
# if needed.
},
],
},
],
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"type": "A String", # The type of Cloud Dataflow job.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"currentStateTime": "A String", # The timestamp associated with the current state.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
}
x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
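As a rough sketch of how some of the request-body fields above fit together in a `create()` call, the Python below assembles a minimal body and checks the job name against the documented pattern `[a-z]([-a-z0-9]{0,38}[a-z0-9])?` before submitting. The helper names (`build_job_body`, `create_job`) are hypothetical, only a few documented fields are set, and a submittable job still needs exactly one of `steps` or `stepsLocation`. The `service` argument is assumed to be a client built with `googleapiclient.discovery.build("dataflow", "v1b3")`.

```python
import re

# Pattern copied from the `name` field description of the Job resource.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

# Allowed values listed for the worker pool `teardownPolicy` field.
TEARDOWN_POLICIES = {"TEARDOWN_ALWAYS", "TEARDOWN_ON_SUCCESS", "TEARDOWN_NEVER"}


def is_valid_job_name(name):
    """Return True if `name` matches the documented pattern in full."""
    return JOB_NAME_RE.fullmatch(name) is not None


def build_job_body(name, temp_storage_prefix, num_workers=0,
                   teardown_policy="TEARDOWN_ALWAYS"):
    """Assemble a minimal, hypothetical Job dict for jobs().create(body=...).

    A submittable job also needs exactly one of `steps` or `stepsLocation`;
    this sketch leaves both out.
    """
    if not is_valid_job_name(name):
        raise ValueError(f"invalid job name: {name!r}")
    if teardown_policy not in TEARDOWN_POLICIES:
        raise ValueError(f"unknown teardown policy: {teardown_policy!r}")
    return {
        "name": name,
        "environment": {
            "tempStoragePrefix": temp_storage_prefix,
            "workerPools": [{
                "kind": "harness",  # a "harness" pool is needed for the job to have workers
                "numWorkers": num_workers,  # 0 lets the service choose
                "teardownPolicy": teardown_policy,
            }],
        },
    }


def create_job(service, project_id, location, body):
    """Submit `body` through a discovery-based Dataflow v1b3 client."""
    request = service.projects().locations().jobs().create(
        projectId=project_id, location=location, body=body)
    return request.execute()
```

Passing `service` in as a parameter keeps the body-building logic testable without network access. Client-side checks only catch malformed names early; per the `name` description, creating a job whose name already exists returns the existing Job rather than an error.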
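The `tempStoragePrefix` description gives a concrete rule (the service appends `/temp-{JOBNAME}` to the prefix), and several fields accept the two Cloud Storage forms `storage.googleapis.com/{bucket}/{object}` and `bucket.storage.googleapis.com/{object}`. The sketch below illustrates both; the function names are hypothetical, and the parser only recognizes those two documented host forms, not `gs://` URIs or other schemes.

```python
GCS_HOST = "storage.googleapis.com"


def parse_gcs_location(resource):
    """Split a documented Cloud Storage resource string into (bucket, object).

    Accepts the two forms listed in the field descriptions:
      storage.googleapis.com/{bucket}/{object}
      {bucket}.storage.googleapis.com/{object}
    The object part may be empty (for example, a bare bucket).
    """
    host, _, path = resource.partition("/")
    if host == GCS_HOST:
        bucket, _, obj = path.partition("/")
    elif host.endswith("." + GCS_HOST):
        bucket, obj = host[: -len("." + GCS_HOST)], path
    else:
        raise ValueError(f"not a supported Cloud Storage resource: {resource!r}")
    if not bucket:
        raise ValueError(f"missing bucket in {resource!r}")
    return bucket, obj


def effective_temp_prefix(temp_storage_prefix, job_name):
    """Apply the documented rule: append "/temp-{JOBNAME}" to the prefix.

    The concatenation is literal; a trailing "/" on the prefix is not
    normalized, since the field description does not address that case.
    """
    return f"{temp_storage_prefix}/temp-{job_name}"
```

For example, a prefix of `storage.googleapis.com/my-bucket/tmp` and a job named `wordcount` yield `storage.googleapis.com/my-bucket/tmp/temp-wordcount`, which parses to bucket `my-bucket` and object prefix `tmp/temp-wordcount`.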

Returns:

An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
# ListJob response and Job SUMMARY view.
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
},
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"topic": "A String", # Topic accessed in the connection.
"subscription": "A String", # Subscription used in the connection.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"projectId": "A String", # ProjectId accessed in the connection.
"namespace": "A String", # Namespace used in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
},
],
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
},
],
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"table": "A String", # Table accessed in the connection.
"dataset": "A String", # Dataset accessed in the connection.
},
],
},
"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow provided metrics.
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
{ # Description of the type, names/ids, and input/outputs for a transform.
"kind": "A String", # Type of transform.
"name": "A String", # User-provided name for this transform instance.
"inputCollectionName": [ # User names for all collection inputs to this transform.
"A String",
],
"displayData": [ # Transform-specific display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name_value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
"outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
"id": "A String", # SDK generated id of this transform instance.
},
],
"executionPipelineStage": [ # Description of each stage of execution of the pipeline.
{ # Description of the composing transforms, names/ids, and input/outputs of a
# stage of execution. Some composing transforms and sources may have been
# generated by the Dataflow service during execution planning.
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
{ # Description of an interstitial value between transforms in an execution
# stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"kind": "A String", # Type of transform this stage is executing.
"name": "A String", # Dataflow service generated name for this stage.
"outputSource": [ # Output sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"inputSource": [ # Input sources for this stage.
{ # Description of an input or output of an execution stage.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
"sizeBytes": "A String", # Size of the source, if measurable.
"name": "A String", # Dataflow service generated name for this source.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"componentTransform": [ # Transforms that comprise this execution stage.
{ # Description of a transform executed as part of an execution stage.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransform": "A String", # User name for the original user transform with which this transform is
# most closely associated.
"name": "A String", # Dataflow service generated name for this source.
},
],
"id": "A String", # Dataflow service generated id for this stage.
},
],
"displayData": [ # Pipeline level display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name_value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"url": "A String", # An optional full URL.
"floatValue": 3.14, # Contains value if the data is of float type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"label": "A String", # An optional label to display in a dax UI for the element.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"strValue": "A String", # Contains value if the data is of string type.
"durationValue": "A String", # Contains value if the data is of duration type.
"int64Value": "A String", # Contains value if the data is of int64 type.
},
],
},
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"executionStageName": "A String", # The name of the execution stage.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# it defaults to the control plane's region.
"version": { # A structure describing which components and their versions of the service
# are required in order to run the job.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"experiments": [ # The list of experiments to enable.
"A String",
],
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language-agnostic and
# platform-independent way.
"a_key": "", # Properties of the object.
},
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"ipConfiguration": "A String", # Configuration for VM IPs.
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
"algorithm": "A String", # The algorithm to use for autoscaling.
},
"diskSourceImage": "A String", # Fully qualified source image for disks.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
Sai Cheemalapati
2017-03-13 12:12:03 -0400
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"workflowFileName": "A String", # The file to store the workflow in.
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}

Sai Cheemalapati
2017-03-24 15:06:46 -0700
"commandlinesFileName": "A String", # The file to store preprocessing commands in.

Dan O'Meara
2020-05-01 07:42:23 -0700
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"vmId": "A String", # The ID string of the VM.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"languageHint": "A String", # The suggested backend language.
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"logDir": "A String", # The directory on the VM to store logs.

Sai Cheemalapati
2017-03-13 12:12:03 -0400
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},

Sai Cheemalapati
2017-06-06 18:46:08 -0400
"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".

Sai Cheemalapati
2017-03-24 15:06:46 -0700
"harnessCommand": "A String", # The command to launch the worker harness.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.

Sai Cheemalapati
2017-03-13 12:12:03 -0400
#

Sai Cheemalapati
2017-03-24 15:06:46 -0700
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}

Dan O'Meara
2020-05-01 07:42:23 -0700
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"

Jon Wayne Parrott
2017-01-06 09:58:29 -0800
},

Dan O'Meara
2020-05-01 07:42:23 -0700
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},

Bu Sun Kim
2019-06-14 16:50:42 -0700
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
"name": "A String", # The name of the package.
},
],

Dan O'Meara
2020-05-01 07:42:23 -0700
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Sai Cheemalapati
2017-06-06 18:46:08 -0400
# attempt to choose a reasonable default.

Sai Cheemalapati
2017-06-06 18:46:08 -0400
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.

Dan O'Meara
2020-05-01 07:42:23 -0700
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.

Sai Cheemalapati
2017-06-06 18:46:08 -0400
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will

Sai Cheemalapati
2017-03-13 12:12:03 -0400
# attempt to choose a reasonable default.

Sai Cheemalapati
2017-06-06 18:46:08 -0400
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".

Bu Sun Kim
2019-06-14 16:50:42 -0700
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"mountPoint": "A String", # Directory in a VM where disk is mounted.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.

Sai Cheemalapati
2017-06-06 18:46:08 -0400
#

Bu Sun Kim
2019-06-14 16:50:42 -0700
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.

Sai Cheemalapati
2017-06-06 18:46:08 -0400
#

Bu Sun Kim
2019-06-14 16:50:42 -0700
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Sai Cheemalapati
2017-06-06 18:46:08 -0400
},
],

Dan O'Meara
2020-05-01 07:42:23 -0700
"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A Docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this property
# if needed.
},
],

Jon Wayne Parrott
2017-01-06 09:58:29 -0800
},
],

Dan O'Meara
2020-05-01 07:42:23 -0700
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".

Bu Sun Kim
2019-06-14 16:50:42 -0700
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}

Jon Wayne Parrott
2017-01-06 09:58:29 -0800
},

Bu Sun Kim
2019-06-14 16:50:42 -0700
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"type": "A String", # The type of Cloud Dataflow job.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"currentStateTime": "A String", # The timestamp associated with the current state.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.

Sai Cheemalapati
2017-03-13 12:12:03 -0400
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.

Dan O'Meara
2020-05-01 07:42:23 -0700
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.

Sai Cheemalapati
2017-03-13 12:12:03 -0400
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Jon Wayne Parrott
2017-01-06 09:58:29 -0800
"a_key": "", # Properties of the object.
},
},
],

Thomas Coffee
2017-03-27 10:39:26 -0700
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.

Bu Sun Kim
2019-06-14 16:50:42 -0700
#

Thomas Coffee
2017-03-27 10:39:26 -0700
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.

Bu Sun Kim
2019-06-14 16:50:42 -0700
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.

Sai Cheemalapati
2017-03-13 12:12:03 -0400
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
# isn't contained in the submitted job.

Jon Wayne Parrott
2017-01-06 09:58:29 -0800
"stages": { # A mapping from each stage to the information about that stage.

Sai Cheemalapati
2017-03-13 12:12:03 -0400
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.

Jon Wayne Parrott
2017-01-06 09:58:29 -0800
"A String",
],
},
},
},

Bu Sun Kim
2019-06-14 16:50:42 -0700
}</pre>
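The `name` field description above gives a regular expression that job names must match. A minimal client-side check, assuming you want to reject bad names before calling `create`, might look like the following; `is_valid_job_name` is a hypothetical helper, not part of google-api-python-client, and the service remains the authority on what it accepts.

```python
import re

# Pattern copied from the "name" field description. re.fullmatch anchors
# it at both ends, so the entire name must match, not just a prefix.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` matches the documented job-name pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

Read literally, the pattern allows 1 to 40 characters: a leading lowercase letter, then optionally up to 38 letters, digits or hyphens followed by a final letter or digit. So `"wordcount-2024"` passes, while `"WordCount"`, `"job-"` and any 41-character name are rejected.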
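Several fields above (`logUploadLocation`, `tempStoragePrefix`, `tempFiles`) accept Google Cloud Storage resources in two documented forms, `storage.googleapis.com/{bucket}/{object}` and `bucket.storage.googleapis.com/{object}`. The sketch below splits either form into a bucket and object name; `parse_gcs_resource` is a hypothetical helper written for illustration, and it does not validate bucket or object naming rules.

```python
from typing import Optional, Tuple

_PATH_STYLE_HOST = "storage.googleapis.com/"
_VIRTUAL_HOST_SUFFIX = ".storage.googleapis.com/"


def parse_gcs_resource(resource: str) -> Tuple[str, Optional[str]]:
    """Split a documented GCS resource name into (bucket, object).

    Accepts the two forms listed in the field descriptions:
      storage.googleapis.com/{bucket}/{object}
      bucket.storage.googleapis.com/{object}
    The object part is None when the resource names only a bucket.
    """
    if resource.startswith(_PATH_STYLE_HOST):
        # Path style: the bucket is the first path segment.
        rest = resource[len(_PATH_STYLE_HOST):]
        bucket, _, obj = rest.partition("/")
    elif _VIRTUAL_HOST_SUFFIX in resource:
        # Virtual-host style: the bucket is the host-name prefix.
        bucket, _, obj = resource.partition(_VIRTUAL_HOST_SUFFIX)
    else:
        raise ValueError(f"unsupported resource name: {resource!r}")
    if not bucket:
        raise ValueError(f"missing bucket in: {resource!r}")
    return bucket, (obj or None)
```

Other spellings, such as `gs://` URIs, are rejected here only because the field descriptions do not list them.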
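The environment-level `tempStoragePrefix` description says the system appends the suffix `"/temp-{JOBNAME}"` to the prefix. A small sketch of that composition, useful for predicting where temporary data will land, follows; `temp_resource_prefix` is hypothetical, and stripping a trailing slash from the prefix first is an assumption the description does not state.

```python
def temp_resource_prefix(temp_storage_prefix: str, job_name: str) -> str:
    """Append the documented "/temp-{JOBNAME}" suffix to the prefix.

    A trailing slash on the prefix is stripped first to avoid "//"
    (an assumption, not something the field description specifies).
    """
    return f"{temp_storage_prefix.rstrip('/')}/temp-{job_name}"
```

For example, a prefix of `storage.googleapis.com/my-bucket/staging` and job name `wordcount` yield `storage.googleapis.com/my-bucket/staging/temp-wordcount`.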
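Both `baseUrl` fields describe resolving relative service paths (such as `servicePath` or `shuffleServicePath`) against a base URL, defaulting to `"http://www.googleapis.com/"`. The sketch below uses Python's `urllib.parse.urljoin`, which follows RFC 3986 (the successor to the RFC 1808 named above); for plain relative paths like these the two should agree. `resolve_service_url` is a hypothetical helper.

```python
from urllib.parse import urljoin

# Default quoted from the baseUrl field descriptions.
DEFAULT_BASE_URL = "http://www.googleapis.com/"


def resolve_service_url(relative_path: str, base_url: str = "") -> str:
    """Resolve a relative service path against baseUrl, or the default."""
    return urljoin(base_url or DEFAULT_BASE_URL, relative_path)
```

Note that the trailing slash on the base matters: `https://example.com/api/` plus `shuffle/v1beta1` gives `https://example.com/api/shuffle/v1beta1`, but without the slash the last segment `api` is replaced, giving `https://example.com/shuffle/v1beta1`.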
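The `steps` description requires exactly one of `steps` or `stepsLocation` to be specified. A client-side guard for a Job body dict might look like this; `check_steps_fields` is a hypothetical helper, and treating an empty list or empty string as "unset" is an assumption.

```python
from typing import Any, Dict


def check_steps_fields(job: Dict[str, Any]) -> None:
    """Raise ValueError unless exactly one of steps / stepsLocation is set.

    Empty values count as unset (an assumption about intent).
    """
    has_steps = bool(job.get("steps"))
    has_location = bool(job.get("stepsLocation"))
    if has_steps == has_location:
        raise ValueError("specify exactly one of 'steps' or 'stepsLocation'")
```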
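The `clientRequestId` description explains that the ID makes job creation idempotent across retries, which only works if the same ID is reused on every attempt. The sketch below generates the ID once and retries a caller-supplied `submit` function; `create_with_retries` and `submit` are hypothetical names, and retrying on every exception is a simplification (a real client should retry only transient errors).

```python
import uuid
from typing import Any, Callable, Dict


def create_with_retries(
    submit: Callable[[Dict[str, Any]], Dict[str, Any]],
    job: Dict[str, Any],
    attempts: int = 3,
) -> Dict[str, Any]:
    """Submit `job`, retrying with one stable clientRequestId."""
    body = dict(job)  # shallow copy so the caller's dict is not mutated
    # Generate the ID once, before the first attempt, so every retry
    # carries the same value; a fresh ID per attempt would defeat it.
    body.setdefault("clientRequestId", uuid.uuid4().hex)
    last_error: Exception = RuntimeError("attempts must be >= 1")
    for _ in range(attempts):
        try:
            return submit(body)
        except Exception as err:  # simplification: retries on any error
            last_error = err
    raise last_error
```

With google-api-python-client, `submit` could wrap a call such as `jobs().create(projectId=..., location=..., body=body).execute()` on the `projects().locations()` resource. Per the description, a retry after an unacknowledged success is expected to fail rather than create a duplicate job, so the caller may then need to look the job up instead.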
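The worker-pool fields above combine into the `environment` part of a Job body. The following builds an illustrative fragment with one pool that uses the recommended `TEARDOWN_ALWAYS` policy and a zonal disk-type resource name in the documented `compute.googleapis.com/projects/.../zones/.../diskTypes/...` form. The helper name, the worker count, disk size and zone are hypothetical, and the autoscaling value `AUTOSCALING_ALGORITHM_BASIC` comes from the Dataflow API reference rather than this page, so verify it there. This is not a complete job: a real request also needs `steps` or `stepsLocation`, and jobs are usually produced by an SDK or template rather than written by hand.

```python
from typing import Any, Dict


def worker_pool_environment(
    project_id: str,
    zone: str,
    max_workers: int,
    temp_storage_prefix: str,
) -> Dict[str, Any]:
    """Build an illustrative `environment` fragment with one worker pool."""
    # Disk types are zonal resources; see the diskType description.
    disk_type = (
        f"compute.googleapis.com/projects/{project_id}"
        f"/zones/{zone}/diskTypes/pd-ssd"
    )
    return {
        "tempStoragePrefix": temp_storage_prefix,
        "workerPools": [
            {
                "kind": "harness",
                "zone": zone,
                "numWorkers": 2,  # illustrative starting size
                "diskSizeGb": 50,  # illustrative; 0 lets the service choose
                "diskType": disk_type,
                # Recommended for anything but small supervised test jobs.
                "teardownPolicy": "TEARDOWN_ALWAYS",
                "autoscalingSettings": {
                    # Enum value assumed from the Dataflow API reference.
                    "algorithm": "AUTOSCALING_ALGORITHM_BASIC",
                    "maxNumWorkers": max_workers,
                },
            }
        ],
    }
```

Such a fragment would sit under the `"environment"` key of the body passed to `create(projectId, location, body=...)` on a client obtained with `googleapiclient.discovery.build("dataflow", "v1b3")`.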
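The `requestedState` description says `UpdateJob` can switch a job between `JOB_STATE_STOPPED` and `JOB_STATE_RUNNING`, or irrevocably end it with `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`. The sketch below builds such an update body and forces callers to opt in to the irrevocable states; `requested_state_body` and the `allow_terminal` flag are hypothetical, and sending the body through the jobs `update` method is an assumption about how you would issue the `UpdateJob` call from this client.

```python
from typing import Dict

# Target states named in the requestedState description.
SWITCHABLE_STATES = {"JOB_STATE_RUNNING", "JOB_STATE_STOPPED"}
TERMINAL_TARGETS = {"JOB_STATE_CANCELLED", "JOB_STATE_DONE"}


def requested_state_body(state: str, allow_terminal: bool = False) -> Dict[str, str]:
    """Build a Job body that sets only `requestedState`.

    Terminal targets irrevocably end the job, so they must be opted into
    explicitly with allow_terminal=True.
    """
    if state in SWITCHABLE_STATES:
        return {"requestedState": state}
    if state in TERMINAL_TARGETS:
        if not allow_terminal:
            raise ValueError(f"{state} is irrevocable; pass allow_terminal=True")
        return {"requestedState": state}
    raise ValueError(f"unsupported requested state: {state!r}")
```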

Jon Wayne Parrott