Blame - docs/dyn/dataflow_v1b3.projects.jobs.html - platform/external/python/google-api-python-client

2015-06-15 16:44:50 +0000

[diff] [blame]

76

<h2>Instance Methods</h2>

77

Jon Wayne Parrott

7d5badb

2016-08-16 12:44:29 -0700

[diff] [blame]

78

<code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>

79

</p>

80

<p class="firstline">Returns the debug Resource.</p>

81

82

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

83

<code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>

84

</p>

85

<p class="firstline">Returns the messages Resource.</p>

86

87

88

<code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>

89

</p>

90

<p class="firstline">Returns the workItems Resource.</p>

91

92

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

93

<code><a href="#aggregated">aggregated(projectId, pageToken=None, pageSize=None, view=None, filter=None, location=None, x__xgafv=None)</a></code></p>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

94

<p class="firstline">List the jobs of a project across all regions.</p>

95

96

<code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>

97

<p class="firstline">Retrieves the next page of results.</p>

98

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

99

<code><a href="#create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

100

<p class="firstline">Creates a Cloud Dataflow job.</p>

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

101

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

102

<code><a href="#get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

103

<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

104

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

105

<code><a href="#getMetrics">getMetrics(projectId, jobId, location=None, startTime=None, x__xgafv=None)</a></code></p>

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

106

<p class="firstline">Request the job status.</p>

107

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

108

<code><a href="#list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

109

<p class="firstline">List the jobs of a project.</p>

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

110

111

<code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>

112

<p class="firstline">Retrieves the next page of results.</p>

113

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

114

<code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

115

<p class="firstline">Snapshot the state of a streaming job.</p>

116

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

117

<code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

118

<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
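The `aggregated()` / `aggregated_next()` pair listed above follows the client library's usual pagination pattern: execute a request, then ask the `_next` method for the follow-up request until it returns `None`. The sketch below is a minimal illustration of that loop, not the library's own code. `iter_jobs`, `FakeJobs`, and `FakeRequest` are hypothetical names introduced here; the fakes only stand in for the real resource so the example runs offline. With real credentials, the resource would presumably come from `googleapiclient.discovery.build("dataflow", "v1b3").projects().jobs()`.

```python
from typing import Any, Dict, Iterator, List


def iter_jobs(jobs_resource: Any, project_id: str, page_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every job dict returned by aggregated(), following pagination."""
    request = jobs_resource.aggregated(projectId=project_id, pageSize=page_size)
    while request is not None:
        response = request.execute()
        # A project with no jobs produces an empty body ({}), so default to [].
        for job in response.get("jobs", []):
            yield job
        # The *_next method returns None once there are no further pages.
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )


class FakeRequest:
    """Hypothetical stand-in for a request object; returns one canned page."""

    def __init__(self, pages: List[Dict[str, Any]], index: int) -> None:
        self._pages = pages
        self.index = index

    def execute(self) -> Dict[str, Any]:
        return self._pages[self.index]


class FakeJobs:
    """Hypothetical stand-in for the projects().jobs() resource (offline only)."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self._pages = pages

    def aggregated(self, projectId: str, pageSize: int = None) -> FakeRequest:
        return FakeRequest(self._pages, 0)

    def aggregated_next(self, previous_request: FakeRequest, previous_response: Dict[str, Any]):
        if "nextPageToken" not in previous_response:
            return None
        return FakeRequest(self._pages, previous_request.index + 1)


if __name__ == "__main__":
    pages = [
        {"jobs": [{"id": "job-a"}, {"id": "job-b"}], "nextPageToken": "token-1"},
        {"jobs": [{"id": "job-c"}]},
    ]
    for job in iter_jobs(FakeJobs(pages), "my-project"):
        print(job["id"])
```

The same loop should apply to `list()` / `list_next()` by swapping the method names. Using `response.get("jobs", [])` rather than `response["jobs"]` matters because an empty project returns `{}` instead of a populated response.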

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

119

<h3>Method Details</h3>

120

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

121

<code class="details" id="aggregated">aggregated(projectId, pageToken=None, pageSize=None, view=None, filter=None, location=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

122

<pre>List the jobs of a project across all regions.

123

124

Args:

125

projectId: string, The project which owns the jobs. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

126

pageToken: string, Set this to the 'next_page_token' field of a previous response

127

to request additional results in a long list.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

128

pageSize: integer, If there are many jobs, limit response to at most this many.

129

The actual number of jobs returned will be the lesser of max_responses

130

and an unspecified server-defined limit.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

131

view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

132

filter: string, The kind of filter to use.

133

location: string, The [regional endpoint]

134

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

135

contains this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

136

x__xgafv: string, V1 error format.

137

Allowed values

138

1 - v1 error format

139

2 - v2 error format

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

140

141

Returns:

142

An object of the form:

143

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

144

{ # Response to a request to list Cloud Dataflow jobs in a project. This might

145

# be a partial response, depending on the page size in the ListJobsRequest.

146

# However, if the project does not have any jobs, an instance of

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

147

# ListJobsResponse is not returned and the request's response

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

148

# body is empty {}.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

149

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

150

"failedLocation": [ # Zero or more messages describing the [regional endpoints]

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

151

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

152

# failed to respond.

153

{ # Indicates which [regional endpoint]

154

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

155

# to respond to a request for data.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

156

"name": "A String", # The name of the [regional endpoint]

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

157

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

158

# failed to respond.

159

},

160

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

161

"jobs": [ # A subset of the requested job information.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

162

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

163

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

164

# If this field is set, the service will ensure its uniqueness.

165

# The request to create a job will fail if the service has knowledge of a

166

# previously submitted job with the same client's ID and job name.

167

# The caller may use this field to ensure idempotence of job

168

# creation across retried attempts to create a job.

169

# By default, the field is empty and, in that case, the service ignores it.

170

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

171

#

172

# This field is set by the Cloud Dataflow service when the Job is

173

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

174

"currentStateTime": "A String", # The timestamp associated with the current state.

175

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

176

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

177

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

178

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

179

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

180

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

181

# options are passed through the service and are used to recreate the

182

# SDK pipeline options on the worker in a language agnostic and platform

183

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

184

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

185

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

186

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

187

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

188

# specified in order for the job to have workers.

189

{ # Describes one particular pool of Cloud Dataflow workers to be

190

# instantiated by the Cloud Dataflow service in order to perform the

191

# computations required by a job. Note that a workflow job may use

192

# multiple pools, in order to match the various computational

193

# requirements of the various stages of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

194

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

195

# select a default set of packages which are useful to worker

196

# harnesses written in a particular language.

197

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

198

# the service will use the network "default".

199

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

200

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

201

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

202

# execute the job. If zero or unspecified, the service will

203

# attempt to choose a reasonable default.

204

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

205

# service will choose a number of threads (according to the number of cores

206

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

207

"diskSourceImage": "A String", # Fully qualified source image for disks.

208

"packages": [ # Packages to be installed on workers.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

209

{ # The packages that must be installed in order for a worker to run the

210

# steps of the Cloud Dataflow job that will be assigned to its worker

211

# pool.

212

#

213

# This is the mechanism by which the Cloud Dataflow SDK causes code to

214

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

215

# might use this to install jars containing the user's code and all of the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

216

# various dependencies (libraries, data files, etc.) required in order

217

# for that code to run.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

218

"location": "A String", # The resource to read the package from. The supported resource type is:

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

219

#

220

# Google Cloud Storage:

221

#

222

# storage.googleapis.com/{bucket}

223

# bucket.storage.googleapis.com/

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

224

"name": "A String", # The name of the package.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

225

},

226

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

227

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

228

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

229

# `TEARDOWN_NEVER`.

230

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

231

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

232

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

233

# down.

234

#

235

# If the workers are not torn down by the service, they will

236

# continue to run and use Google Compute Engine VM resources in the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

237

# user's project until they are explicitly terminated by the user.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

238

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

239

# policy except for small, manually supervised test jobs.

240

#

241

# If unknown or unspecified, the service will attempt to choose a reasonable

242

# default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

243

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

244

# Compute Engine API.

245

"poolArgs": { # Extra arguments for this worker pool.

246

"a_key": "", # Properties of the object. Contains field @type with type URL.

247

},

248

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

249

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

250

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

251

# harness, residing in Google Container Registry.

252

#

253

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

254

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

255

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

256

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

257

# service will attempt to choose a reasonable default.

258

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

259

# are supported.

260

"dataDisks": [ # Data disks that are used by a VM in this workflow.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

261

{ # Describes the data disk used by a workflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

262

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

263

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

264

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

265

# must be a disk type appropriate to the project and zone in which

266

# the workers will run. If unknown or unspecified, the service

267

# will attempt to choose a reasonable default.

268

#

269

# For example, the standard persistent disk type is a resource name

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

270

# typically ending in "pd-standard". If SSD persistent disks are

271

# available, the resource name typically ends with "pd-ssd". The

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

272

# actual valid values are defined by the Google Compute Engine API,

273

# not by the Cloud Dataflow API; consult the Google Compute Engine

274

# documentation for more information about determining the set of

275

# available disk types for a particular project and zone.

276

#

277

# Google Compute Engine Disk types are local to a particular

278

# project in a particular zone, and so the resource name will

279

# typically look something like this:

280

#

281

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

282

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

283

},

284

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

285

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

286

# only be set in the Fn API path. For non-cross-language pipelines this

287

# should have only one entry. Cross-language pipelines will have two or more

288

# entries.

289

{ # Defines a SDK harness container for executing Dataflow pipelines.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

290

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

291

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

292

# container instance with this image. If false (or unset) recommends using

293

# more than one core per SDK container instance with this image for

294

# efficiency. Note that Dataflow service may choose to override this property

295

# if needed.

296

},

297

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

298

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

299

# the form "regions/REGION/subnetworks/SUBNETWORK".

300

"ipConfiguration": "A String", # Configuration for VM IPs.

301

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

302

# using the standard Dataflow task runner. Users should ignore

303

# this field.

304

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

305

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

306

# taskrunner; e.g. "wheel".

307

"harnessCommand": "A String", # The command to launch the worker harness.

308

"logDir": "A String", # The directory on the VM to store logs.

309

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

310

# access the Cloud Dataflow API.

311

"A String",

312

],

313

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

314

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

315

# will not be uploaded.

316

#

317

# The supported resource type is:

318

#

319

# Google Cloud Storage:

320

# storage.googleapis.com/{bucket}/{object}

321

# bucket.storage.googleapis.com/{object}

322

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

323

"workflowFileName": "A String", # The file to store the workflow in.

324

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

325

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

326

# temporary storage.

327

#

328

# The supported resource type is:

329

#

330

# Google Cloud Storage:

331

# storage.googleapis.com/{bucket}/{object}

332

# bucket.storage.googleapis.com/{object}

333

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

334

"languageHint": "A String", # The suggested backend language.

335

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

336

#

337

# When workers access Google Cloud APIs, they logically do so via

338

# relative URLs. If this field is specified, it supplies the base

339

# URL to use for resolving these relative URLs. The normative

340

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

341

# Locators".

342

#

343

# If not specified, the default value is "http://www.googleapis.com/"

344

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

345

# console.

346

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

347

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

348

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

349

#

350

# When workers access Google Cloud APIs, they logically do so via

351

# relative URLs. If this field is specified, it supplies the base

352

# URL to use for resolving these relative URLs. The normative

353

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

354

# Locators".

355

#

356

# If not specified, the default value is "http://www.googleapis.com/"

357

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

358

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

359

# "dataflow/v1b3/projects".

360

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

361

# "shuffle/v1beta1".

362

"workerId": "A String", # The ID of the worker running this pipeline.

363

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

364

# storage.

365

#

366

# The supported resource type is:

367

#

368

# Google Cloud Storage:

369

#

370

# storage.googleapis.com/{bucket}/{object}

371

# bucket.storage.googleapis.com/{object}

372

},

373

"vmId": "A String", # The ID string of the VM.

374

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

375

# taskrunner; e.g. "root".

376

},

377

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

378

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

379

"algorithm": "A String", # The algorithm to use for autoscaling.

380

},

381

"metadata": { # Metadata to set on the Google Compute Engine VMs.

382

"a_key": "A String",

383

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

384

},

385

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

386

"dataset": "A String", # The dataset for the current project where various workflow

387

# related tables are stored.

388

#

389

# The supported resource type is:

390

#

391

# Google BigQuery:

392

# bigquery.googleapis.com/{dataset}

393

"internalExperiments": { # Experimental settings.

394

"a_key": "", # Properties of the object. Contains field @type with type URL.

395

},

396

"workerRegion": "A String", # The Compute Engine region

397

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

398

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

399

# with worker_zone. If neither worker_region nor worker_zone is specified,

400

# default to the control plane's region.

401

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

402

# at rest, AKA a Customer Managed Encryption Key (CMEK).

403

#

404

# Format:

405

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

406

"userAgent": { # A description of the process that generated the request.

407

"a_key": "", # Properties of the object.

408

},

409

"workerZone": "A String", # The Compute Engine zone

410

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

411

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

412

# with worker_region. If neither worker_region nor worker_zone is specified,

413

# a zone in the control plane's region is chosen based on available capacity.

414

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

415

# unspecified, the service will attempt to choose a reasonable

416

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

417

# e.g. "compute.googleapis.com".

418

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

419

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

420

# this resource prefix, where {JOBNAME} is the value of the

421

# job_name field. The resulting bucket and object prefix is used

422

# as the prefix of the resources used to store temporary data

423

# needed during the job execution. NOTE: This will override the

424

# value in taskrunner_settings.

425

# The supported resource type is:

426

#

427

# Google Cloud Storage:

428

#

429

# storage.googleapis.com/{bucket}/{object}

430

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

431

"experiments": [ # The list of experiments to enable.

432

"A String",

433

],

434

"version": { # A structure describing which components and their versions of the service

435

# are required in order to run the job.

436

"a_key": "", # Properties of the object.

437

},

438

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

439

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

440

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

441

# callers cannot mutate it.

442

{ # A message describing the state of a particular execution stage.

443

"executionStageName": "A String", # The name of the execution stage.

444

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

445

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

446

},

447

],

448

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

449

# ListJob response and Job SUMMARY view.

450

# This field is populated by the Dataflow service to support filtering jobs

451

# by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.

452

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

453

{ # Metadata for a BigTable connector used by the job.

454

"tableId": "A String", # TableId accessed in the connection.

455

"projectId": "A String", # ProjectId accessed in the connection.

456

"instanceId": "A String", # InstanceId accessed in the connection.

457

},

458

],

459

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

460

{ # Metadata for a Spanner connector used by the job.

461

"databaseId": "A String", # DatabaseId accessed in the connection.

462

"instanceId": "A String", # InstanceId accessed in the connection.

463

"projectId": "A String", # ProjectId accessed in the connection.

464

},

465

],

466

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

467

{ # Metadata for a Datastore connector used by the job.

468

"projectId": "A String", # ProjectId accessed in the connection.

469

"namespace": "A String", # Namespace used in the connection.

470

},

471

],

472

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

473

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

474

"sdkSupportStatus": "A String", # The support status for this SDK version.

475

"version": "A String", # The version of the SDK used to run the job.

476

},

477

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

478

{ # Metadata for a BigQuery connector used by the job.

479

"table": "A String", # Table accessed in the connection.

480

"dataset": "A String", # Dataset accessed in the connection.

481

"projectId": "A String", # Project accessed in the connection.

482

"query": "A String", # Query used to access data in the connection.

483

},

484

],

485

"fileDetails": [ # Identification of a File source used in the Dataflow job.

486

{ # Metadata for a File connector used by the job.

487

"filePattern": "A String", # File Pattern used to access files by the connector.

488

},

489

],

490

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

491

{ # Metadata for a PubSub connector used by the job.

492

"subscription": "A String", # Subscription used in the connection.

493

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

498

# snapshot.

499

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

500

"type": "A String", # The type of Cloud Dataflow job.

501

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

502

# form. This data is provided by the Dataflow service for ease of visualizing

503

# the pipeline and interpreting Dataflow provided metrics.

504

# Preliminary field: The format of this data may change at any time.

505

# A description of the user pipeline and stages through which it is executed.

506

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

507

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

508

{ # Description of the composing transforms, names/ids, and input/outputs of a

509

# stage of execution. Some composing transforms and sources may have been

510

# generated by the Dataflow service during execution planning.

511

"id": "A String", # Dataflow service generated id for this stage.

512

"componentTransform": [ # Transforms that comprise this execution stage.

513

{ # Description of a transform executed as part of an execution stage.

514

"originalTransform": "A String", # User name for the original user transform with which this transform is

515

# most closely associated.

516

"name": "A String", # Dataflow service generated name for this transform.

517

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

518

},

519

],

520

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

521

{ # Description of an interstitial value between transforms in an execution

522

# stage.

523

"name": "A String", # Dataflow service generated name for this source.

524

"userName": "A String", # Human-readable name for this source; may be user or system generated.

525

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

526

# source is most closely associated.

527

},

528

],

529

"kind": "A String", # Type of transform this stage is executing.

530

"outputSource": [ # Output sources for this stage.

531

{ # Description of an input or output of an execution stage.

532

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

533

# source is most closely associated.

534

"name": "A String", # Dataflow service generated name for this source.

535

"sizeBytes": "A String", # Size of the source, if measurable.

536

"userName": "A String", # Human-readable name for this source; may be user or system generated.

537

},

538

],

539

"name": "A String", # Dataflow service generated name for this stage.

540

"inputSource": [ # Input sources for this stage.

541

{ # Description of an input or output of an execution stage.

542

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

543

# source is most closely associated.

544

"name": "A String", # Dataflow service generated name for this source.

545

"sizeBytes": "A String", # Size of the source, if measurable.

546

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

{ # Description of the type, names/ids, and input/outputs for a transform.
"kind": "A String", # Type of transform.
"inputCollectionName": [ # User names for all collection inputs to this transform.
"A String",
],
"name": "A String", # User provided name for this transform instance.
"id": "A String", # SDK generated id of this transform instance.
"displayData": [ # Transform-specific display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"javaClassValue": "A String", # Contains value if the data is of Java class type.
"strValue": "A String", # Contains value if the data is of string type.
"int64Value": "A String", # Contains value if the data is of int64 type.
"durationValue": "A String", # Contains value if the data is of duration type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"floatValue": 3.14, # Contains value if the data is of float type.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example, a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name_value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"url": "A String", # An optional full URL.
"label": "A String", # An optional label to display in a dax UI for the element.
},
],
"outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
},
],

"displayData": [ # Pipeline level display data.

{ # Data provided with a pipeline or transform to provide descriptive info.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"javaClassValue": "A String", # Contains value if the data is of Java class type.
"strValue": "A String", # Contains value if the data is of string type.
"int64Value": "A String", # Contains value if the data is of int64 type.
"durationValue": "A String", # Contains value if the data is of duration type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"floatValue": 3.14, # Contains value if the data is of float type.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example, a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name_value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"url": "A String", # An optional full URL.
"label": "A String", # An optional label to display in a dax UI for the element.
},
],
},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

"steps": [ # Exactly one of steps or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.

"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF-8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
},
],

}</pre>
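The `name` field described above must match `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side check follows. The helper `is_valid_job_name` is hypothetical and not part of the client library, and it assumes the service applies the pattern to the whole string.

```python
import re

# Pattern copied from the `name` field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` fully matches the documented job-name pattern.

    Hypothetical helper: assumes the pattern must match the entire string.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern a name starts with a lowercase letter, ends with a lowercase letter or digit, and is at most 40 characters long.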

</div>
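The `labels` field above is limited to 64 entries, with keys and values of at most 128 bytes each. The sketch below checks only those two size limits. The character-class patterns (`\p{Ll}` and so on) are deliberately not checked, because Python's standard `re` module does not support `\p{...}` classes. The function name `check_label_limits` is hypothetical.

```python
MAX_LABELS = 64        # documented maximum number of label entries
MAX_LABEL_BYTES = 128  # documented maximum size of a key or value, in bytes


def check_label_limits(labels):
    """Return a list of problems found; an empty list means the size limits are met.

    Hypothetical helper: checks entry count and UTF-8 byte sizes only.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels given, limit is {MAX_LABELS}")
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            size = len(text.encode("utf-8"))
            if size > MAX_LABEL_BYTES:
                problems.append(
                    f"label {kind} starting {text[:20]!r} is {size} bytes, "
                    f"limit is {MAX_LABEL_BYTES}"
                )
    return problems
```

Sizes are measured after UTF-8 encoding, so a value of 65 two-byte characters exceeds the limit even though it is only 65 characters long.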

<code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>
<pre>Retrieves the next page of results.

Args:
previous_request: The request for the previous page. (required)
previous_response: The response from the request for the previous page. (required)

Returns:
A request object that you can call 'execute()' on to request the next
page. Returns None if there are no more items in the collection.
</pre>

</div>
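The `aggregated()` / `aggregated_next()` pair can be combined into a paging loop. The sketch below is one way to do that, not the library's own helper. The generator name `iter_aggregated_jobs` is hypothetical, and it assumes the list response carries its jobs under a `jobs` key, which this section does not show.

```python
def iter_aggregated_jobs(jobs_resource, project_id, **list_kwargs):
    """Yield every job across all pages of `aggregated()` results.

    `jobs_resource` is expected to be the object returned by
    `service.projects().jobs()`. Hypothetical helper; assumes each response
    stores its jobs under the "jobs" key.
    """
    request = jobs_resource.aggregated(projectId=project_id, **list_kwargs)
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        # aggregated_next() returns None once there are no more pages.
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )
```

With a real client, `jobs_resource` would typically come from `googleapiclient.discovery.build("dataflow", "v1b3").projects().jobs()`, which requires valid credentials.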

<code class="details" id="create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</code>
<pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

Args:
projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
body: object, The request body.
The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"currentStateTime": "A String", # The timestamp associated with the current state.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},

"environment": { # The environment for the job. Describes the environment in which a Dataflow Job runs.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language-agnostic and
# platform-independent way.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"diskSourceImage": "A String", # Fully qualified source image for disks.

"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
"name": "A String", # The name of the package.
},
],

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.

"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
"mountPoint": "A String", # Directory in a VM where disk is mounted.
},
],

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A Docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this property
# if needed.
},
],
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"ipConfiguration": "A String", # Configuration for VM IPs.

"taskrunnerSettings": { # Taskrunner configuration settings. Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"harnessCommand": "A String", # The command to launch the worker harness.
"logDir": "A String", # The directory on the VM to store logs.
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"workflowFileName": "A String", # The file to store the workflow in.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"languageHint": "A String", # The suggested backend language.
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

"parallelWorkerSettings": { # The settings to pass to the parallel worker harness. Provides data to pass through to the worker harness.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"vmId": "A String", # The ID string of the VM.
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
},

"autoscalingSettings": { # Settings for autoscaling of this WorkerPool.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
"algorithm": "A String", # The algorithm to use for autoscaling.
},
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
},
],

"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# default to the control plane's region.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".

1030

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1031

# storage. The system will append the suffix "/temp-{JOBNAME} to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1032

# this resource prefix, where {JOBNAME} is the value of the

1033

# job_name field. The resulting bucket and object prefix is used

1034

# as the prefix of the resources used to store temporary data

1035

# needed during the job execution. NOTE: This will override the

1036

# value in taskrunner_settings.

1037

# The supported resource type is:

1038

#

1039

# Google Cloud Storage:

1040

#

1041

# storage.googleapis.com/{bucket}/{object}

1042

# bucket.storage.googleapis.com/{object}


"experiments": [ # The list of experiments to enable.

1044

"A String",

1045

],

"version": { # A structure describing which service components, and which versions of them,

# are required in order to run the job.

1048

"a_key": "", # Properties of the object.

1049

},

1050

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.


},


"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1053

# callers cannot mutate it.

1054

{ # A message describing the state of a particular execution stage.

1055

"executionStageName": "A String", # The name of the execution stage.

1056

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1058

},

1059

],

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

1064

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1065

{ # Metadata for a BigTable connector used by the job.

1066

"tableId": "A String", # TableId accessed in the connection.

1067

"projectId": "A String", # ProjectId accessed in the connection.

1068

"instanceId": "A String", # InstanceId accessed in the connection.

1069

},

1070

],

1071

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1072

{ # Metadata for a Spanner connector used by the job.

1073

"databaseId": "A String", # DatabaseId accessed in the connection.

1074

"instanceId": "A String", # InstanceId accessed in the connection.

1075

"projectId": "A String", # ProjectId accessed in the connection.

1076

},

1077

],

1078

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1079

{ # Metadata for a Datastore connector used by the job.

1080

"projectId": "A String", # ProjectId accessed in the connection.

1081

"namespace": "A String", # Namespace used in the connection.

1082

},

1083

],

"sdkVersion": { # The version of the SDK used to run the job.

1085

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1086

"sdkSupportStatus": "A String", # The support status for this SDK version.

1087

"version": "A String", # The version of the SDK used to run the job.

1088

},

1089

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1090

{ # Metadata for a BigQuery connector used by the job.

1091

"table": "A String", # Table accessed in the connection.

1092

"dataset": "A String", # Dataset accessed in the connection.

1093

"projectId": "A String", # Project accessed in the connection.

1094

"query": "A String", # Query used to access data in the connection.

1095

},

1096

],

1097

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1098

{ # Metadata for a File connector used by the job.

1099

"filePattern": "A String", # File Pattern used to access files by the connector.

1100

},

1101

],

1102

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1103

{ # Metadata for a PubSub connector used by the job.

1104

"subscription": "A String", # Subscription used in the connection.

1105

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1110

# snapshot.

1111

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

1112

"type": "A String", # The type of Cloud Dataflow job.

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow-provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by the Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1119

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1120

{ # Description of the composing transforms, names/ids, and input/outputs of a

1121

# stage of execution. Some composing transforms and sources may have been

1122

# generated by the Dataflow service during execution planning.

1123

"id": "A String", # Dataflow service generated id for this stage.

1124

"componentTransform": [ # Transforms that comprise this execution stage.

1125

{ # Description of a transform executed as part of an execution stage.

1126

"originalTransform": "A String", # User name for the original user transform with which this transform is

1127

# most closely associated.

"name": "A String", # Dataflow service generated name for this transform.

1129

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1130

},

1131

],

1132

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1133

{ # Description of an interstitial value between transforms in an execution

1134

# stage.

1135

"name": "A String", # Dataflow service generated name for this source.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1137

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1138

# source is most closely associated.

1139

},

1140

],

"kind": "A String", # Type of transform this stage is executing.

1142

"outputSource": [ # Output sources for this stage.

1143

{ # Description of an input or output of an execution stage.

1144

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1145

# source is most closely associated.

1146

"name": "A String", # Dataflow service generated name for this source.

1147

"sizeBytes": "A String", # Size of the source, if measurable.

1148

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1149

},

1150

],

1151

"name": "A String", # Dataflow service generated name for this stage.

1152

"inputSource": [ # Input sources for this stage.

1153

{ # Description of an input or output of an execution stage.

1154

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1155

# source is most closely associated.

1156

"name": "A String", # Dataflow service generated name for this source.

1157

"sizeBytes": "A String", # Size of the source, if measurable.

1158

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1164

{ # Description of the type, names/ids, and input/outputs for a transform.

1165

"kind": "A String", # Type of transform.

1166

"inputCollectionName": [ # User names for all collection inputs to this transform.

1167

"A String",

1168

],

1169

"name": "A String", # User provided name for this transform instance.

1170

"id": "A String", # SDK generated id of this transform instance.

1171

"displayData": [ # Transform-specific display data.

1172

{ # Data provided with a pipeline or transform to provide descriptive info.

1173

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1174

"boolValue": True or False, # Contains value if the data is of a boolean type.

1175

"javaClassValue": "A String", # Contains value if the data is of java class type.

1176

"strValue": "A String", # Contains value if the data is of string type.

1177

"int64Value": "A String", # Contains value if the data is of int64 type.

1178

"durationValue": "A String", # Contains value if the data is of duration type.

1179

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1180

# language namespace (e.g. a Python module) which defines the display data.

1181

# This allows a dax monitoring system to specially handle the data

1182

# and perform custom rendering.

1183

"floatValue": 3.14, # Contains value if the data is of float type.

1184

"key": "A String", # The key identifying the display data.

1185

# This is intended to be used as a label for the display data

1186

# when viewed in a dax monitoring system.

1187

"shortStrValue": "A String", # A possible additional shorter value to display.

1188

# For example a java_class_name_value of com.mypackage.MyDoFn

1189

# will be stored with MyDoFn as the short_str_value and

1190

# com.mypackage.MyDoFn as the java_class_name value.

1191

# short_str_value can be displayed and java_class_name_value

1192

# will be displayed as a tooltip.

1193

"url": "A String", # An optional full URL.

1194

"label": "A String", # An optional label to display in a dax UI for the element.

1195

},

1196

],

1197

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

1203

{ # Data provided with a pipeline or transform to provide descriptive info.

1204

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1205

"boolValue": True or False, # Contains value if the data is of a boolean type.

1206

"javaClassValue": "A String", # Contains value if the data is of java class type.

1207

"strValue": "A String", # Contains value if the data is of string type.

1208

"int64Value": "A String", # Contains value if the data is of int64 type.

1209

"durationValue": "A String", # Contains value if the data is of duration type.

1210

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1211

# language namespace (e.g. a Python module) which defines the display data.

1212

# This allows a dax monitoring system to specially handle the data

1213

# and perform custom rendering.

1214

"floatValue": 3.14, # Contains value if the data is of float type.

1215

"key": "A String", # The key identifying the display data.

1216

# This is intended to be used as a label for the display data

1217

# when viewed in a dax monitoring system.

1218

"shortStrValue": "A String", # A possible additional shorter value to display.

1219

# For example a java_class_name_value of com.mypackage.MyDoFn

1220

# will be stored with MyDoFn as the short_str_value and

1221

# com.mypackage.MyDoFn as the java_class_name value.

1222

# short_str_value can be displayed and java_class_name_value

1223

# will be displayed as a tooltip.

1224

"url": "A String", # An optional full URL.

1225

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1230

# of the job it replaced.

1231

#

1232

# When sending a `CreateJobRequest`, you can update a job by specifying it

1233

# here. The job named here is stopped, and its intermediate state is

1234

# transferred to this job.

1235

"tempFiles": [ # A set of files the system should be aware of that are used


# for temporary storage. These temporary files will be

1237

# removed on job completion.

1238

# No duplicates are allowed.

1239

# No file patterns are supported.

1240

#

1241

# The supported files are:

1242

#

1243

# Google Cloud Storage:

1244

#

1245

# storage.googleapis.com/{bucket}/{object}

1246

# bucket.storage.googleapis.com/{object}


"A String",


],


"name": "A String", # The user-specified Cloud Dataflow job name.


#

1251

# Only one Job with a given name may exist in a project at any

1252

# given time. If a caller attempts to create a Job with the same

1253

# name as an already-existing Job, the attempt returns the

1254

# existing Job.

1255

#

1256

# The name must match the regular expression

1257

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

"steps": [ # Exactly one of steps or steps_location should be specified.


#

1260

# The top-level steps that constitute the entire job.

1261

{ # Defines a particular step within a Cloud Dataflow job.

1262

#

1263

# A job consists of multiple steps, each of which performs some

1264

# specific operation as part of the overall job. Data is typically

1265

# passed from one step to another as part of the job.

1266

#


# Here's an example of a sequence of steps which together implement a


# Map-Reduce job:

1269

#

1270

# * Read a collection of data from some source, parsing the


# collection's elements.


#

1273

# * Validate the elements.

1274

#

1275

# * Apply a user-defined function to map each element to some value

1276

# and extract an element-specific key value.

1277

#

1278

# * Group elements with the same key into a single element with

1279

# that key, transforming a multiply-keyed collection into a

1280

# uniquely-keyed collection.

1281

#

1282

# * Write the elements out to some data sink.

1283

#

1284

# Note that the Cloud Dataflow service may be used to run many different

1285

# types of jobs, not just Map-Reduce.


"name": "A String", # The name that identifies the step. This must be unique for each


# step with respect to all other steps in the Cloud Dataflow job.


"kind": "A String", # The kind of step in the Cloud Dataflow job.

1289

"properties": { # Named properties associated with the step. Each kind of


# predefined step has its own required set of properties.

1291

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.


"a_key": "", # Properties of the object.


},


},

1295

],


"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1297

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

1300

"stages": { # A mapping from each stage to the information about that stage.

1301

"a_key": { # Contains information about how a particular

1302

# google.dataflow.v1beta3.Step will be executed.

1303

"stepName": [ # The steps associated with the execution stage.

1304

# Note that stages may have several steps, and that a given step

1305

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.


#

1313

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1314

# specified.

1315

#

1316

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1317

# terminal state. After a job has reached a terminal state, no

1318

# further state updates may be made.

1319

#

1320

# This field may be mutated by the Cloud Dataflow service;

1321

# callers cannot mutate it.


"location": "A String", # The [regional endpoint]

1323

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1324

# contains this job.

1325

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1326

# Flexible resource scheduling jobs are started with some delay after job

1327

# creation, so start_time is unset before start and is updated when the

1328

# job is started by the Cloud Dataflow service. For other jobs, start_time

1329

# always equals create_time and is immutable and set by the Cloud Dataflow

1330

# service.

1331

"stepsLocation": "A String", # The GCS location where the steps are stored.

1332

"labels": { # User-defined labels for this job.

1333

#

1334

# The labels map can contain no more than 64 entries. Entries of the labels

1335

# map are UTF8 strings that comply with the following restrictions:

1336

#

1337

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1338

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1339

# * Both keys and values are additionally constrained to be <= 128 bytes in

1340

# size.

1341

"a_key": "A String",


},


"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1344

# Cloud Dataflow service.

1345

"requestedState": "A String", # The job's requested state.

1346

#

1347

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1348

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1349

# also be used to directly set a job's requested state to

1350

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1351

# job if it has not already reached a terminal state.


}
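
Several constraints stated in the request body above can be checked on the client before the request is sent. The following is a minimal sketch of such a check, not part of the generated client: `check_job_body` and its constants are hypothetical names, the body is assumed to be a plain `dict`, and the `name` check is applied only when a name is present. The Unicode-class regexps given for label keys and values are not checked, because the standard `re` module does not support `\p{...}` classes; only the entry count and byte-size limits are.

```python
import re

# Pattern quoted in the "name" field description.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64        # "no more than 64 entries"
MAX_LABEL_BYTES = 128  # keys and values "<= 128 bytes in size"


def check_job_body(body):
    """Return a list of problems found in a Job request body (hypothetical helper)."""
    problems = []

    # The job name, when given, must match the documented regular expression.
    name = body.get("name")
    if name is not None and not JOB_NAME_RE.fullmatch(name):
        problems.append("name does not match the documented pattern")

    # Label count and per-entry byte size limits.
    labels = body.get("labels", {})
    if len(labels) > MAX_LABELS:
        problems.append("more than 64 labels")
    for key, value in labels.items():
        for text in (key, value):
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append("label key or value exceeds 128 bytes")

    # workerRegion and workerZone are documented as mutually exclusive.
    env = body.get("environment", {})
    if env.get("workerRegion") and env.get("workerZone"):
        problems.append("workerRegion and workerZone are mutually exclusive")

    # Exactly one of steps or stepsLocation should be specified.
    if bool(body.get("steps")) == bool(body.get("stepsLocation")):
        problems.append("exactly one of steps or stepsLocation should be set")

    return problems
```

An empty list means none of the checked constraints is violated; the service may still reject the body for other reasons.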

1353

1354

location: string, The [regional endpoint]

1355

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1356

contains this job.


replaceJobId: string, Deprecated. This field is now in the Job message.

1358

view: string, The level of information requested in response.


x__xgafv: string, V1 error format.

1360

Allowed values

1361

1 - v1 error format

1362

2 - v2 error format
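
The arguments above can be passed to `create()` as keyword arguments. The sketch below wraps that call; `create_job` is a hypothetical helper, and `jobs_resource` is assumed to be the object returned by `projects().jobs()` on a Dataflow `v1b3` service built with google-api-python-client. Optional arguments are forwarded only when set, and `replaceJobId` is left out because it is documented as deprecated in favour of the field in the Job message.

```python
def create_job(jobs_resource, project_id, body, location=None, view=None):
    """Call jobs.create with only the arguments that were provided.

    jobs_resource is assumed to expose create(...).execute(), as the
    generated projects().jobs() resource does.
    """
    kwargs = {"projectId": project_id, "body": body}
    if location is not None:
        kwargs["location"] = location  # regional endpoint that contains the job
    if view is not None:
        kwargs["view"] = view  # level of information requested in the response
    # replaceJobId is not passed as an argument; set it in the body instead.
    return jobs_resource.create(**kwargs).execute()
```

Taking the resource as a parameter keeps the helper independent of how the service object is built and authorized, and lets it be exercised with a stub.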


Returns:

1365

An object of the form:

1366

1367

{ # Defines a job to be run by the Cloud Dataflow service.


"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1369

# If this field is set, the service will ensure its uniqueness.

1370

# The request to create a job will fail if the service has knowledge of a

1371

# previously submitted job with the same client's ID and job name.

1372

# The caller may use this field to ensure idempotence of job

1373

# creation across retried attempts to create a job.

1374

# By default, the field is empty and, in that case, the service ignores it.

1375

"id": "A String", # The unique ID of this job.


#

1377

# This field is set by the Cloud Dataflow service when the Job is

1378

# created, and is immutable for the life of the job.


"currentStateTime": "A String", # The timestamp associated with the current state.

1380

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the


# corresponding name prefixes of the new job.


"a_key": "A String",


},


"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

1385

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These


# options are passed through the service and are used to recreate the

1387

# SDK pipeline options on the worker in a language agnostic and platform

1388

# independent way.


"a_key": "", # Properties of the object.


},


"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

1392

"workerPools": [ # The worker pools. At least one "harness" worker pool must be


# specified in order for the job to have workers.

1394

{ # Describes one particular pool of Cloud Dataflow workers to be

1395

# instantiated by the Cloud Dataflow service in order to perform the

1396

# computations required by a job. Note that a workflow job may use

1397

# multiple pools, in order to match the various computational

1398

# requirements of the various stages of the job.


"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1400

# select a default set of packages which are useful to worker

1401

# harnesses written in a particular language.

1402

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

1403

# the service will use the network "default".

1404

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service


# will attempt to choose a reasonable default.


"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

1407

# execute the job. If zero or unspecified, the service will

1408

# attempt to choose a reasonable default.

1409

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the


# service will choose a number of threads (according to the number of cores

1411

# on the selected machine type for batch, or 1 by convention for streaming).


"diskSourceImage": "A String", # Fully qualified source image for disks.

1413

"packages": [ # Packages to be installed on workers.


{ # The packages that must be installed in order for a worker to run the

1415

# steps of the Cloud Dataflow job that will be assigned to its worker

1416

# pool.

1417

#

1418

# This is the mechanism by which the Cloud Dataflow SDK causes code to

1419

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK


# might use this to install jars containing the user's code and all of the


# various dependencies (libraries, data files, etc.) required in order

1422

# for that code to run.


"location": "A String", # The resource to read the package from. The supported resource type is:


#

1425

# Google Cloud Storage:

1426

#

1427

# storage.googleapis.com/{bucket}

1428

# bucket.storage.googleapis.com/


"name": "A String", # The name of the package.


},

1431

],

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.


# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

1434

# `TEARDOWN_NEVER`.

1435

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

1436

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

1437

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

1438

# down.

1439

#

1440

# If the workers are not torn down by the service, they will

1441

# continue to run and use Google Compute Engine VM resources in the


# user's project until they are explicitly terminated by the user.


# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

1444

# policy except for small, manually supervised test jobs.

1445

#

1446

# If unknown or unspecified, the service will attempt to choose a reasonable

1447

# default.


"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

1449

# Compute Engine API.

1450

"poolArgs": { # Extra arguments for this worker pool.

1451

"a_key": "", # Properties of the object. Contains field @type with type URL.

1452

},

1453

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will


# attempt to choose a reasonable default.


"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1456

# harness, residing in Google Container Registry.

1457

#

1458

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

1459

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will


# attempt to choose a reasonable default.


"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

1462

# service will attempt to choose a reasonable default.

1463

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

1464

# are supported.

1465

"dataDisks": [ # Data disks that are used by a VM in this workflow.


{ # Describes the data disk used by a workflow job.


"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will


# attempt to choose a reasonable default.


"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This


# must be a disk type appropriate to the project and zone in which

1471

# the workers will run. If unknown or unspecified, the service

1472

# will attempt to choose a reasonable default.


#


# For example, the standard persistent disk type is a resource name


# typically ending in "pd-standard". If SSD persistent disks are

1476

# available, the resource name typically ends with "pd-ssd". The


# actual valid values are defined by the Google Compute Engine API,

1478

# not by the Cloud Dataflow API; consult the Google Compute Engine

1479

# documentation for more information about determining the set of

1480

# available disk types for a particular project and zone.


#


# Google Compute Engine Disk types are local to a particular

1483

# project in a particular zone, and so the resource name will

1484

# typically look something like this:

1485

#

1486

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard


"mountPoint": "A String", # Directory in a VM where disk is mounted.


},

1489

],


"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will


# only be set in the Fn API path. For non-cross-language pipelines this

1492

# should have only one entry. Cross-language pipelines will have two or more

1493

# entries.

1494

{ # Defines an SDK harness container for executing Dataflow pipelines.


"containerImage": "A String", # A docker container image that resides in Google Container Registry.

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK


# container instance with this image. If false (or unset) recommends using

1498

# more than one core per SDK container instance with this image for

1499

# efficiency. Note that Dataflow service may choose to override this property

1500

# if needed.

1501

},

1502

],


"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

1504

# the form "regions/REGION/subnetworks/SUBNETWORK".

1505

"ipConfiguration": "A String", # Configuration for VM IPs.

1506

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

1507

# using the standard Dataflow task runner. Users should ignore

1508

# this field.

1509

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

1510

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

1511

# taskrunner; e.g. "wheel".

1512

"harnessCommand": "A String", # The command to launch the worker harness.

1513

"logDir": "A String", # The directory on the VM to store logs.

1514

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

1515

# access the Cloud Dataflow API.

1516

"A String",

1517

],

1518

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

1519

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

1520

# will not be uploaded.

1521

#

1522

# The supported resource type is:

1523

#

1524

# Google Cloud Storage:

1525

# storage.googleapis.com/{bucket}/{object}

1526

# bucket.storage.googleapis.com/{object}

1527

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

1528

"workflowFileName": "A String", # The file to store the workflow in.

1529

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

1530

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1531

# temporary storage.

1532

#

1533

# The supported resource type is:

1534

#

1535

# Google Cloud Storage:

1536

# storage.googleapis.com/{bucket}/{object}

1537

# bucket.storage.googleapis.com/{object}

1538

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

1539

"languageHint": "A String", # The suggested backend language.

1540

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

1541

#

1542

# When workers access Google Cloud APIs, they logically do so via

1543

# relative URLs. If this field is specified, it supplies the base

1544

# URL to use for resolving these relative URLs. The normative

1545

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1546

# Locators".

1547

#

1548

# If not specified, the default value is "http://www.googleapis.com/"

1549

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

1550

# console.

1551

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1552

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1553

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1554

#

1555

# When workers access Google Cloud APIs, they logically do so via

1556

# relative URLs. If this field is specified, it supplies the base

1557

# URL to use for resolving these relative URLs. The normative

1558

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1559

# Locators".

1560

#

1561

# If not specified, the default value is "http://www.googleapis.com/"

1562

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1563

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1564

# "dataflow/v1b3/projects".

1565

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1566

# "shuffle/v1beta1".

1567

"workerId": "A String", # The ID of the worker running this pipeline.

1568

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1569

# storage.

1570

#

1571

# The supported resource type is:

1572

#

1573

# Google Cloud Storage:

1574

#

1575

# storage.googleapis.com/{bucket}/{object}

1576

# bucket.storage.googleapis.com/{object}

1577

},

1578

"vmId": "A String", # The ID string of the VM.

1579

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1580

# taskrunner; e.g. "root".

1581

},

1582

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1583

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1584

"algorithm": "A String", # The algorithm to use for autoscaling.

1585

},

1586

"metadata": { # Metadata to set on the Google Compute Engine VMs.

1587

"a_key": "A String",

1588

},

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

1589

},

1590

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1591

"dataset": "A String", # The dataset for the current project where various workflow

1592

# related tables are stored.

1593

#

1594

# The supported resource type is:

1595

#

1596

# Google BigQuery:

1597

# bigquery.googleapis.com/{dataset}

1598

"internalExperiments": { # Experimental settings.

1599

"a_key": "", # Properties of the object. Contains field @type with type URL.

1600

},

1601

"workerRegion": "A String", # The Compute Engine region

1602

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1603

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

1604

# with worker_zone. If neither worker_region nor worker_zone is specified,

1605

# default to the control plane's region.

1606

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

1607

# at rest, AKA a Customer Managed Encryption Key (CMEK).

1608

#

1609

# Format:

1610

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

1611

"userAgent": { # A description of the process that generated the request.

1612

"a_key": "", # Properties of the object.

1613

},

1614

"workerZone": "A String", # The Compute Engine zone

1615

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1616

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

1617

# with worker_region. If neither worker_region nor worker_zone is specified,

1618

# a zone in the control plane's region is chosen based on available capacity.

1619

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

1620

# unspecified, the service will attempt to choose a reasonable

1621

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1622

# e.g. "compute.googleapis.com".

1623

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1624

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1625

# this resource prefix, where {JOBNAME} is the value of the

1626

# job_name field. The resulting bucket and object prefix is used

1627

# as the prefix of the resources used to store temporary data

1628

# needed during the job execution. NOTE: This will override the

1629

# value in taskrunner_settings.

1630

# The supported resource type is:

1631

#

1632

# Google Cloud Storage:

1633

#

1634

# storage.googleapis.com/{bucket}/{object}

1635

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1636

"experiments": [ # The list of experiments to enable.

1637

"A String",

1638

],

1639

"version": { # A structure describing which components and their versions of the service

1640

# are required in order to run the job.

1641

"a_key": "", # Properties of the object.

1642

},

1643

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1644

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1645

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1646

# callers cannot mutate it.

1647

{ # A message describing the state of a particular execution stage.

1648

"executionStageName": "A String", # The name of the execution stage.

1649

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1650

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1651

},

1652

],

1653

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.

1654

# This field is populated by the Dataflow service to support filtering jobs

1655

# by the metadata values provided here. Populated for ListJobs and all GetJob

1656

# views SUMMARY and higher.

1657

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1658

{ # Metadata for a BigTable connector used by the job.

1659

"tableId": "A String", # TableId accessed in the connection.

1660

"projectId": "A String", # ProjectId accessed in the connection.

1661

"instanceId": "A String", # InstanceId accessed in the connection.

1662

},

1663

],

1664

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1665

{ # Metadata for a Spanner connector used by the job.

1666

"databaseId": "A String", # DatabaseId accessed in the connection.

1667

"instanceId": "A String", # InstanceId accessed in the connection.

1668

"projectId": "A String", # ProjectId accessed in the connection.

1669

},

1670

],

1671

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1672

{ # Metadata for a Datastore connector used by the job.

1673

"projectId": "A String", # ProjectId accessed in the connection.

1674

"namespace": "A String", # Namespace used in the connection.

1675

},

1676

],

1677

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

1678

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1679

"sdkSupportStatus": "A String", # The support status for this SDK version.

1680

"version": "A String", # The version of the SDK used to run the job.

1681

},

1682

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1683

{ # Metadata for a BigQuery connector used by the job.

1684

"table": "A String", # Table accessed in the connection.

1685

"dataset": "A String", # Dataset accessed in the connection.

1686

"projectId": "A String", # Project accessed in the connection.

1687

"query": "A String", # Query used to access data in the connection.

1688

},

1689

],

1690

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1691

{ # Metadata for a File connector used by the job.

1692

"filePattern": "A String", # File Pattern used to access files by the connector.

1693

},

1694

],

1695

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1696

{ # Metadata for a PubSub connector used by the job.

1697

"subscription": "A String", # Subscription used in the connection.

1698

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1703

# snapshot.

1704

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

1705

"type": "A String", # The type of Cloud Dataflow job.

1706

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed form.

1707

# This data is provided by the Dataflow service for ease of visualizing

1708

# the pipeline and interpreting Dataflow provided metrics.

1709

# Preliminary field: The format of this data may change at any time.

1710

# A description of the user pipeline and stages through which it is executed.

1711

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1712

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1713

{ # Description of the composing transforms, names/ids, and input/outputs of a

1714

# stage of execution. Some composing transforms and sources may have been

1715

# generated by the Dataflow service during execution planning.

1716

"id": "A String", # Dataflow service generated id for this stage.

1717

"componentTransform": [ # Transforms that comprise this execution stage.

1718

{ # Description of a transform executed as part of an execution stage.

1719

"originalTransform": "A String", # User name for the original user transform with which this transform is

1720

# most closely associated.

1721

"name": "A String", # Dataflow service generated name for this transform.

1722

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1723

},

1724

],

1725

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1726

{ # Description of an interstitial value between transforms in an execution

1727

# stage.

1728

"name": "A String", # Dataflow service generated name for this source.

1729

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1730

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1731

# source is most closely associated.

1732

},

1733

],

1734

"kind": "A String", # Type of transform this stage is executing.

1735

"outputSource": [ # Output sources for this stage.

1736

{ # Description of an input or output of an execution stage.

1737

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1738

# source is most closely associated.

1739

"name": "A String", # Dataflow service generated name for this source.

1740

"sizeBytes": "A String", # Size of the source, if measurable.

1741

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1742

},

1743

],

1744

"name": "A String", # Dataflow service generated name for this stage.

1745

"inputSource": [ # Input sources for this stage.

1746

{ # Description of an input or output of an execution stage.

1747

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1748

# source is most closely associated.

1749

"name": "A String", # Dataflow service generated name for this source.

1750

"sizeBytes": "A String", # Size of the source, if measurable.

1751

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1757

{ # Description of the type, names/ids, and input/outputs for a transform.

1758

"kind": "A String", # Type of transform.

1759

"inputCollectionName": [ # User names for all collection inputs to this transform.

1760

"A String",

1761

],

1762

"name": "A String", # User provided name for this transform instance.

1763

"id": "A String", # SDK generated id of this transform instance.

1764

"displayData": [ # Transform-specific display data.

1765

{ # Data provided with a pipeline or transform to provide descriptive info.

1766

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1767

"boolValue": True or False, # Contains value if the data is of a boolean type.

1768

"javaClassValue": "A String", # Contains value if the data is of java class type.

1769

"strValue": "A String", # Contains value if the data is of string type.

1770

"int64Value": "A String", # Contains value if the data is of int64 type.

1771

"durationValue": "A String", # Contains value if the data is of duration type.

1772

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1773

# language namespace (e.g. a Python module) that defines the display data.

1774

# This allows a dax monitoring system to specially handle the data

1775

# and perform custom rendering.

1776

"floatValue": 3.14, # Contains value if the data is of float type.

1777

"key": "A String", # The key identifying the display data.

1778

# This is intended to be used as a label for the display data

1779

# when viewed in a dax monitoring system.

1780

"shortStrValue": "A String", # A possible additional shorter value to display.

1781

# For example a java_class_name_value of com.mypackage.MyDoFn

1782

# will be stored with MyDoFn as the short_str_value and

1783

# com.mypackage.MyDoFn as the java_class_name_value.

1784

# short_str_value can be displayed and java_class_name_value

1785

# will be displayed as a tooltip.

1786

"url": "A String", # An optional full URL.

1787

"label": "A String", # An optional label to display in a dax UI for the element.

1788

},

1789

],

1790

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

1796

{ # Data provided with a pipeline or transform to provide descriptive info.

1797

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1798

"boolValue": True or False, # Contains value if the data is of a boolean type.

1799

"javaClassValue": "A String", # Contains value if the data is of java class type.

1800

"strValue": "A String", # Contains value if the data is of string type.

1801

"int64Value": "A String", # Contains value if the data is of int64 type.

1802

"durationValue": "A String", # Contains value if the data is of duration type.

1803

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1804

# language namespace (e.g. a Python module) that defines the display data.

1805

# This allows a dax monitoring system to specially handle the data

1806

# and perform custom rendering.

1807

"floatValue": 3.14, # Contains value if the data is of float type.

1808

"key": "A String", # The key identifying the display data.

1809

# This is intended to be used as a label for the display data

1810

# when viewed in a dax monitoring system.

1811

"shortStrValue": "A String", # A possible additional shorter value to display.

1812

# For example a java_class_name_value of com.mypackage.MyDoFn

1813

# will be stored with MyDoFn as the short_str_value and

1814

# com.mypackage.MyDoFn as the java_class_name_value.

1815

# short_str_value can be displayed and java_class_name_value

1816

# will be displayed as a tooltip.

1817

"url": "A String", # An optional full URL.

1818

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1823

# of the job it replaced.

1824

#

1825

# When sending a `CreateJobRequest`, you can update a job by specifying it

1826

# here. The job named here is stopped, and its intermediate state is

1827

# transferred to this job.

1828

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1829

# for temporary storage. These temporary files will be

1830

# removed on job completion.

1831

# No duplicates are allowed.

1832

# No file patterns are supported.

1833

#

1834

# The supported files are:

1835

#

1836

# Google Cloud Storage:

1837

#

1838

# storage.googleapis.com/{bucket}/{object}

1839

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1840

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1841

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1842

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1843

#

1844

# Only one Job with a given name may exist in a project at any

1845

# given time. If a caller attempts to create a Job with the same

1846

# name as an already-existing Job, the attempt returns the

1847

# existing Job.

1848

#

1849

# The name must match the regular expression

1850

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1851

"steps": [ # Exactly one of step or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1852

#

1853

# The top-level steps that constitute the entire job.

1854

{ # Defines a particular step within a Cloud Dataflow job.

1855

#

1856

# A job consists of multiple steps, each of which performs some

1857

# specific operation as part of the overall job. Data is typically

1858

# passed from one step to another as part of the job.

1859

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1860

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1861

# Map-Reduce job:

1862

#

1863

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1864

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1865

#

1866

# * Validate the elements.

1867

#

1868

# * Apply a user-defined function to map each element to some value

1869

# and extract an element-specific key value.

1870

#

1871

# * Group elements with the same key into a single element with

1872

# that key, transforming a multiply-keyed collection into a

1873

# uniquely-keyed collection.

1874

#

1875

# * Write the elements out to some data sink.

1876

#

1877

# Note that the Cloud Dataflow service may be used to run many different

1878

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1879

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

1880

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1881

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1882

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1883

# predefined step has its own required set of properties.

1884

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1885

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1886

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1887

},

1888

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1889

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1890

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1891

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that isn't contained in the submitted job.

1892

# Deprecated.

1893

"stages": { # A mapping from each stage to the information about that stage.

1894

"a_key": { # Contains information about how a particular

1895

# google.dataflow.v1beta3.Step will be executed.

1896

"stepName": [ # The steps associated with the execution stage.

1897

# Note that stages may have several steps, and that a given step

1898

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1905

#

1906

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1907

# specified.

1908

#

1909

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1910

# terminal state. After a job has reached a terminal state, no

1911

# further state updates may be made.

1912

#

1913

# This field may be mutated by the Cloud Dataflow service;

1914

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1915

"location": "A String", # The [regional endpoint]

1916

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1917

# contains this job.

1918

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1919

# Flexible resource scheduling jobs are started with some delay after job

1920

# creation, so start_time is unset before start and is updated when the

1921

# job is started by the Cloud Dataflow service. For other jobs, start_time

1922

# always equals create_time and is immutable and set by the Cloud Dataflow

1923

# service.

1924

"stepsLocation": "A String", # The GCS location where the steps are stored.

1925

"labels": { # User-defined labels for this job.

1926

#

1927

# The labels map can contain no more than 64 entries. Entries of the labels

1928

# map are UTF8 strings that comply with the following restrictions:

1929

#

1930

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1931

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1932

# * Both keys and values are additionally constrained to be <= 128 bytes in

1933

# size.

1934

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1935

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1936

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1937

# Cloud Dataflow service.

1938

"requestedState": "A String", # The job's requested state.

1939

#

1940

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1941

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1942

# also be used to directly set a job's requested state to

1943

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1944

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>

</div>
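The `name` field in the object above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. The following is a minimal sketch of a client-side check a caller might run before submitting a job. It assumes the pattern is meant to match the whole name (the documentation does not say how the service anchors it), and the helper `is_valid_job_name` is a hypothetical name for illustration, not part of the client library.

```python
import re

# Pattern quoted from the `name` field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` matches the documented pattern in full.

    Assumption: the whole string must match. Under that reading a name
    starts with a lowercase letter, does not end with a hyphen, and is
    at most 40 characters long.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

A check like this only catches malformed names early. The service remains the authority on which names it accepts, and on whether a name is already in use within the project.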

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1949

<code class="details" id="get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1950

<pre>Gets the state of the specified Cloud Dataflow job.

1951

1952

To get the state of a job, we recommend using `projects.locations.jobs.get`

1953

with a [regional endpoint]

1954

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

1955

`projects.jobs.get` is not recommended, as you can only get the state of

1956

jobs that are running in `us-central1`.

1957

1958

Args:

1959

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

1960

jobId: string, The job ID. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1961

view: string, The level of information requested in response.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1962

location: string, The [regional endpoint]

1963

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1964

contains this job.

1965

x__xgafv: string, V1 error format.

1966

Allowed values

1967

1 - v1 error format

1968

2 - v2 error format

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1969

1970

Returns:

1971

An object of the form:

1972

1973

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1974

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1975

# If this field is set, the service will ensure its uniqueness.

1976

# The request to create a job will fail if the service has knowledge of a

1977

# previously submitted job with the same client's ID and job name.

1978

# The caller may use this field to ensure idempotence of job

1979

# creation across retried attempts to create a job.

1980

# By default, the field is empty and, in that case, the service ignores it.

1981

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1982

#

1983

# This field is set by the Cloud Dataflow service when the Job is

1984

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1985

"currentStateTime": "A String", # The timestamp associated with the current state.

1986

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1987

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1988

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1989

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1990

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

1991

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1992

# options are passed through the service and are used to recreate the

1993

# SDK pipeline options on the worker in a language agnostic and platform

1994

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1995

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1996

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1997

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

1998

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1999

# specified in order for the job to have workers.

2000

{ # Describes one particular pool of Cloud Dataflow workers to be

2001

# instantiated by the Cloud Dataflow service in order to perform the

2002

# computations required by a job. Note that a workflow job may use

2003

# multiple pools, in order to match the various computational

2004

# requirements of the various stages of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2005

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2006

# select a default set of packages which are useful to worker

2007

# harnesses written in a particular language.

2008

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2009

# the service will use the network "default".

2010

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2011

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2012

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2013

# execute the job. If zero or unspecified, the service will

2014

# attempt to choose a reasonable default.

2015

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2016

# service will choose a number of threads (according to the number of cores

2017

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2018

"diskSourceImage": "A String", # Fully qualified source image for disks.

2019

"packages": [ # Packages to be installed on workers.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2020

{ # The packages that must be installed in order for a worker to run the

2021

# steps of the Cloud Dataflow job that will be assigned to its worker

2022

# pool.

2023

#

2024

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2025

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2026

# might use this to install jars containing the user's code and all of the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2027

# various dependencies (libraries, data files, etc.) required in order

2028

# for that code to run.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2029

"location": "A String", # The resource to read the package from. The supported resource type is:

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2030

#

2031

# Google Cloud Storage:

2032

#

2033

# storage.googleapis.com/{bucket}

2034

# bucket.storage.googleapis.com/

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2035

"name": "A String", # The name of the package.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2036

},

2037

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2038

"teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2039

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2040

# `TEARDOWN_NEVER`.

2041

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2042

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2043

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2044

# down.

2045

#

2046

# If the workers are not torn down by the service, they will

2047

# continue to run and use Google Compute Engine VM resources in the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2048

# user's project until they are explicitly terminated by the user.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2049

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2050

# policy except for small, manually supervised test jobs.

2051

#

2052

# If unknown or unspecified, the service will attempt to choose a reasonable

2053

# default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2054

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2055

# Compute Engine API.

2056

"poolArgs": { # Extra arguments for this worker pool.

2057

"a_key": "", # Properties of the object. Contains field @type with type URL.

2058

},

2059

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2060

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2061

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2062

# harness, residing in Google Container Registry.

2063

#

2064

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2065

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2066

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2067

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2068

# service will attempt to choose a reasonable default.

2069

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2070

# are supported.

2071

"dataDisks": [ # Data disks that are used by a VM in this workflow.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2072

{ # Describes the data disk used by a workflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2073

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2074

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2075

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2076

# must be a disk type appropriate to the project and zone in which

2077

# the workers will run. If unknown or unspecified, the service

2078

# will attempt to choose a reasonable default.

2079

#

2080

# For example, the standard persistent disk type is a resource name

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2081

# typically ending in "pd-standard". If SSD persistent disks are

2082

# available, the resource name typically ends with "pd-ssd". The

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2083

# actual valid values are defined by the Google Compute Engine API,

2084

# not by the Cloud Dataflow API; consult the Google Compute Engine

2085

# documentation for more information about determining the set of

2086

# available disk types for a particular project and zone.

2087

#

2088

# Google Compute Engine Disk types are local to a particular

2089

# project in a particular zone, and so the resource name will

2090

# typically look something like this:

2091

#

2092

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2093

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2094

},

2095

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2096

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2097

# only be set in the Fn API path. For non-cross-language pipelines this

2098

# should have only one entry. Cross-language pipelines will have two or more

2099

# entries.

2100

{ # Defines a SDK harness container for executing Dataflow pipelines.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2101

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2102

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2103

# container instance with this image. If false (or unset) recommends using

2104

# more than one core per SDK container instance with this image for

2105

# efficiency. Note that Dataflow service may choose to override this property

2106

# if needed.

2107

},

2108

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2109

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2110

# the form "regions/REGION/subnetworks/SUBNETWORK".

2111

"ipConfiguration": "A String", # Configuration for VM IPs.

2112

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2113

# using the standard Dataflow task runner. Users should ignore

2114

# this field.

2115

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2116

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2117

# taskrunner; e.g. "wheel".

2118

"harnessCommand": "A String", # The command to launch the worker harness.

2119

"logDir": "A String", # The directory on the VM to store logs.

2120

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2121

# access the Cloud Dataflow API.

2122

"A String",

2123

],

2124

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2125

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2126

# will not be uploaded.

2127

#

2128

# The supported resource type is:

2129

#

2130

# Google Cloud Storage:

2131

# storage.googleapis.com/{bucket}/{object}

2132

# bucket.storage.googleapis.com/{object}

2133

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2134

"workflowFileName": "A String", # The file to store the workflow in.

2135

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2136

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2137

# temporary storage.

2138

#

2139

# The supported resource type is:

2140

#

2141

# Google Cloud Storage:

2142

# storage.googleapis.com/{bucket}/{object}

2143

# bucket.storage.googleapis.com/{object}

2144

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2145

"languageHint": "A String", # The suggested backend language.

2146

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

2147

#

2148

# When workers access Google Cloud APIs, they logically do so via

2149

# relative URLs. If this field is specified, it supplies the base

2150

# URL to use for resolving these relative URLs. The normative

2151

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2152

# Locators".

2153

#

2154

# If not specified, the default value is "http://www.googleapis.com/"

2155

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2156

# console.

2157

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2158

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2159

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2160

#

2161

# When workers access Google Cloud APIs, they logically do so via

2162

# relative URLs. If this field is specified, it supplies the base

2163

# URL to use for resolving these relative URLs. The normative

2164

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2165

# Locators".

2166

#

2167

# If not specified, the default value is "http://www.googleapis.com/"

2168

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2169

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2170

# "dataflow/v1b3/projects".

2171

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2172

# "shuffle/v1beta1".

2173

"workerId": "A String", # The ID of the worker running this pipeline.

2174

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2175

# storage.

2176

#

2177

# The supported resource type is:

2178

#

2179

# Google Cloud Storage:

2180

#

2181

# storage.googleapis.com/{bucket}/{object}

2182

# bucket.storage.googleapis.com/{object}

2183

},

2184

"vmId": "A String", # The ID string of the VM.

2185

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2186

# taskrunner; e.g. "root".

2187

},

2188

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2189

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2190

"algorithm": "A String", # The algorithm to use for autoscaling.

2191

},

2192

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2193

"a_key": "A String",

2194

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2195

},

2196

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2197

"dataset": "A String", # The dataset for the current project where various workflow

2198

# related tables are stored.

2199

#

2200

# The supported resource type is:

2201

#

2202

# Google BigQuery:

2203

# bigquery.googleapis.com/{dataset}

2204

"internalExperiments": { # Experimental settings.

2205

"a_key": "", # Properties of the object. Contains field @type with type URL.

2206

},

2207

"workerRegion": "A String", # The Compute Engine region

2208

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2209

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2210

# with worker_zone. If neither worker_region nor worker_zone is specified,

2211

# default to the control plane's region.

2212

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2213

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2214

#

2215

# Format:

2216

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2217

"userAgent": { # A description of the process that generated the request.

2218

"a_key": "", # Properties of the object.

2219

},

2220

"workerZone": "A String", # The Compute Engine zone

2221

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2222

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2223

# with worker_region. If neither worker_region nor worker_zone is specified,

2224

# a zone in the control plane's region is chosen based on available capacity.

2225

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2226

# unspecified, the service will attempt to choose a reasonable

2227

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2228

# e.g. "compute.googleapis.com".

2229

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2230

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2231

# this resource prefix, where {JOBNAME} is the value of the

2232

# job_name field. The resulting bucket and object prefix is used

2233

# as the prefix of the resources used to store temporary data

2234

# needed during the job execution. NOTE: This will override the

2235

# value in taskrunner_settings.

2236

# The supported resource type is:

2237

#

2238

# Google Cloud Storage:

2239

#

2240

# storage.googleapis.com/{bucket}/{object}

2241

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2242

"experiments": [ # The list of experiments to enable.

2243

"A String",

2244

],

2245

"version": { # A structure describing which components and their versions of the service

2246

# are required in order to run the job.

2247

"a_key": "", # Properties of the object.

2248

},

2249

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2250

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2251

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2252

# callers cannot mutate it.

2253

{ # A message describing the state of a particular execution stage.

2254

"executionStageName": "A String", # The name of the execution stage.

2255

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

2256

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

2257

},

2258

],

2259

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

2260

# ListJob response and Job SUMMARY view.

2261

# This field is populated by the Dataflow service to support filtering jobs by the

2262

# metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.

2263

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2264

{ # Metadata for a BigTable connector used by the job.

2265

"tableId": "A String", # TableId accessed in the connection.

2266

"projectId": "A String", # ProjectId accessed in the connection.

2267

"instanceId": "A String", # InstanceId accessed in the connection.

2268

},

2269

],

2270

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2271

{ # Metadata for a Spanner connector used by the job.

2272

"databaseId": "A String", # DatabaseId accessed in the connection.

2273

"instanceId": "A String", # InstanceId accessed in the connection.

2274

"projectId": "A String", # ProjectId accessed in the connection.

2275

},

2276

],

2277

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2278

{ # Metadata for a Datastore connector used by the job.

2279

"projectId": "A String", # ProjectId accessed in the connection.

2280

"namespace": "A String", # Namespace used in the connection.

2281

},

2282

],

2283

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

2284

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

2285

"sdkSupportStatus": "A String", # The support status for this SDK version.

2286

"version": "A String", # The version of the SDK used to run the job.

2287

},

2288

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2289

{ # Metadata for a BigQuery connector used by the job.

2290

"table": "A String", # Table accessed in the connection.

2291

"dataset": "A String", # Dataset accessed in the connection.

2292

"projectId": "A String", # Project accessed in the connection.

2293

"query": "A String", # Query used to access data in the connection.

2294

},

2295

],

2296

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2297

{ # Metadata for a File connector used by the job.

2298

"filePattern": "A String", # File Pattern used to access files by the connector.

2299

},

2300

],

2301

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

2302

{ # Metadata for a PubSub connector used by the job.

2303

"subscription": "A String", # Subscription used in the connection.

2304

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

2309

# snapshot.

2310

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

2311

"type": "A String", # The type of Cloud Dataflow job.

2312

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

2313

# form. This data is provided by the Dataflow service for ease of visualizing

2314

# the pipeline and interpreting Dataflow provided metrics.

2315

# Preliminary field: The format of this data may change at any time.

2316

# A description of the user pipeline and stages through which it is executed.

2317

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

2318

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2319

{ # Description of the composing transforms, names/ids, and input/outputs of a

2320

# stage of execution. Some composing transforms and sources may have been

2321

# generated by the Dataflow service during execution planning.

2322

"id": "A String", # Dataflow service generated id for this stage.

2323

"componentTransform": [ # Transforms that comprise this execution stage.

2324

{ # Description of a transform executed as part of an execution stage.

2325

"originalTransform": "A String", # User name for the original user transform with which this transform is

2326

# most closely associated.

2327

"name": "A String", # Dataflow service generated name for this source.

2328

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2329

},

2330

],

2331

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2332

{ # Description of an interstitial value between transforms in an execution

2333

# stage.

2334

"name": "A String", # Dataflow service generated name for this source.

2335

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2336

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2337

# source is most closely associated.

2338

},

2339

],

2340

"kind": "A String", # Type of transform this stage is executing.

2341

"outputSource": [ # Output sources for this stage.

2342

{ # Description of an input or output of an execution stage.

2343

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2344

# source is most closely associated.

2345

"name": "A String", # Dataflow service generated name for this source.

2346

"sizeBytes": "A String", # Size of the source, if measurable.

2347

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2348

},

2349

],

2350

"name": "A String", # Dataflow service generated name for this stage.

2351

"inputSource": [ # Input sources for this stage.

2352

{ # Description of an input or output of an execution stage.

2353

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2354

# source is most closely associated.

2355

"name": "A String", # Dataflow service generated name for this source.

2356

"sizeBytes": "A String", # Size of the source, if measurable.

2357

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2363

{ # Description of the type, names/ids, and input/outputs for a transform.

2364

"kind": "A String", # Type of transform.

2365

"inputCollectionName": [ # User names for all collection inputs to this transform.

2366

"A String",

2367

],

2368

"name": "A String", # User provided name for this transform instance.

2369

"id": "A String", # SDK generated id of this transform instance.

2370

"displayData": [ # Transform-specific display data.

2371

{ # Data provided with a pipeline or transform to provide descriptive info.

2372

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2373

"boolValue": True or False, # Contains value if the data is of a boolean type.

2374

"javaClassValue": "A String", # Contains value if the data is of java class type.

2375

"strValue": "A String", # Contains value if the data is of string type.

2376

"int64Value": "A String", # Contains value if the data is of int64 type.

2377

"durationValue": "A String", # Contains value if the data is of duration type.

2378

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2379

# language namespace (e.g. a Python module) which defines the display data.

2380

# This allows a dax monitoring system to specially handle the data

2381

# and perform custom rendering.

2382

"floatValue": 3.14, # Contains value if the data is of float type.

2383

"key": "A String", # The key identifying the display data.

2384

# This is intended to be used as a label for the display data

2385

# when viewed in a dax monitoring system.

2386

"shortStrValue": "A String", # A possible additional shorter value to display.

2387

# For example a java_class_name_value of com.mypackage.MyDoFn

2388

# will be stored with MyDoFn as the short_str_value and

2389

# com.mypackage.MyDoFn as the java_class_name_value.

2390

# short_str_value can be displayed and java_class_name_value

2391

# will be displayed as a tooltip.

2392

"url": "A String", # An optional full URL.

2393

"label": "A String", # An optional label to display in a dax UI for the element.

2394

},

2395

],

2396

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

2402

{ # Data provided with a pipeline or transform to provide descriptive info.

2403

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2404

"boolValue": True or False, # Contains value if the data is of a boolean type.

2405

"javaClassValue": "A String", # Contains value if the data is of java class type.

2406

"strValue": "A String", # Contains value if the data is of string type.

2407

"int64Value": "A String", # Contains value if the data is of int64 type.

2408

"durationValue": "A String", # Contains value if the data is of duration type.

2409

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2410

# language namespace (e.g. a Python module) which defines the display data.

2411

# This allows a dax monitoring system to specially handle the data

2412

# and perform custom rendering.

2413

"floatValue": 3.14, # Contains value if the data is of float type.

2414

"key": "A String", # The key identifying the display data.

2415

# This is intended to be used as a label for the display data

2416

# when viewed in a dax monitoring system.

2417

"shortStrValue": "A String", # A possible additional shorter value to display.

2418

# For example a java_class_name_value of com.mypackage.MyDoFn

2419

# will be stored with MyDoFn as the short_str_value and

2420

# com.mypackage.MyDoFn as the java_class_name_value.

2421

# short_str_value can be displayed and java_class_name_value

2422

# will be displayed as a tooltip.

2423

"url": "A String", # An optional full URL.

2424

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

2429

# of the job it replaced.

2430

#

2431

# When sending a `CreateJobRequest`, you can update a job by specifying it

2432

# here. The job named here is stopped, and its intermediate state is

2433

# transferred to this job.

2434

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2435

# for temporary storage. These temporary files will be

2436

# removed on job completion.

2437

# No duplicates are allowed.

2438

# No file patterns are supported.

2439

#

2440

# The supported files are:

2441

#

2442

# Google Cloud Storage:

2443

#

2444

# storage.googleapis.com/{bucket}/{object}

2445

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2446

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2447

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2448

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2449

#

2450

# Only one Job with a given name may exist in a project at any

2451

# given time. If a caller attempts to create a Job with the same

2452

# name as an already-existing Job, the attempt returns the

2453

# existing Job.

2454

#

2455

# The name must match the regular expression

2456

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2457

"steps": [ # Exactly one of step or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2458

#

2459

# The top-level steps that constitute the entire job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2460

{ # Defines a particular step within a Cloud Dataflow job.

2461

#

2462

# A job consists of multiple steps, each of which performs some

2463

# specific operation as part of the overall job. Data is typically

2464

# passed from one step to another as part of the job.

2465

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2466

# Here's an example of a sequence of steps which together implement a

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2467

# Map-Reduce job:

2468

#

2469

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2470

# collection's elements.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2471

#

2472

# * Validate the elements.

2473

#

2474

# * Apply a user-defined function to map each element to some value

2475

# and extract an element-specific key value.

2476

#

2477

# * Group elements with the same key into a single element with

2478

# that key, transforming a multiply-keyed collection into a

2479

# uniquely-keyed collection.

2480

#

2481

# * Write the elements out to some data sink.

2482

#

2483

# Note that the Cloud Dataflow service may be used to run many different

2484

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2485

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2486

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2487

"kind": "A String", # The kind of step in the Cloud Dataflow job.

2488

"properties": { # Named properties associated with the step. Each kind of

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2489

# predefined step has its own required set of properties.

2490

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2491

"a_key": "", # Properties of the object.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2492

},

2493

},

2494

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2495

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

2496

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

2497

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

2498

# isn't contained in the submitted job.

2499

"stages": { # A mapping from each stage to the information about that stage.

2500

"a_key": { # Contains information about how a particular

2501

# google.dataflow.v1beta3.Step will be executed.

2502

"stepName": [ # The steps associated with the execution stage.

2503

# Note that stages may have several steps, and that a given step

2504

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2511

#

2512

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

2513

# specified.

2514

#

2515

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

2516

# terminal state. After a job has reached a terminal state, no

2517

# further state updates may be made.

2518

#

2519

# This field may be mutated by the Cloud Dataflow service;

2520

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2521

"location": "A String", # The [regional endpoint]

2522

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2523

# contains this job.

2524

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

2525

# Flexible resource scheduling jobs are started with some delay after job

2526

# creation, so start_time is unset before start and is updated when the

2527

# job is started by the Cloud Dataflow service. For other jobs, start_time

2528

# always equals create_time and is immutable and set by the Cloud Dataflow

2529

# service.

2530

"stepsLocation": "A String", # The GCS location where the steps are stored.

2531

"labels": { # User-defined labels for this job.

2532

#

2533

# The labels map can contain no more than 64 entries. Entries of the labels

2534

# map are UTF8 strings that comply with the following restrictions:

2535

#

2536

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

2537

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

2538

# * Both keys and values are additionally constrained to be <= 128 bytes in

2539

# size.

2540

"a_key": "A String",

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2541

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2542

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

2543

# Cloud Dataflow service.

2544

"requestedState": "A String", # The job's requested state.

2545

#

2546

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

2547

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

2548

# also be used to directly set a job's requested state to

2549

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

2550

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2551

}</pre>

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

</div>
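The `currentState` and `requestedState` descriptions above imply a simple guard: once a job is in a terminal state, no further state change can be requested. The following is a minimal sketch of that guard, not the client library's own logic. The helper names `is_terminal` and `cancel_request_body` are hypothetical, and the set of terminal states is an assumption: `JOB_STATE_DONE` and `JOB_STATE_CANCELLED` come from the text above, while `JOB_STATE_FAILED`, `JOB_STATE_UPDATED`, and `JOB_STATE_DRAINED` are assumed from the `JobState` enum.

```python
# Assumed terminal states. DONE and CANCELLED are named in the field
# descriptions; FAILED, UPDATED and DRAINED are assumed from the JobState enum.
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_FAILED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})


def is_terminal(job):
    """Return True if the Job dict reports a terminal currentState."""
    return job.get("currentState") in TERMINAL_STATES


def cancel_request_body(job):
    """Build a request body asking for cancellation (hypothetical helper).

    Returns None when the job is already terminal, because no further
    state updates may be made after that point.
    """
    if is_terminal(job):
        return None
    return {"requestedState": "JOB_STATE_CANCELLED"}
```

A body built this way is the kind of payload an `UpdateJob` call would carry; only `requestedState` is set, because callers cannot mutate `currentState`.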

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2555

<code class="details" id="getMetrics">getMetrics(projectId, jobId, location=None, startTime=None, x__xgafv=None)</code>

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2556

<pre>Request the job status.

2557

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2558

To request the status of a job, we recommend using

2559

`projects.locations.jobs.getMetrics` with a [regional endpoint]

2560

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

2561

`projects.jobs.getMetrics` is not recommended, as you can only request the

2562

status of jobs that are running in `us-central1`.

2563

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2564

Args:

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2565

projectId: string, A project id. (required)

2566

jobId: string, The job to get metrics for. (required)

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2567

location: string, The [regional endpoint]

2568

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2569

contains the job specified by job_id.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2570

startTime: string, Return only metric data that has changed since this time.

2571

Default is to return all information about all metrics for the job.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2572

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2573

Allowed values

2574

1 - v1 error format

2575

2 - v2 error format

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2576

2577

Returns:

2578

An object of the form:

2579

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2580

{ # JobMetrics contains a collection of metrics describing the detailed progress

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2581

# of a Dataflow job. Metrics correspond to user-defined and system-defined

2582

# metrics in the job.

2583

#

2584

# This resource captures only the most recent values of each metric;

2585

# time-series data can be queried for them (under the same metric names)

2586

# from Cloud Monitoring.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2587

"metricTime": "A String", # Timestamp as of which metric values are current.

2588

"metrics": [ # All metrics for this job.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2589

{ # Describes the state of a metric.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2590

"set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only

2591

# possible value type is a list of Values whose type can be Long, Double,

2592

# or String, according to the metric's type. All Values in the list must

2593

# be of the same type.

2594

"gauge": "", # A struct value describing properties of a Gauge.

2595

# Metrics of gauge type show the value of a metric across time, and are

2596

# aggregated based on the newest value.

2597

"cumulative": True or False, # True if this metric is reported as the total cumulative aggregate

2598

# value accumulated since the worker started working on this WorkItem.

2599

# By default this is false, indicating that this metric is reported

2600

# as a delta that is not associated with any WorkItem.

2601

"internal": "", # Worker-computed aggregate value for internal use by the Dataflow

2602

# service.

2603

"kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are

2604

# "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2605

# The specified aggregation kind is case-insensitive.

2606

#

2607

# If omitted, this is not an aggregated value but instead

2608

# a single metric sample value.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2609

"scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",

2610

# "And", and "Or". The possible value types are Long, Double, and Boolean.

2611

"meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

2612

# This holds the count of the aggregated values and is used in combination

2613

# with mean_sum to obtain the actual mean aggregate value.

2614

# The only possible value type is Long.

2615

"meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

2616

# This holds the sum of the aggregated values and is used in combination

2617

# with mean_count to obtain the actual mean aggregate value.

2618

# The only possible value types are Long and Double.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2619

"updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2620

# reporting work progress; it will be filled in responses from the

2621

# metrics API.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2622

"name": { # Identifies a metric, by describing the source which generated the # Name of the metric.

2623

# metric.

2624

"context": { # Zero or more labeled fields which identify the part of the job this

2625

# metric is associated with, such as the name of a step or collection.

2626

#

2627

# For example, built-in counters associated with steps will have

2628

# context['step'] = <step-name>. Counters associated with PCollections

2629

# in the SDK will have context['pcollection'] = <pcollection-name>.

2630

"a_key": "A String",

2631

},

2632

"origin": "A String", # Origin (namespace) of metric name. May be blank for user-define metrics;

2633

# will be "dataflow" for metrics defined by the Dataflow service or SDK.

2634

"name": "A String", # Worker-defined metric name.

2635

},

2636

"distribution": "", # A struct value describing properties of a distribution of numeric values.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2637

},

2638

],

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

}</pre>

</div>
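The `JobMetrics` response above stores a "Mean" aggregate as two fields, `meanSum` and `meanCount`, and tags each metric with a `context` map. The following is a minimal sketch of how a caller might read those fields from an already-fetched response dict. The helper names `metric_mean` and `metrics_for_step` are hypothetical, and the sketch assumes the two mean fields hold plain numbers or numeric strings; the reference above does not pin down their JSON encoding.

```python
def metric_mean(metric):
    """Return meanSum / meanCount for a "Mean" metric, else None.

    The aggregation kind is compared case-insensitively, as the field
    description states. Numeric strings are accepted as an assumption.
    """
    kind = (metric.get("kind") or "").lower()
    if kind != "mean":
        return None
    raw_sum = metric.get("meanSum")
    raw_count = metric.get("meanCount")
    if raw_sum is None or raw_count is None:
        return None
    count = int(raw_count)
    if count == 0:
        return None  # avoid dividing by zero for an empty aggregate
    return float(raw_sum) / count


def metrics_for_step(response, step_name):
    """Return the metrics whose name.context['step'] equals step_name."""
    return [
        m for m in response.get("metrics", [])
        if m.get("name", {}).get("context", {}).get("step") == step_name
    ]
```

Both helpers work on the parsed response only, so they can be exercised without calling the service.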

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2643

<code class="details" id="list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2644

<pre>List the jobs of a project.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2645

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2646

To list the jobs of a project in a region, we recommend using

2647

`projects.locations.jobs.list` with a [regional endpoint]

2648

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To

2649

list all jobs across all regions, use `projects.jobs.aggregated`. Using

2650

`projects.jobs.list` is not recommended, as you can only get the list of

2651

jobs that are running in `us-central1`.

2652

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2653

Args:

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2654

projectId: string, The project which owns the jobs. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2655

filter: string, The kind of filter to use.

2656

location: string, The [regional endpoint]

2657

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2658

contains this job.

2659

pageToken: string, Set this to the 'next_page_token' field of a previous response

2660

to request additional results in a long list.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2661

pageSize: integer, If there are many jobs, limit response to at most this many.

2662

The actual number of jobs returned will be the lesser of max_responses

2663

and an unspecified server-defined limit.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2664

view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2665

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2666

Allowed values

2667

1 - v1 error format

2668

2 - v2 error format

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2669

2670

Returns:

2671

An object of the form:

2672

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2673

{ # Response to a request to list Cloud Dataflow jobs in a project. This might

2674

# be a partial response, depending on the page size in the ListJobsRequest.

2675

# However, if the project does not have any jobs, an instance of

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2676

# ListJobsResponse is not returned and the request's response

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2677

# body is empty {}.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2678

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

2679

"failedLocation": [ # Zero or more messages describing the [regional endpoints]

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2680

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2681

# failed to respond.

2682

{ # Indicates which [regional endpoint]

2683

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

2684

# to respond to a request for data.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2685

"name": "A String", # The name of the [regional endpoint]

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2686

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2687

# failed to respond.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

2688

},

2689

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2690

"jobs": [ # A subset of the requested job information.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2691

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2692

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

2693

# If this field is set, the service will ensure its uniqueness.

2694

# The request to create a job will fail if the service has knowledge of a

2695

# previously submitted job with the same client's ID and job name.

2696

# The caller may use this field to ensure idempotence of job

2697

# creation across retried attempts to create a job.

2698

# By default, the field is empty and, in that case, the service ignores it.

2699

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2700

#

2701

# This field is set by the Cloud Dataflow service when the Job is

2702

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2703

"currentStateTime": "A String", # The timestamp associated with the current state.

2704

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2705

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2706

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2707

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2708

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

2709

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2710

# options are passed through the service and are used to recreate the

2711

# SDK pipeline options on the worker in a language agnostic and platform

2712

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2713

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2714

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2715

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2716

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2717

# specified in order for the job to have workers.

2718

{ # Describes one particular pool of Cloud Dataflow workers to be

2719

# instantiated by the Cloud Dataflow service in order to perform the

2720

# computations required by a job. Note that a workflow job may use

2721

# multiple pools, in order to match the various computational

2722

# requirements of the various stages of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2723

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2724

# select a default set of packages which are useful to worker

2725

# harnesses written in a particular language.

2726

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2727

# the service will use the network "default".

2728

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2729

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2730

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2731

# execute the job. If zero or unspecified, the service will

2732

# attempt to choose a reasonable default.

2733

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2734

# service will choose a number of threads (according to the number of cores

2735

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2736

"diskSourceImage": "A String", # Fully qualified source image for disks.

2737

"packages": [ # Packages to be installed on workers.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2738

{ # The packages that must be installed in order for a worker to run the

2739

# steps of the Cloud Dataflow job that will be assigned to its worker

2740

# pool.

2741

#

2742

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2743

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2744

# might use this to install jars containing the user's code and all of the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2745

# various dependencies (libraries, data files, etc.) required in order

2746

# for that code to run.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2747

"location": "A String", # The resource to read the package from. The supported resource type is:

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

2748

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2749

# Google Cloud Storage:

2750

#

2751

# storage.googleapis.com/{bucket}

2752

# bucket.storage.googleapis.com/

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2753

"name": "A String", # The name of the package.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

2754

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2755

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2756

"teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2757

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2758

# `TEARDOWN_NEVER`.

2759

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2760

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2761

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2762

# down.

2763

#

2764

# If the workers are not torn down by the service, they will

2765

# continue to run and use Google Compute Engine VM resources in the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2766

# user's project until they are explicitly terminated by the user.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2767

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2768

# policy except for small, manually supervised test jobs.

2769

#

2770

# If unknown or unspecified, the service will attempt to choose a reasonable

2771

# default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2772

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2773

# Compute Engine API.

2774

"poolArgs": { # Extra arguments for this worker pool.

2775

"a_key": "", # Properties of the object. Contains field @type with type URL.

2776

},

2777

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2778

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2779

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2780

# harness, residing in Google Container Registry.

2781

#

2782

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2783

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2784

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2785

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2786

# service will attempt to choose a reasonable default.

2787

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2788

# are supported.

2789

"dataDisks": [ # Data disks that are used by a VM in this workflow.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2790

{ # Describes the data disk used by a workflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2791

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2792

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2793

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2794

# must be a disk type appropriate to the project and zone in which

2795

# the workers will run. If unknown or unspecified, the service

2796

# will attempt to choose a reasonable default.

2797

#

2798

# For example, the standard persistent disk type is a resource name

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2799

# typically ending in "pd-standard". If SSD persistent disks are

2800

# available, the resource name typically ends with "pd-ssd". The

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2801

# actual valid values are defined by the Google Compute Engine API,

2802

# not by the Cloud Dataflow API; consult the Google Compute Engine

2803

# documentation for more information about determining the set of

2804

# available disk types for a particular project and zone.

2805

#

2806

# Google Compute Engine Disk types are local to a particular

2807

# project in a particular zone, and so the resource name will

2808

# typically look something like this:

2809

#

2810

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2811

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

2812

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2813

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2814

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2815

# only be set in the Fn API path. For non-cross-language pipelines this

2816

# should have only one entry. Cross-language pipelines will have two or more

2817

# entries.

2818

{ # Defines an SDK harness container for executing Dataflow pipelines.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2819

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2820

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2821

# container instance with this image. If false (or unset) recommends using

2822

# more than one core per SDK container instance with this image for

2823

# efficiency. Note that the Dataflow service may choose to override this property

2824

# if needed.

2825

},

2826

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2827

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2828

# the form "regions/REGION/subnetworks/SUBNETWORK".

2829

"ipConfiguration": "A String", # Configuration for VM IPs.

2830

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2831

# using the standard Dataflow task runner. Users should ignore

2832

# this field.

2833

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2834

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2835

# taskrunner; e.g. "wheel".

2836

"harnessCommand": "A String", # The command to launch the worker harness.

2837

"logDir": "A String", # The directory on the VM to store logs.

2838

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2839

# access the Cloud Dataflow API.

2840

"A String",

2841

],

2842

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2843

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2844

# will not be uploaded.

2845

#

2846

# The supported resource type is:

2847

#

2848

# Google Cloud Storage:

2849

# storage.googleapis.com/{bucket}/{object}

2850

# bucket.storage.googleapis.com/{object}

2851

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2852

"workflowFileName": "A String", # The file to store the workflow in.

2853

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2854

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2855

# temporary storage.

2856

#

2857

# The supported resource type is:

2858

#

2859

# Google Cloud Storage:

2860

# storage.googleapis.com/{bucket}/{object}

2861

# bucket.storage.googleapis.com/{object}

2862

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2863

"languageHint": "A String", # The suggested backend language.

2864

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

2865

#

2866

# When workers access Google Cloud APIs, they logically do so via

2867

# relative URLs. If this field is specified, it supplies the base

2868

# URL to use for resolving these relative URLs. The normative

2869

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2870

# Locators".

2871

#

2872

# If not specified, the default value is "http://www.googleapis.com/"

2873

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2874

# console.

2875

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2876

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2877

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2878

#

2879

# When workers access Google Cloud APIs, they logically do so via

2880

# relative URLs. If this field is specified, it supplies the base

2881

# URL to use for resolving these relative URLs. The normative

2882

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2883

# Locators".

2884

#

2885

# If not specified, the default value is "http://www.googleapis.com/"

2886

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2887

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2888

# "dataflow/v1b3/projects".

2889

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2890

# "shuffle/v1beta1".

2891

"workerId": "A String", # The ID of the worker running this pipeline.

2892

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2893

# storage.

2894

#

2895

# The supported resource type is:

2896

#

2897

# Google Cloud Storage:

2898

#

2899

# storage.googleapis.com/{bucket}/{object}

2900

# bucket.storage.googleapis.com/{object}

2901

},

2902

"vmId": "A String", # The ID string of the VM.

2903

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2904

# taskrunner; e.g. "root".

2905

},

2906

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2907

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2908

"algorithm": "A String", # The algorithm to use for autoscaling.

2909

},

2910

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2911

"a_key": "A String",

2912

},

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2913

},

2914

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2915

"dataset": "A String", # The dataset for the current project where various workflow

2916

# related tables are stored.

2917

#

2918

# The supported resource type is:

2919

#

2920

# Google BigQuery:

2921

# bigquery.googleapis.com/{dataset}

2922

"internalExperiments": { # Experimental settings.

2923

"a_key": "", # Properties of the object. Contains field @type with type URL.

2924

},

2925

"workerRegion": "A String", # The Compute Engine region

2926

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2927

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2928

# with worker_zone. If neither worker_region nor worker_zone is specified,

2929

# default to the control plane's region.

2930

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2931

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2932

#

2933

# Format:

2934

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2935

"userAgent": { # A description of the process that generated the request.

2936

"a_key": "", # Properties of the object.

2937

},

2938

"workerZone": "A String", # The Compute Engine zone

2939

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2940

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2941

# with worker_region. If neither worker_region nor worker_zone is specified,

2942

# a zone in the control plane's region is chosen based on available capacity.

2943

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2944

# unspecified, the service will attempt to choose a reasonable

2945

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2946

# e.g. "compute.googleapis.com".

2947

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2948

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2949

# this resource prefix, where {JOBNAME} is the value of the

2950

# job_name field. The resulting bucket and object prefix is used

2951

# as the prefix of the resources used to store temporary data

2952

# needed during the job execution. NOTE: This will override the

2953

# value in taskrunner_settings.

2954

# The supported resource type is:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2955

#

2956

# Google Cloud Storage:

2957

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2958

# storage.googleapis.com/{bucket}/{object}

2959

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2960

"experiments": [ # The list of experiments to enable.

2961

"A String",

2962

],

2963

"version": { # A structure describing which components and their versions of the service

2964

# are required in order to run the job.

2965

"a_key": "", # Properties of the object.

2966

},

2967

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2968

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2969

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2970

# callers cannot mutate it.

2971

{ # A message describing the state of a particular execution stage.

2972

"executionStageName": "A String", # The name of the execution stage.

2973

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

2974

"executionStageState": "A String", # Executions stage states allow the same set of values as JobState.

2975

},

2976

],

2977

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs

2978

# by the metadata values provided here. Populated for ListJobs and all GetJob

2979

# views SUMMARY and higher.

2980

# ListJob response and Job SUMMARY view.

2981

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2982

{ # Metadata for a BigTable connector used by the job.

2983

"tableId": "A String", # TableId accessed in the connection.

2984

"projectId": "A String", # ProjectId accessed in the connection.

2985

"instanceId": "A String", # InstanceId accessed in the connection.

2986

},

2987

],

2988

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2989

{ # Metadata for a Spanner connector used by the job.

2990

"databaseId": "A String", # DatabaseId accessed in the connection.

2991

"instanceId": "A String", # InstanceId accessed in the connection.

2992

"projectId": "A String", # ProjectId accessed in the connection.

2993

},

2994

],

2995

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2996

{ # Metadata for a Datastore connector used by the job.

2997

"projectId": "A String", # ProjectId accessed in the connection.

2998

"namespace": "A String", # Namespace used in the connection.

2999

},

3000

],

3001

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

3002

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3003

"sdkSupportStatus": "A String", # The support status for this SDK version.

3004

"version": "A String", # The version of the SDK used to run the job.

3005

},

3006

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3007

{ # Metadata for a BigQuery connector used by the job.

3008

"table": "A String", # Table accessed in the connection.

3009

"dataset": "A String", # Dataset accessed in the connection.

3010

"projectId": "A String", # Project accessed in the connection.

3011

"query": "A String", # Query used to access data in the connection.

3012

},

3013

],

3014

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3015

{ # Metadata for a File connector used by the job.

3016

"filePattern": "A String", # File Pattern used to access files by the connector.

3017

},

3018

],

3019

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3020

{ # Metadata for a PubSub connector used by the job.

3021

"subscription": "A String", # Subscription used in the connection.

3022

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3027

# snapshot.

3028

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

3029

"type": "A String", # The type of Cloud Dataflow job.

3030

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow provided metrics.
# Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3036

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3037

{ # Description of the composing transforms, names/ids, and input/outputs of a

3038

# stage of execution. Some composing transforms and sources may have been

3039

# generated by the Dataflow service during execution planning.

3040

"id": "A String", # Dataflow service generated id for this stage.

3041

"componentTransform": [ # Transforms that comprise this execution stage.

3042

{ # Description of a transform executed as part of an execution stage.

3043

"originalTransform": "A String", # User name for the original user transform with which this transform is

3044

# most closely associated.

3045

"name": "A String", # Dataflow service generated name for this source.

3046

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3047

},

3048

],

3049

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3050

{ # Description of an interstitial value between transforms in an execution

3051

# stage.

3052

"name": "A String", # Dataflow service generated name for this source.

3053

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3054

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3055

# source is most closely associated.

3056

},

3057

],

3058

"kind": "A String", # Type of transform this stage is executing.

3059

"outputSource": [ # Output sources for this stage.

3060

{ # Description of an input or output of an execution stage.

3061

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3062

# source is most closely associated.

3063

"name": "A String", # Dataflow service generated name for this source.

3064

"sizeBytes": "A String", # Size of the source, if measurable.

3065

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3066

},

3067

],

3068

"name": "A String", # Dataflow service generated name for this stage.

3069

"inputSource": [ # Input sources for this stage.

3070

{ # Description of an input or output of an execution stage.

3071

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3072

# source is most closely associated.

3073

"name": "A String", # Dataflow service generated name for this source.

3074

"sizeBytes": "A String", # Size of the source, if measurable.

3075

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3081

{ # Description of the type, names/ids, and input/outputs for a transform.

3082

"kind": "A String", # Type of transform.

3083

"inputCollectionName": [ # User names for all collection inputs to this transform.

3084

"A String",

3085

],

3086

"name": "A String", # User provided name for this transform instance.

3087

"id": "A String", # SDK generated id of this transform instance.

3088

"displayData": [ # Transform-specific display data.

3089

{ # Data provided with a pipeline or transform to provide descriptive info.

3090

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3091

"boolValue": True or False, # Contains value if the data is of a boolean type.

3092

"javaClassValue": "A String", # Contains value if the data is of java class type.

3093

"strValue": "A String", # Contains value if the data is of string type.

3094

"int64Value": "A String", # Contains value if the data is of int64 type.

3095

"durationValue": "A String", # Contains value if the data is of duration type.

3096

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3097

# language namespace (e.g. a Python module) which defines the display data.

3098

# This allows a dax monitoring system to specially handle the data

3099

# and perform custom rendering.

3100

"floatValue": 3.14, # Contains value if the data is of float type.

3101

"key": "A String", # The key identifying the display data.

3102

# This is intended to be used as a label for the display data

3103

# when viewed in a dax monitoring system.

3104

"shortStrValue": "A String", # A possible additional shorter value to display.

3105

# For example a java_class_name_value of com.mypackage.MyDoFn

3106

# will be stored with MyDoFn as the short_str_value and

3107

# com.mypackage.MyDoFn as the java_class_name value.

3108

# short_str_value can be displayed and java_class_name_value

3109

# will be displayed as a tooltip.

3110

"url": "A String", # An optional full URL.

3111

"label": "A String", # An optional label to display in a dax UI for the element.

3112

},

3113

],

3114

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

3120

{ # Data provided with a pipeline or transform to provide descriptive info.

3121

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3122

"boolValue": True or False, # Contains value if the data is of a boolean type.

3123

"javaClassValue": "A String", # Contains value if the data is of java class type.

3124

"strValue": "A String", # Contains value if the data is of string type.

3125

"int64Value": "A String", # Contains value if the data is of int64 type.

3126

"durationValue": "A String", # Contains value if the data is of duration type.

3127

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3128

# language namespace (e.g. a Python module) which defines the display data.

3129

# This allows a dax monitoring system to specially handle the data

3130

# and perform custom rendering.

3131

"floatValue": 3.14, # Contains value if the data is of float type.

3132

"key": "A String", # The key identifying the display data.

3133

# This is intended to be used as a label for the display data

3134

# when viewed in a dax monitoring system.

3135

"shortStrValue": "A String", # A possible additional shorter value to display.

3136

# For example a java_class_name_value of com.mypackage.MyDoFn

3137

# will be stored with MyDoFn as the short_str_value and

3138

# com.mypackage.MyDoFn as the java_class_name value.

3139

# short_str_value can be displayed and java_class_name_value

3140

# will be displayed as a tooltip.

3141

"url": "A String", # An optional full URL.

3142

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3147

# of the job it replaced.

3148

#

3149

# When sending a `CreateJobRequest`, you can update a job by specifying it

3150

# here. The job named here is stopped, and its intermediate state is

3151

# transferred to this job.

3152

"tempFiles": [ # A set of files the system should be aware of that are used


# for temporary storage. These temporary files will be

3154

# removed on job completion.

3155

# No duplicates are allowed.

3156

# No file patterns are supported.

3157

#

3158

# The supported files are:

3159

#

3160

# Google Cloud Storage:

3161

#

3162

# storage.googleapis.com/{bucket}/{object}

3163

# bucket.storage.googleapis.com/{object}


"A String",


],


"name": "A String", # The user-specified Cloud Dataflow job name.


#

3168

# Only one Job with a given name may exist in a project at any

3169

# given time. If a caller attempts to create a Job with the same

3170

# name as an already-existing Job, the attempt returns the

3171

# existing Job.

3172

#

3173

# The name must match the regular expression

3174

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`


"steps": [ # Exactly one of steps or steps_location should be specified.


#

3177

# The top-level steps that constitute the entire job.

3178

{ # Defines a particular step within a Cloud Dataflow job.


#


# A job consists of multiple steps, each of which performs some

3181

# specific operation as part of the overall job. Data is typically

3182

# passed from one step to another as part of the job.

3183

#


# Here's an example of a sequence of steps which together implement a


# Map-Reduce job:

3186

#

3187

# * Read a collection of data from some source, parsing the


# collection's elements.


#

3190

# * Validate the elements.

3191

#

3192

# * Apply a user-defined function to map each element to some value

3193

# and extract an element-specific key value.

3194

#

3195

# * Group elements with the same key into a single element with

3196

# that key, transforming a multiply-keyed collection into a

3197

# uniquely-keyed collection.

3198

#

3199

# * Write the elements out to some data sink.

3200

#

3201

# Note that the Cloud Dataflow service may be used to run many different

3202

# types of jobs, not just Map-Reduce.


"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3204

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3205

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3206

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3207

# predefined step has its own required set of properties.

3208

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3209

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3210

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3211

},

3212

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3213

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3214

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3215

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job.
# Deprecated.

3217

"stages": { # A mapping from each stage to the information about that stage.

3218

"a_key": { # Contains information about how a particular

3219

# google.dataflow.v1beta3.Step will be executed.

3220

"stepName": [ # The steps associated with the execution stage.

3221

# Note that stages may have several steps, and that a given step

3222

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.


#

3230

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3231

# specified.

3232

#

3233

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3234

# terminal state. After a job has reached a terminal state, no

3235

# further state updates may be made.

3236

#

3237

# This field may be mutated by the Cloud Dataflow service;

3238

# callers cannot mutate it.


"location": "A String", # The [regional endpoint]

3240

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3241

# contains this job.

3242

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3243

# Flexible resource scheduling jobs are started with some delay after job

3244

# creation, so start_time is unset before start and is updated when the

3245

# job is started by the Cloud Dataflow service. For other jobs, start_time

3246

# always equals create_time and is immutable and set by the Cloud Dataflow

3247

# service.

3248

"stepsLocation": "A String", # The GCS location where the steps are stored.

3249

"labels": { # User-defined labels for this job.

3250

#

3251

# The labels map can contain no more than 64 entries. Entries of the labels

3252

# map are UTF8 strings that comply with the following restrictions:

3253

#

3254

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3255

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3256

# * Both keys and values are additionally constrained to be <= 128 bytes in

3257

# size.

3258

"a_key": "A String",


},


"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3261

# Cloud Dataflow service.

3262

"requestedState": "A String", # The job's requested state.

3263

#

3264

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3265

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3266

# also be used to directly set a job's requested state to

3267

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3268

# job if it has not already reached a terminal state.


},


],

}</pre>

</div>
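<p>The job schema above documents a regular expression for <code>name</code> and size limits for <code>labels</code>. The following is a minimal client-side sketch of those checks, not part of the generated client: the helper names <code>is_valid_job_name</code> and <code>label_size_problems</code> are hypothetical, and the Unicode character-class rules for label keys and values are deliberately not checked because Python's <code>re</code> module does not support them.</p>

```python
import re

# Pattern copied from the "name" field description in the job schema.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64        # "The labels map can contain no more than 64 entries."
MAX_LABEL_BYTES = 128  # Keys and values are "<= 128 bytes in size."


def is_valid_job_name(name):
    """Return True if the whole string matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_size_problems(labels):
    """Return a list of size-limit problems found in a labels dict.

    Only the entry-count and byte-size limits are checked; the documented
    character-class rules for keys and values are out of scope here.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} entries, over the limit of {MAX_LABELS}")
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            size = len(text.encode("utf-8"))
            if size > MAX_LABEL_BYTES:
                problems.append(f"{kind} {text[:16]!r} is {size} bytes, over {MAX_LABEL_BYTES}")
    return problems
```

<p>Running such checks before calling <code>create()</code> only saves a round trip; the service remains the authority on what it accepts.</p>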

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>
<pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.

</pre>

</div>

<code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>
<pre>Snapshot the state of a streaming job.

Args:
  projectId: string, The project which owns the job to be snapshotted. (required)
  jobId: string, The job to be snapshotted. (required)
  body: object, The request body.
    The object takes the form of:

{ # Request to create a snapshot of a job.
    "description": "A String", # User specified description of the snapshot. May be empty.
    "snapshotSources": True or False, # If true, perform snapshots for sources which support this.
    "ttl": "A String", # TTL for the snapshot.
    "location": "A String", # The location that contains this job.
  }

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Represents a snapshot of a job.
    "state": "A String", # State of the snapshot.
    "sourceJobId": "A String", # The job this snapshot was created from.
    "projectId": "A String", # The project this snapshot belongs to.
    "id": "A String", # The unique ID of this snapshot.
    "ttl": "A String", # The time after which this snapshot will be automatically deleted.
    "description": "A String", # User specified description of the snapshot. May be empty.
    "diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
        # state.
    "pubsubMetadata": [ # PubSub snapshot metadata.
      { # Represents a Pubsub snapshot.
        "expireTime": "A String", # The expire time of the Pubsub snapshot.
        "snapshotName": "A String", # The name of the Pubsub snapshot.
        "topicName": "A String", # The name of the Pubsub topic.
      },
    ],
    "creationTime": "A String", # The time this snapshot was created.
  }</pre>

</div>
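<p>The request body for <code>snapshot()</code> can be assembled as a plain dict. The sketch below uses only the four documented body fields; the helper name <code>build_snapshot_body</code> is hypothetical, and encoding <code>ttl</code> as a number of seconds followed by <code>s</code> is an assumption, since the reference above only describes the field as a string.</p>

```python
def build_snapshot_body(description="", snapshot_sources=False,
                        ttl_seconds=None, location=None):
    """Build a dict using the documented snapshot request fields."""
    body = {
        "description": description,  # may be empty
        "snapshotSources": bool(snapshot_sources),
    }
    if ttl_seconds is not None:
        # Assumption: the TTL string is a duration in seconds, e.g. "3600s".
        body["ttl"] = f"{int(ttl_seconds)}s"
    if location is not None:
        body["location"] = location
    return body


# With a real client this would be roughly (placeholder IDs):
#   body = build_snapshot_body("nightly", True, 7 * 24 * 3600, "us-central1")
#   service.projects().jobs().snapshot(
#       projectId="my-project", jobId="my-job-id", body=body
#   ).execute()
```

<p>Optional fields are left out rather than sent as <code>None</code>, so the service can apply its own defaults.</p>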

<code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>
<pre>Updates the state of an existing Cloud Dataflow job.

To update the state of an existing job, we recommend using
`projects.locations.jobs.update` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.update` is not recommended, as you can only update the state
of jobs that are running in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  jobId: string, The job ID. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3350

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3351

# If this field is set, the service will ensure its uniqueness.

3352

# The request to create a job will fail if the service has knowledge of a

3353

# previously submitted job with the same client's ID and job name.

3354

# The caller may use this field to ensure idempotence of job

3355

# creation across retried attempts to create a job.

3356

# By default, the field is empty and, in that case, the service ignores it.

3357

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3358

#

3359

# This field is set by the Cloud Dataflow service when the Job is

3360

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3361

"currentStateTime": "A String", # The timestamp associated with the current state.

3362

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3363

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3364

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3365

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3366

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

3367

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3368

# options are passed through the service and are used to recreate the

3369

# SDK pipeline options on the worker in a language agnostic and platform

3370

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3371

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3372

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3373

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

3374

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3375

# specified in order for the job to have workers.

3376

{ # Describes one particular pool of Cloud Dataflow workers to be

3377

# instantiated by the Cloud Dataflow service in order to perform the

3378

# computations required by a job. Note that a workflow job may use

3379

# multiple pools, in order to match the various computational

3380

# requirements of the various stages of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3381

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3382

# select a default set of packages which are useful to worker

3383

# harnesses written in a particular language.

3384

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

3385

# the service will use the network "default".

3386

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3387

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3388

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

3389

# execute the job. If zero or unspecified, the service will

3390

# attempt to choose a reasonable default.

3391

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3392

# service will choose a number of threads (according to the number of cores

3393

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3394

"diskSourceImage": "A String", # Fully qualified source image for disks.

3395

"packages": [ # Packages to be installed on workers.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3396

{ # The packages that must be installed in order for a worker to run the

3397

# steps of the Cloud Dataflow job that will be assigned to its worker

3398

# pool.

3399

#

3400

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3401

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3402

# might use this to install jars containing the user's code and all of the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3403

# various dependencies (libraries, data files, etc.) required in order

3404

# for that code to run.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3405

"location": "A String", # The resource to read the package from. The supported resource type is:

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3406

#

3407

# Google Cloud Storage:

3408

#

3409

# storage.googleapis.com/{bucket}

3410

# bucket.storage.googleapis.com/

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3411

"name": "A String", # The name of the package.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3412

},

3413

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3414

"teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3415

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

3416

# `TEARDOWN_NEVER`.

3417

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

3418

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

3419

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

3420

# down.

3421

#

3422

# If the workers are not torn down by the service, they will

3423

# continue to run and use Google Compute Engine VM resources in the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3424

# user's project until they are explicitly terminated by the user.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3425

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

3426

# policy except for small, manually supervised test jobs.

3427

#

3428

# If unknown or unspecified, the service will attempt to choose a reasonable

3429

# default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3430

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

3431

# Compute Engine API.

3432

"poolArgs": { # Extra arguments for this worker pool.

3433

"a_key": "", # Properties of the object. Contains field @type with type URL.

3434

},

3435

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3436

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3437

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

3438

# harness, residing in Google Container Registry.

3439

#

3440

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

3441

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3442

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3443

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3444

# service will attempt to choose a reasonable default.

3445

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

3446

# are supported.

3447

"dataDisks": [ # Data disks that are used by a VM in this workflow.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3448

{ # Describes the data disk used by a workflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3449

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3450

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3451

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3452

# must be a disk type appropriate to the project and zone in which

3453

# the workers will run. If unknown or unspecified, the service

3454

# will attempt to choose a reasonable default.

3455

#

3456

# For example, the standard persistent disk type is a resource name

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3457

# typically ending in "pd-standard". If SSD persistent disks are

3458

# available, the resource name typically ends with "pd-ssd". The

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3459

# actual valid values are defined by the Google Compute Engine API,

3460

# not by the Cloud Dataflow API; consult the Google Compute Engine

3461

# documentation for more information about determining the set of

3462

# available disk types for a particular project and zone.

3463

#

3464

# Google Compute Engine Disk types are local to a particular

3465

# project in a particular zone, and so the resource name will

3466

# typically look something like this:

3467

#

3468

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3469

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3470

},

3471

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3472

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3473

# only be set in the Fn API path. For non-cross-language pipelines this

3474

# should have only one entry. Cross-language pipelines will have two or more

3475

# entries.

3476

{ # Defines an SDK harness container for executing Dataflow pipelines.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3477

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

3478

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3479

# container instance with this image. If false (or unset) recommends using

3480

# more than one core per SDK container instance with this image for

3481

# efficiency. Note that Dataflow service may choose to override this property

3482

# if needed.

3483

},

3484

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3485

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

3486

# the form "regions/REGION/subnetworks/SUBNETWORK".

3487

"ipConfiguration": "A String", # Configuration for VM IPs.

3488

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

3489

# using the standard Dataflow task runner. Users should ignore

3490

# this field.

3491

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

3492

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3493

# taskrunner; e.g. "wheel".

3494

"harnessCommand": "A String", # The command to launch the worker harness.

3495

"logDir": "A String", # The directory on the VM to store logs.

3496

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

3497

# access the Cloud Dataflow API.

3498

"A String",

3499

],

3500

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

3501

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3502

# will not be uploaded.

3503

#

3504

# The supported resource type is:

3505

#

3506

# Google Cloud Storage:

3507

# storage.googleapis.com/{bucket}/{object}

3508

# bucket.storage.googleapis.com/{object}

3509

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3510

"workflowFileName": "A String", # The file to store the workflow in.

3511

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

3512

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3513

# temporary storage.

3514

#

3515

# The supported resource type is:

3516

#

3517

# Google Cloud Storage:

3518

# storage.googleapis.com/{bucket}/{object}

3519

# bucket.storage.googleapis.com/{object}

3520

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3521

"languageHint": "A String", # The suggested backend language.

3522

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

3523

#

3524

# When workers access Google Cloud APIs, they logically do so via

3525

# relative URLs. If this field is specified, it supplies the base

3526

# URL to use for resolving these relative URLs. The normative

3527

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3528

# Locators".

3529

#

3530

# If not specified, the default value is "http://www.googleapis.com/"

3531

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

3532

# console.

3533

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3534

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

3535

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

3536

#

3537

# When workers access Google Cloud APIs, they logically do so via

3538

# relative URLs. If this field is specified, it supplies the base

3539

# URL to use for resolving these relative URLs. The normative

3540

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3541

# Locators".

3542

#

3543

# If not specified, the default value is "http://www.googleapis.com/"

3544

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

3545

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

3546

# "dataflow/v1b3/projects".

3547

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

3548

# "shuffle/v1beta1".

3549

"workerId": "A String", # The ID of the worker running this pipeline.

3550

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3551

# storage.

3552

#

3553

# The supported resource type is:

3554

#

3555

# Google Cloud Storage:

3556

#

3557

# storage.googleapis.com/{bucket}/{object}

3558

# bucket.storage.googleapis.com/{object}

3559

},

3560

"vmId": "A String", # The ID string of the VM.

3561

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

3562

# taskrunner; e.g. "root".

3563

},

3564

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

3565

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

3566

"algorithm": "A String", # The algorithm to use for autoscaling.

3567

},

3568

"metadata": { # Metadata to set on the Google Compute Engine VMs.

3569

"a_key": "A String",

3570

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3571

},

3572

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3573

"dataset": "A String", # The dataset for the current project where various workflow

3574

# related tables are stored.

3575

#

3576

# The supported resource type is:

3577

#

3578

# Google BigQuery:

3579

# bigquery.googleapis.com/{dataset}

3580

"internalExperiments": { # Experimental settings.

3581

"a_key": "", # Properties of the object. Contains field @type with type URL.

3582

},

3583

"workerRegion": "A String", # The Compute Engine region

3584

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3585

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

3586

# with worker_zone. If neither worker_region nor worker_zone is specified,

3587

# default to the control plane's region.

3588

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

3589

# at rest, AKA a Customer Managed Encryption Key (CMEK).

3590

#

3591

# Format:

3592

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

3593

"userAgent": { # A description of the process that generated the request.

3594

"a_key": "", # Properties of the object.

3595

},

3596

"workerZone": "A String", # The Compute Engine zone

3597

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3598

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

3599

# with worker_region. If neither worker_region nor worker_zone is specified,

3600

# a zone in the control plane's region is chosen based on available capacity.

3601

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3602

# unspecified, the service will attempt to choose a reasonable

3603

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3604

# e.g. "compute.googleapis.com".

3605

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3606

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3607

# this resource prefix, where {JOBNAME} is the value of the

3608

# job_name field. The resulting bucket and object prefix is used

3609

# as the prefix of the resources used to store temporary data

3610

# needed during the job execution. NOTE: This will override the

3611

# value in taskrunner_settings.

3612

# The supported resource type is:

3613

#

3614

# Google Cloud Storage:

3615

#

3616

# storage.googleapis.com/{bucket}/{object}

3617

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3618

"experiments": [ # The list of experiments to enable.

3619

"A String",

3620

],

3621

"version": { # A structure describing which components and their versions of the service

3622

# are required in order to run the job.

3623

"a_key": "", # Properties of the object.

3624

},

3625

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3626

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3627

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

3628

# callers cannot mutate it.

3629

{ # A message describing the state of a particular execution stage.

3630

"executionStageName": "A String", # The name of the execution stage.

3631

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

3632

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

3633

},

3634

],

3635

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs

3636

# by the metadata values provided here. Populated for ListJobs and all GetJob

3637

# views SUMMARY and higher.

3638

# ListJob response and Job SUMMARY view.

3639

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3640

{ # Metadata for a BigTable connector used by the job.

3641

"tableId": "A String", # TableId accessed in the connection.

3642

"projectId": "A String", # ProjectId accessed in the connection.

3643

"instanceId": "A String", # InstanceId accessed in the connection.

3644

},

3645

],

3646

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3647

{ # Metadata for a Spanner connector used by the job.

3648

"databaseId": "A String", # DatabaseId accessed in the connection.

3649

"instanceId": "A String", # InstanceId accessed in the connection.

3650

"projectId": "A String", # ProjectId accessed in the connection.

3651

},

3652

],

3653

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3654

{ # Metadata for a Datastore connector used by the job.

3655

"projectId": "A String", # ProjectId accessed in the connection.

3656

"namespace": "A String", # Namespace used in the connection.

3657

},

3658

],

3659

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

3660

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3661

"sdkSupportStatus": "A String", # The support status for this SDK version.

3662

"version": "A String", # The version of the SDK used to run the job.

3663

},

3664

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3665

{ # Metadata for a BigQuery connector used by the job.

3666

"table": "A String", # Table accessed in the connection.

3667

"dataset": "A String", # Dataset accessed in the connection.

3668

"projectId": "A String", # Project accessed in the connection.

3669

"query": "A String", # Query used to access data in the connection.

3670

},

3671

],

3672

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3673

{ # Metadata for a File connector used by the job.

3674

"filePattern": "A String", # File Pattern used to access files by the connector.

3675

},

3676

],

3677

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3678

{ # Metadata for a PubSub connector used by the job.

3679

"subscription": "A String", # Subscription used in the connection.

3680

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3685

# snapshot.

3686

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

3687

"type": "A String", # The type of Cloud Dataflow job.

3688

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

3689

# A description of the user pipeline and stages through which it is executed.

3690

# Created by Cloud Dataflow service. Only retrieved with

3691

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3692

# form. This data is provided by the Dataflow service for ease of visualizing

3693

# the pipeline and interpreting Dataflow provided metrics.

3694

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3695

{ # Description of the composing transforms, names/ids, and input/outputs of a

3696

# stage of execution. Some composing transforms and sources may have been

3697

# generated by the Dataflow service during execution planning.

3698

"id": "A String", # Dataflow service generated id for this stage.

3699

"componentTransform": [ # Transforms that comprise this execution stage.

3700

{ # Description of a transform executed as part of an execution stage.

3701

"originalTransform": "A String", # User name for the original user transform with which this transform is

3702

# most closely associated.

3703

"name": "A String", # Dataflow service generated name for this source.

3704

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3705

},

3706

],

3707

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3708

{ # Description of an interstitial value between transforms in an execution

3709

# stage.

3710

"name": "A String", # Dataflow service generated name for this source.

3711

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3712

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3713

# source is most closely associated.

3714

},

3715

],

3716

"kind": "A String", # Type of transform this stage is executing.

3717

"outputSource": [ # Output sources for this stage.

3718

{ # Description of an input or output of an execution stage.

3719

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3720

# source is most closely associated.

3721

"name": "A String", # Dataflow service generated name for this source.

3722

"sizeBytes": "A String", # Size of the source, if measurable.

3723

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3724

},

3725

],

3726

"name": "A String", # Dataflow service generated name for this stage.

3727

"inputSource": [ # Input sources for this stage.

3728

{ # Description of an input or output of an execution stage.

3729

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3730

# source is most closely associated.

3731

"name": "A String", # Dataflow service generated name for this source.

3732

"sizeBytes": "A String", # Size of the source, if measurable.

3733

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3739

{ # Description of the type, names/ids, and input/outputs for a transform.

3740

"kind": "A String", # Type of transform.

3741

"inputCollectionName": [ # User names for all collection inputs to this transform.

3742

"A String",

3743

],

3744

"name": "A String", # User provided name for this transform instance.

3745

"id": "A String", # SDK generated id of this transform instance.

3746

"displayData": [ # Transform-specific display data.

3747

{ # Data provided with a pipeline or transform to provide descriptive info.

3748

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3749

"boolValue": True or False, # Contains value if the data is of a boolean type.

3750

"javaClassValue": "A String", # Contains value if the data is of java class type.

3751

"strValue": "A String", # Contains value if the data is of string type.

3752

"int64Value": "A String", # Contains value if the data is of int64 type.

3753

"durationValue": "A String", # Contains value if the data is of duration type.

3754

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3755

# language namespace (e.g. a Python module) which defines the display data.

3756

# This allows a dax monitoring system to specially handle the data

3757

# and perform custom rendering.

3758

"floatValue": 3.14, # Contains value if the data is of float type.

3759

"key": "A String", # The key identifying the display data.

3760

# This is intended to be used as a label for the display data

3761

# when viewed in a dax monitoring system.

3762

"shortStrValue": "A String", # A possible additional shorter value to display.

3763

# For example a java_class_name_value of com.mypackage.MyDoFn

3764

# will be stored with MyDoFn as the short_str_value and

3765

# com.mypackage.MyDoFn as the java_class_name value.

3766

# short_str_value can be displayed and java_class_name_value

3767

# will be displayed as a tooltip.

3768

"url": "A String", # An optional full URL.

3769

"label": "A String", # An optional label to display in a dax UI for the element.

3770

},

3771

],

3772

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

3778

{ # Data provided with a pipeline or transform to provide descriptive info.

3779

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3780

"boolValue": True or False, # Contains value if the data is of a boolean type.

3781

"javaClassValue": "A String", # Contains value if the data is of java class type.

3782

"strValue": "A String", # Contains value if the data is of string type.

3783

"int64Value": "A String", # Contains value if the data is of int64 type.

3784

"durationValue": "A String", # Contains value if the data is of duration type.

3785

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3786

# language namespace (e.g. a Python module) which defines the display data.

3787

# This allows a dax monitoring system to specially handle the data

3788

# and perform custom rendering.

3789

"floatValue": 3.14, # Contains value if the data is of float type.

3790

"key": "A String", # The key identifying the display data.

3791

# This is intended to be used as a label for the display data

3792

# when viewed in a dax monitoring system.

3793

"shortStrValue": "A String", # A possible additional shorter value to display.

3794

# For example a java_class_name_value of com.mypackage.MyDoFn

3795

# will be stored with MyDoFn as the short_str_value and

3796

# com.mypackage.MyDoFn as the java_class_name value.

3797

# short_str_value can be displayed and java_class_name_value

3798

# will be displayed as a tooltip.

3799

"url": "A String", # An optional full URL.

3800

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3805

# of the job it replaced.

3806

#

3807

# When sending a `CreateJobRequest`, you can update a job by specifying it

3808

# here. The job named here is stopped, and its intermediate state is

3809

# transferred to this job.

3810

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3811

# for temporary storage. These temporary files will be

3812

# removed on job completion.

3813

# No duplicates are allowed.

3814

# No file patterns are supported.

3815

#

3816

# The supported files are:

3817

#

3818

# Google Cloud Storage:

3819

#

3820

# storage.googleapis.com/{bucket}/{object}

3821

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3822

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3823

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3824

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3825

#

3826

# Only one Job with a given name may exist in a project at any

3827

# given time. If a caller attempts to create a Job with the same

3828

# name as an already-existing Job, the attempt returns the

3829

# existing Job.

3830

#

3831

# The name must match the regular expression

3832

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3833

"steps": [ # Exactly one of step or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3834

#

3835

# The top-level steps that constitute the entire job.

3836

{ # Defines a particular step within a Cloud Dataflow job.

3837

#

3838

# A job consists of multiple steps, each of which performs some

3839

# specific operation as part of the overall job. Data is typically

3840

# passed from one step to another as part of the job.

3841

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3842

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3843

# Map-Reduce job:

3844

#

3845

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3846

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3847

#

3848

# * Validate the elements.

3849

#

3850

# * Apply a user-defined function to map each element to some value

3851

# and extract an element-specific key value.

3852

#

3853

# * Group elements with the same key into a single element with

3854

# that key, transforming a multiply-keyed collection into a

3855

# uniquely-keyed collection.

3856

#

3857

# * Write the elements out to some data sink.

3858

#

3859

# Note that the Cloud Dataflow service may be used to run many different

3860

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3861

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3862

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3863

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3864

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3865

# predefined step has its own required set of properties.

3866

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3867

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3868

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3869

},

3870

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3871

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3872

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3873

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

3874

# isn't contained in the submitted job.

3875

"stages": { # A mapping from each stage to the information about that stage.

3876

"a_key": { # Contains information about how a particular

3877

# google.dataflow.v1beta3.Step will be executed.

3878

"stepName": [ # The steps associated with the execution stage.

3879

# Note that stages may have several steps, and that a given step

3880

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3887

#

3888

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3889

# specified.

3890

#

3891

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3892

# terminal state. After a job has reached a terminal state, no

3893

# further state updates may be made.

3894

#

3895

# This field may be mutated by the Cloud Dataflow service;

3896

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3897

"location": "A String", # The [regional endpoint]

3898

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3899

# contains this job.

3900

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3901

# Flexible resource scheduling jobs are started with some delay after job

3902

# creation, so start_time is unset before start and is updated when the

3903

# job is started by the Cloud Dataflow service. For other jobs, start_time

3904

# always equals create_time and is immutable and set by the Cloud Dataflow

3905

# service.

3906

"stepsLocation": "A String", # The GCS location where the steps are stored.

3907

"labels": { # User-defined labels for this job.

3908

#

3909

# The labels map can contain no more than 64 entries. Entries of the labels

3910

# map are UTF8 strings that comply with the following restrictions:

3911

#

3912

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3913

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3914

# * Both keys and values are additionally constrained to be <= 128 bytes in

3915

# size.

3916

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3917

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3918

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3919

# Cloud Dataflow service.

3920

"requestedState": "A String", # The job's requested state.

3921

#

3922

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3923

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3924

# also be used to directly set a job's requested state to

3925

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3926

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3927

}
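
The constraints stated in the "name" and "labels" field descriptions above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the generated client: the helper names `is_valid_job_name` and `label_problems` are hypothetical, the job-name pattern is copied from the field description, and the label check covers only the documented entry count (64) and byte-size (128) limits, not the key and value regular expressions.

```python
import re

# Pattern copied from the "name" field description above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

# Limits copied from the "labels" field description above.
MAX_LABELS = 64
MAX_LABEL_BYTES = 128


def is_valid_job_name(name):
    """Return True if the whole string matches the documented job-name pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_problems(labels):
    """Return a list of problems found; only the size limits are checked."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append("more than %d labels" % MAX_LABELS)
    for key, value in labels.items():
        for text in (key, value):
            # The documented limit is expressed in bytes, so measure the UTF-8 encoding.
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append("label text exceeds %d bytes" % MAX_LABEL_BYTES)
    return problems
```

Because the pattern allows one leading letter, up to 38 middle characters, and one trailing letter or digit, the longest accepted name is 40 characters, and a name cannot end with a hyphen.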

3928

3929

location: string, The [regional endpoint]

3930

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3931

contains this job.

3932

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
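
The "environment" fields described in the request body interact in two documented ways: "workerRegion" and "workerZone" are mutually exclusive, and "tempStoragePrefix" takes a Google Cloud Storage resource of the form storage.googleapis.com/{bucket}/{object}. The sketch below assembles such an environment dictionary under those rules. The function name `build_environment`, its parameters, and the `tmp` object prefix are illustrative assumptions, not part of the API.

```python
def build_environment(temp_bucket, worker_region=None, worker_zone=None):
    """Build an "environment" dict for a job body (hypothetical helper)."""
    # The field descriptions state that the two settings are mutually exclusive.
    if worker_region and worker_zone:
        raise ValueError("workerRegion and workerZone are mutually exclusive")

    environment = {
        # Documented format: storage.googleapis.com/{bucket}/{object}.
        # The "tmp" object prefix is an arbitrary choice for this sketch.
        "tempStoragePrefix": "storage.googleapis.com/%s/tmp" % temp_bucket,
    }
    if worker_region:
        environment["workerRegion"] = worker_region
    if worker_zone:
        environment["workerZone"] = worker_zone
    return environment


# Example: a body fragment that pins workers to a region.
body = {
    "name": "example-job",
    "environment": build_environment("my-bucket", worker_region="us-west1"),
}
```

If neither setting is given, the dictionary omits both keys, which leaves the choice of region or zone to the service as described in the field comments.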

Returns:

An object of the form:

3939

3940

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3941

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3942

# If this field is set, the service will ensure its uniqueness.

3943

# The request to create a job will fail if the service has knowledge of a

3944

# previously submitted job with the same client's ID and job name.

3945

# The caller may use this field to ensure idempotence of job

3946

# creation across retried attempts to create a job.

3947

# By default, the field is empty and, in that case, the service ignores it.

3948

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3949

#

3950

# This field is set by the Cloud Dataflow service when the Job is

3951

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3952

"currentStateTime": "A String", # The timestamp associated with the current state.

3953

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3954

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3955

"a_key": "A String",

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3956

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3957

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

3958

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3959

# options are passed through the service and are used to recreate the

3960

# SDK pipeline options on the worker in a language agnostic and platform

3961

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3962

"a_key": "", # Properties of the object.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

3963

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3964

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

3965

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3966

# specified in order for the job to have workers.

3967

{ # Describes one particular pool of Cloud Dataflow workers to be

3968

# instantiated by the Cloud Dataflow service in order to perform the

3969

# computations required by a job. Note that a workflow job may use

3970

# multiple pools, in order to match the various computational

3971

# requirements of the various stages of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3972

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3973

# select a default set of packages which are useful to worker

3974

# harnesses written in a particular language.

3975

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

3976

# the service will use the network "default".

3977

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3978

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3979

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

3980

# execute the job. If zero or unspecified, the service will

3981

# attempt to choose a reasonable default.

3982

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3983

# service will choose a number of threads (according to the number of cores

3984

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3985

"diskSourceImage": "A String", # Fully qualified source image for disks.

3986

"packages": [ # Packages to be installed on workers.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3987

{ # The packages that must be installed in order for a worker to run the

3988

# steps of the Cloud Dataflow job that will be assigned to its worker

3989

# pool.

3990

#

3991

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3992

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3993

# might use this to install jars containing the user's code and all of the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3994

# various dependencies (libraries, data files, etc.) required in order

3995

# for that code to run.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3996

"location": "A String", # The resource to read the package from. The supported resource type is:

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3997

#

3998

# Google Cloud Storage:

3999

#

4000

# storage.googleapis.com/{bucket}

4001

# bucket.storage.googleapis.com/

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4002

"name": "A String", # The name of the package.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4003

},

4004

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4005

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

4006

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

4007

# `TEARDOWN_NEVER`.

4008

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

4009

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

4010

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

4011

# down.

4012

#

4013

# If the workers are not torn down by the service, they will

4014

# continue to run and use Google Compute Engine VM resources in the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4015

# user's project until they are explicitly terminated by the user.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

4016

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

4017

# policy except for small, manually supervised test jobs.

4018

#

4019

# If unknown or unspecified, the service will attempt to choose a reasonable

4020

# default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4021

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

4022

# Compute Engine API.

4023

"poolArgs": { # Extra arguments for this worker pool.

4024

"a_key": "", # Properties of the object. Contains field @type with type URL.

4025

},

4026

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

4027

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4028

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

4029

# harness, residing in Google Container Registry.

4030

#

4031

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

4032

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

4033

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4034

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

4035

# service will attempt to choose a reasonable default.

4036

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

4037

# are supported.

4038

"dataDisks": [ # Data disks that are used by a VM in this workflow.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4039

{ # Describes the data disk used by a workflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4040

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4041

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4042

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4043

# must be a disk type appropriate to the project and zone in which

4044

# the workers will run. If unknown or unspecified, the service

4045

# will attempt to choose a reasonable default.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

4046

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4047

# For example, the standard persistent disk type is a resource name

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4048

# typically ending in "pd-standard". If SSD persistent disks are

4049

# available, the resource name typically ends with "pd-ssd". The

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4050

# actual valid values are defined by the Google Compute Engine API,

4051

# not by the Cloud Dataflow API; consult the Google Compute Engine

4052

# documentation for more information about determining the set of

4053

# available disk types for a particular project and zone.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

4054

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4055

# Google Compute Engine Disk types are local to a particular

4056

# project in a particular zone, and so the resource name will

4057

# typically look something like this:

4058

#

4059

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4060

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

4061

},

4062

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4063

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

4064

# only be set in the Fn API path. For non-cross-language pipelines this

4065

# should have only one entry. Cross-language pipelines will have two or more

4066

# entries.

4067

{ # Defines an SDK harness container for executing Dataflow pipelines.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4068

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

4069

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

4070

# container instance with this image. If false (or unset) recommends using

4071

# more than one core per SDK container instance with this image for

4072

# efficiency. Note that the Dataflow service may choose to override this property

4073

# if needed.

4074

},

4075

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4076

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

4077

# the form "regions/REGION/subnetworks/SUBNETWORK".

4078

"ipConfiguration": "A String", # Configuration for VM IPs.

4079

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

4080

# using the standard Dataflow task runner. Users should ignore

4081

# this field.

4082

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

4083

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

4084

# taskrunner; e.g. "wheel".

4085

"harnessCommand": "A String", # The command to launch the worker harness.

4086

"logDir": "A String", # The directory on the VM to store logs.

4087

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

4088

# access the Cloud Dataflow API.

4089

"A String",

4090

],

4091

"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".

4092

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

4093

# will not be uploaded.

4094

#

4095

# The supported resource type is:

4096

#

4097

# Google Cloud Storage:

4098

# storage.googleapis.com/{bucket}/{object}

4099

# bucket.storage.googleapis.com/{object}

4100

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

4101

"workflowFileName": "A String", # The file to store the workflow in.

4102

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

4103

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

4104

# temporary storage.

4105

#

4106

# The supported resource type is:

4107

#

4108

# Google Cloud Storage:

4109

# storage.googleapis.com/{bucket}/{object}

4110

# bucket.storage.googleapis.com/{object}

4111

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

4112

"languageHint": "A String", # The suggested backend language.

4113

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

4114

#

4115

# When workers access Google Cloud APIs, they logically do so via

4116

# relative URLs. If this field is specified, it supplies the base

4117

# URL to use for resolving these relative URLs. The normative

4118

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

4119

# Locators".

4120

#

4121

# If not specified, the default value is "http://www.googleapis.com/"

4122

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

4123

# console.

4124

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

4125

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

4126

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

4127

#

4128

# When workers access Google Cloud APIs, they logically do so via

4129

# relative URLs. If this field is specified, it supplies the base

4130

# URL to use for resolving these relative URLs. The normative

4131

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

4132

# Locators".

4133

#

4134

# If not specified, the default value is "http://www.googleapis.com/"

4135

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

4136

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

4137

# "dataflow/v1b3/projects".

4138

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

4139

# "shuffle/v1beta1".

4140

"workerId": "A String", # The ID of the worker running this pipeline.

4141

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

4142

# storage.

4143

#

4144

# The supported resource type is:

4145

#

4146

# Google Cloud Storage:

4147

#

4148

# storage.googleapis.com/{bucket}/{object}

4149

# bucket.storage.googleapis.com/{object}

4150

},

4151

"vmId": "A String", # The ID string of the VM.

4152

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

4153

# taskrunner; e.g. "root".

4154

},

4155

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

4156

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

4157

"algorithm": "A String", # The algorithm to use for autoscaling.

4158

},

4159

"metadata": { # Metadata to set on the Google Compute Engine VMs.

4160

"a_key": "A String",

4161

},

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

4162

},

4163

],
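
The requirement that at least one "harness" worker pool be specified can be checked on the client before a job is submitted. The following is a minimal sketch and not part of the client library: `has_harness_pool` and `example_environment` are hypothetical names, and the dictionary uses only field names documented above.

```python
def has_harness_pool(environment: dict) -> bool:
    """Return True if the environment lists at least one "harness" worker pool."""
    pools = environment.get("workerPools", [])
    return any(pool.get("kind") == "harness" for pool in pools)


# Hypothetical environment fragment. numWorkers of 0 leaves the choice of
# worker count to the service, as described for that field.
example_environment = {
    "workerPools": [
        {
            "kind": "harness",
            "numWorkers": 0,
            "teardownPolicy": "TEARDOWN_ALWAYS",
        },
    ],
}
```

`TEARDOWN_ALWAYS` is used here because the field description recommends it for everything except small, manually supervised test jobs.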

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4164

"dataset": "A String", # The dataset for the current project where various workflow

4165

# related tables are stored.

4166

#

4167

# The supported resource type is:

4168

#

4169

# Google BigQuery:

4170

# bigquery.googleapis.com/{dataset}

4171

"internalExperiments": { # Experimental settings.

4172

"a_key": "", # Properties of the object. Contains field @type with type URL.

4173

},

4174

"workerRegion": "A String", # The Compute Engine region

4175

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

4176

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

4177

# with worker_zone. If neither worker_region nor worker_zone is specified,

4178

# default to the control plane's region.

4179

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

4180

# at rest, AKA a Customer Managed Encryption Key (CMEK).

4181

#

4182

# Format:

4183

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

4184

"userAgent": { # A description of the process that generated the request.

4185

"a_key": "", # Properties of the object.

4186

},

4187

"workerZone": "A String", # The Compute Engine zone

4188

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

4189

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

4190

# with worker_region. If neither worker_region nor worker_zone is specified,

4191

# a zone in the control plane's region is chosen based on available capacity.
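
Because `workerRegion` and `workerZone` are described as mutually exclusive, a caller may want to reject a request body that sets both before sending it. This is a sketch under that assumption only; `check_worker_placement` is a hypothetical helper, and how the service itself reports the conflict is not stated here.

```python
def check_worker_placement(environment: dict) -> None:
    """Raise ValueError if both workerRegion and workerZone are set."""
    if environment.get("workerRegion") and environment.get("workerZone"):
        raise ValueError("workerRegion and workerZone are mutually exclusive")
```

Leaving both fields unset is accepted by this check, since the service then picks a zone in the control plane's region.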

4192

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

4193

# unspecified, the service will attempt to choose a reasonable

4194

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4195

# e.g. "compute.googleapis.com".

4196

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

4197

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4198

# this resource prefix, where {JOBNAME} is the value of the

4199

# job_name field. The resulting bucket and object prefix is used

4200

# as the prefix of the resources used to store temporary data

4201

# needed during the job execution. NOTE: This will override the

4202

# value in taskrunner_settings.

4203

# The supported resource type is:

4204

#

4205

# Google Cloud Storage:

4206

#

4207

# storage.googleapis.com/{bucket}/{object}

4208

# bucket.storage.googleapis.com/{object}
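
The suffix rule described for `tempStoragePrefix` can be illustrated with a one-line helper. This is only a sketch of the documented string concatenation: `temp_location` is a hypothetical name, the bucket path in the usage note is a made-up example, and whether the service normalizes trailing slashes is not stated, so none is assumed.

```python
def temp_location(temp_storage_prefix: str, job_name: str) -> str:
    """Append the documented "/temp-{JOBNAME}" suffix to a resource prefix."""
    return f"{temp_storage_prefix}/temp-{job_name}"
```

For example, `temp_location("storage.googleapis.com/my-bucket/staging", "wordcount")` yields `storage.googleapis.com/my-bucket/staging/temp-wordcount`.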

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4209

"experiments": [ # The list of experiments to enable.

4210

"A String",

4211

],

4212

"version": { # A structure describing which components and their versions of the service

4213

# are required in order to run the job.

4214

"a_key": "", # Properties of the object.

4215

},

4216

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

4217

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4218

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

4219

# callers cannot mutate it.

4220

{ # A message describing the state of a particular execution stage.

4221

"executionStageName": "A String", # The name of the execution stage.

4222

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

4223

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

4224

},

4225

],

4226

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

4227

# ListJob response and Job SUMMARY view. This field is populated by the Dataflow service

4228

# to support filtering jobs by the metadata values provided here. Populated for ListJobs

4229

# and all GetJob views SUMMARY and higher.

4230

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

4231

{ # Metadata for a BigTable connector used by the job.

4232

"tableId": "A String", # TableId accessed in the connection.

4233

"projectId": "A String", # ProjectId accessed in the connection.

4234

"instanceId": "A String", # InstanceId accessed in the connection.

4235

},

4236

],

4237

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

4238

{ # Metadata for a Spanner connector used by the job.

4239

"databaseId": "A String", # DatabaseId accessed in the connection.

4240

"instanceId": "A String", # InstanceId accessed in the connection.

4241

"projectId": "A String", # ProjectId accessed in the connection.

4242

},

4243

],

4244

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

4245

{ # Metadata for a Datastore connector used by the job.

4246

"projectId": "A String", # ProjectId accessed in the connection.

4247

"namespace": "A String", # Namespace used in the connection.

4248

},

4249

],

4250

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

4251

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

4252

"sdkSupportStatus": "A String", # The support status for this SDK version.

4253

"version": "A String", # The version of the SDK used to run the job.

4254

},

4255

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

4256

{ # Metadata for a BigQuery connector used by the job.

4257

"table": "A String", # Table accessed in the connection.

4258

"dataset": "A String", # Dataset accessed in the connection.

4259

"projectId": "A String", # Project accessed in the connection.

4260

"query": "A String", # Query used to access data in the connection.

4261

},

4262

],

4263

"fileDetails": [ # Identification of a File source used in the Dataflow job.

4264

{ # Metadata for a File connector used by the job.

4265

"filePattern": "A String", # File Pattern used to access files by the connector.

4266

},

4267

],

4268

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

4269

{ # Metadata for a PubSub connector used by the job.

4270

"subscription": "A String", # Subscription used in the connection.

4271

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

4276

# snapshot.

4277

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

4278

"type": "A String", # The type of Cloud Dataflow job.

4279

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed

4280

# form. This data is provided by the Dataflow service for ease of visualizing

4281

# the pipeline and interpreting Dataflow-provided metrics.

4282

# Preliminary field: The format of this data may change at any time.

4283

# A description of the user pipeline and the stages through which it is executed.

4284

# Created by the Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

4285

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

4286

{ # Description of the composing transforms, names/ids, and input/outputs of a

4287

# stage of execution. Some composing transforms and sources may have been

4288

# generated by the Dataflow service during execution planning.

4289

"id": "A String", # Dataflow service generated id for this stage.

4290

"componentTransform": [ # Transforms that comprise this execution stage.

4291

{ # Description of a transform executed as part of an execution stage.

4292

"originalTransform": "A String", # User name for the original user transform with which this transform is

4293

# most closely associated.

4294

"name": "A String", # Dataflow service generated name for this transform.

4295

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

4296

},

4297

],

4298

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

4299

{ # Description of an interstitial value between transforms in an execution

4300

# stage.

4301

"name": "A String", # Dataflow service generated name for this source.

4302

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4303

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4304

# source is most closely associated.

4305

},

4306

],

4307

"kind": "A String", # Type of transform this stage is executing.

4308

"outputSource": [ # Output sources for this stage.

4309

{ # Description of an input or output of an execution stage.

4310

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4311

# source is most closely associated.

4312

"name": "A String", # Dataflow service generated name for this source.

4313

"sizeBytes": "A String", # Size of the source, if measurable.

4314

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4315

},

4316

],

4317

"name": "A String", # Dataflow service generated name for this stage.

4318

"inputSource": [ # Input sources for this stage.

4319

{ # Description of an input or output of an execution stage.

4320

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4321

# source is most closely associated.

4322

"name": "A String", # Dataflow service generated name for this source.

4323

"sizeBytes": "A String", # Size of the source, if measurable.

4324

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

4330

{ # Description of the type, names/ids, and input/outputs for a transform.

4331

"kind": "A String", # Type of transform.

4332

"inputCollectionName": [ # User names for all collection inputs to this transform.

4333

"A String",

4334

],

4335

"name": "A String", # User provided name for this transform instance.

4336

"id": "A String", # SDK generated id of this transform instance.

4337

"displayData": [ # Transform-specific display data.

4338

{ # Data provided with a pipeline or transform to provide descriptive info.

4339

"timestampValue": "A String", # Contains value if the data is of timestamp type.

4340

"boolValue": True or False, # Contains value if the data is of a boolean type.

4341

"javaClassValue": "A String", # Contains value if the data is of java class type.

4342

"strValue": "A String", # Contains value if the data is of string type.

4343

"int64Value": "A String", # Contains value if the data is of int64 type.

4344

"durationValue": "A String", # Contains value if the data is of duration type.

4345

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

4346

# language namespace (e.g. a Python module) which defines the display data.

4347

# This allows a dax monitoring system to specially handle the data

4348

# and perform custom rendering.

4349

"floatValue": 3.14, # Contains value if the data is of float type.

4350

"key": "A String", # The key identifying the display data.

4351

# This is intended to be used as a label for the display data

4352

# when viewed in a dax monitoring system.

4353

"shortStrValue": "A String", # A possible additional shorter value to display.

4354

# For example a java_class_name_value of com.mypackage.MyDoFn

4355

# will be stored with MyDoFn as the short_str_value and

4356

# com.mypackage.MyDoFn as the java_class_name_value.

4357

# short_str_value can be displayed and java_class_name_value

4358

# will be displayed as a tooltip.

4359

"url": "A String", # An optional full URL.

4360

"label": "A String", # An optional label to display in a dax UI for the element.

4361

},

4362

],

4363

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

4369

{ # Data provided with a pipeline or transform to provide descriptive info.

4370

"timestampValue": "A String", # Contains value if the data is of timestamp type.

4371

"boolValue": True or False, # Contains value if the data is of a boolean type.

4372

"javaClassValue": "A String", # Contains value if the data is of java class type.

4373

"strValue": "A String", # Contains value if the data is of string type.

4374

"int64Value": "A String", # Contains value if the data is of int64 type.

4375

"durationValue": "A String", # Contains value if the data is of duration type.

4376

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

4377

# language namespace (e.g. a Python module) which defines the display data.

4378

# This allows a dax monitoring system to specially handle the data

4379

# and perform custom rendering.

4380

"floatValue": 3.14, # Contains value if the data is of float type.

4381

"key": "A String", # The key identifying the display data.

4382

# This is intended to be used as a label for the display data

4383

# when viewed in a dax monitoring system.

4384

"shortStrValue": "A String", # A possible additional shorter value to display.

4385

# For example a java_class_name_value of com.mypackage.MyDoFn

4386

# will be stored with MyDoFn as the short_str_value and

4387

# com.mypackage.MyDoFn as the java_class_name_value.

4388

# short_str_value can be displayed and java_class_name_value

4389

# will be displayed as a tooltip.

4390

"url": "A String", # An optional full URL.

4391

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

4396

# of the job it replaced.

4397

#

4398

# When sending a `CreateJobRequest`, you can update a job by specifying it

4399

# here. The job named here is stopped, and its intermediate state is

4400

# transferred to this job.

4401

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4402

# for temporary storage. These temporary files will be

4403

# removed on job completion.

4404

# No duplicates are allowed.

4405

# No file patterns are supported.

4406

#

4407

# The supported files are:

4408

#

4409

# Google Cloud Storage:

4410

#

4411

# storage.googleapis.com/{bucket}/{object}

4412

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4413

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4414

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4415

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4416

#

4417

# Only one Job with a given name may exist in a project at any

4418

# given time. If a caller attempts to create a Job with the same

4419

# name as an already-existing Job, the attempt returns the

4420

# existing Job.

4421

#

4422

# The name must match the regular expression

4423

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
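
A job name can be checked against the regular expression above before calling `create`. The sketch below uses Python's standard `re` module with the pattern copied verbatim from the description; `is_valid_job_name` is a hypothetical helper name, and the check is assumed to apply to the whole name rather than to a substring.

```python
import re

# Pattern copied from the field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the entire name matches the documented pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

Read this way, the pattern allows names of 1 to 40 characters that start with a lowercase letter and do not end with a hyphen.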

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4424

"steps": [ # Exactly one of step or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

4425

#

4426

# The top-level steps that constitute the entire job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

4427

{ # Defines a particular step within a Cloud Dataflow job.

4428

#

4429

# A job consists of multiple steps, each of which performs some

4430

# specific operation as part of the overall job. Data is typically

4431

# passed from one step to another as part of the job.

4432

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4433

# Here's an example of a sequence of steps which together implement a

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

4434

# Map-Reduce job:

4435

#

4436

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4437

# collection's elements.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

4438

#

4439

# * Validate the elements.

4440

#

4441

# * Apply a user-defined function to map each element to some value

4442

# and extract an element-specific key value.

4443

#

4444

# * Group elements with the same key into a single element with

4445

# that key, transforming a multiply-keyed collection into a

4446

# uniquely-keyed collection.

4447

#

4448

# * Write the elements out to some data sink.

4449

#

4450

# Note that the Cloud Dataflow service may be used to run many different

4451

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4452

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

4453

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4454

"kind": "A String", # The kind of step in the Cloud Dataflow job.

4455

"properties": { # Named properties associated with the step. Each kind of

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

4456

# predefined step has its own required set of properties.

4457

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4458

"a_key": "", # Properties of the object.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

4459

},

4460

},

4461

],
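
The Map-Reduce sequence described for steps (read, validate, map to a key, group by key, write) can be mimicked in plain Python to show how data moves from one step to the next. This is an illustration only: it does not use the Dataflow API or produce the `steps` encoding sent to the service, and `run_steps`, the alphabetic validation rule, and the in-memory sink are all assumptions made for the example.

```python
from collections import defaultdict


def run_steps(lines):
    """Run a toy read/validate/map/group/write sequence over text lines."""
    # Read: parse each line of the source into individual elements.
    elements = [word for line in lines for word in line.split()]
    # Validate: keep only alphabetic elements (an arbitrary example rule).
    valid = [word for word in elements if word.isalpha()]
    # Map: extract a key and a value for each element.
    keyed = [(word.lower(), 1) for word in valid]
    # Group: collect values with the same key into a single element.
    grouped = defaultdict(list)
    for key, value in keyed:
        grouped[key].append(value)
    # Write: emit the uniquely keyed elements to a sink (here, a sorted list).
    return sorted(grouped.items())
```

Calling `run_steps(["a b a", "B 42"])` returns one element per key, with `42` dropped by the validation step.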

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

4462

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

4463

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

4464

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

4465

# isn't contained in the submitted job.

4466

"stages": { # A mapping from each stage to the information about that stage.

4467

"a_key": { # Contains information about how a particular

4468

# google.dataflow.v1beta3.Step will be executed.

4469

"stepName": [ # The steps associated with the execution stage.

4470

# Note that stages may have several steps, and that a given step

4471

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.

"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time; it is immutable and set by the Cloud Dataflow
# service.


"stepsLocation": "A String", # The GCS location where the steps are stored.

"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF-8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.

"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.


}</pre>
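The `labels` restrictions documented above can be checked on the client before a job body is submitted. The following is a minimal sketch, not part of the generated client: the helper names (`valid_key`, `valid_value`, `validate_labels`) are hypothetical, and the two patterns are implemented literally as printed in the schema comment, using `unicodedata` because the standard `re` module does not support `\p{...}` classes. The service itself remains the authority on what it accepts.

```python
import unicodedata

MAX_ENTRIES = 64   # "no more than 64 entries"
MAX_BYTES = 128    # "<= 128 bytes in size" (measured here as UTF-8 bytes)


def _is_value_char(ch):
    # Character class [\p{Ll}\p{Lo}\p{N}_-]; \p{N} covers Nd, Nl and No.
    cat = unicodedata.category(ch)
    return cat in ("Ll", "Lo") or cat.startswith("N") or ch in "_-"


def valid_key(key):
    # Literal reading of \p{Ll}\p{Lo}{0,62}: one lowercase letter (Ll)
    # followed by at most 62 "other letter" (Lo) characters.
    if not 1 <= len(key) <= 63:
        return False
    if len(key.encode("utf-8")) > MAX_BYTES:
        return False
    first_ok = unicodedata.category(key[0]) == "Ll"
    rest_ok = all(unicodedata.category(c) == "Lo" for c in key[1:])
    return first_ok and rest_ok


def valid_value(value):
    # [\p{Ll}\p{Lo}\p{N}_-]{0,63}: the empty string is allowed.
    if len(value) > 63 or len(value.encode("utf-8")) > MAX_BYTES:
        return False
    return all(_is_value_char(c) for c in value)


def validate_labels(labels):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(labels) > MAX_ENTRIES:
        problems.append("too many labels: %d > %d" % (len(labels), MAX_ENTRIES))
    for key, value in labels.items():
        if not valid_key(key):
            problems.append("invalid key: %r" % (key,))
        if not valid_value(value):
            problems.append("invalid value for %r: %r" % (key, value))
    return problems
```

A caller might run `validate_labels(body.get("labels", {}))` before passing `body` to `create()` and refuse to submit when the returned list is non-empty. The byte limit matters independently of the length limit: a key made mostly of multi-byte characters can be under 63 characters and still exceed 128 bytes.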
