blob: 5ec52a7e0b8562df23ec32c65d9e87df622e1fde [file] [log] [blame]
<html><body>
<style>
body, h1, h2, h3, div, span, p, pre, a {
margin: 0;
padding: 0;
border: 0;
font-weight: inherit;
font-style: inherit;
font-size: 100%;
font-family: inherit;
vertical-align: baseline;
}
body {
font-size: 13px;
padding: 1em;
}
h1 {
font-size: 26px;
margin-bottom: 1em;
}
h2 {
font-size: 24px;
margin-bottom: 1em;
}
h3 {
font-size: 20px;
margin-bottom: 1em;
margin-top: 1em;
}
pre, code {
line-height: 1.5;
font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}
pre {
margin-top: 0.5em;
}
h1, h2, h3, p {
font-family: Arial, sans serif;
}
h1, h2, h3 {
border-bottom: solid #CCC 1px;
}
.toc_element {
margin-top: 0.5em;
}
.firstline {
margin-left: 2 em;
}
.method {
margin-top: 1em;
border: solid 1px #CCC;
padding: 1em;
background: #EEE;
}
.details {
font-weight: bold;
font-size: 14px;
}
</style>
<h1><a href="genomics_v1.html">Genomics API</a> . <a href="genomics_v1.reads.html">reads</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
<code><a href="#search">search(body, x__xgafv=None)</a></code></p>
<p class="firstline">Gets a list of reads for one or more read group sets. For the definitions of read group sets and other genomics resources, see [Fundamentals of Google Genomics](https://cloud.google.com/genomics/fundamentals-of-google-genomics) Reads search operates over a genomic coordinate space of reference sequence & position defined over the reference sequences to which the requested read group sets are aligned. If a target positional range is specified, search returns all reads whose alignment to the reference genome overlap the range. A query which specifies only read group set IDs yields all reads in those read group sets, including unmapped reads. All reads returned (including reads on subsequent pages) are ordered by genomic coordinate (by reference sequence, then position). Reads with equivalent genomic coordinates are returned in an unspecified order. This order is consistent, such that two queries for the same content (regardless of page size) yield reads in the same order across their respective streams of paginated responses. Implements [GlobalAllianceApi.searchReads](https://github.com/ga4gh/schemas/blob/v0.5.1/src/main/resources/avro/readmethods.avdl#L85).</p>
<p class="toc_element">
<code><a href="#stream">stream(body, x__xgafv=None)</a></code></p>
<p class="firstline">Returns a stream of all the reads matching the search request, ordered by reference name, position, and ID.</p>
<h3>Method Details</h3>
<div class="method">
<code class="details" id="search">search(body, x__xgafv=None)</code>
<pre>Gets a list of reads for one or more read group sets. For the definitions of read group sets and other genomics resources, see [Fundamentals of Google Genomics](https://cloud.google.com/genomics/fundamentals-of-google-genomics) Reads search operates over a genomic coordinate space of reference sequence & position defined over the reference sequences to which the requested read group sets are aligned. If a target positional range is specified, search returns all reads whose alignment to the reference genome overlap the range. A query which specifies only read group set IDs yields all reads in those read group sets, including unmapped reads. All reads returned (including reads on subsequent pages) are ordered by genomic coordinate (by reference sequence, then position). Reads with equivalent genomic coordinates are returned in an unspecified order. This order is consistent, such that two queries for the same content (regardless of page size) yield reads in the same order across their respective streams of paginated responses. Implements [GlobalAllianceApi.searchReads](https://github.com/ga4gh/schemas/blob/v0.5.1/src/main/resources/avro/readmethods.avdl#L85).
Args:
body: object, The request body. (required)
The object takes the form of:
{ # The read search request.
"end": "A String", # The end position of the range on the reference, 0-based exclusive. If specified, `referenceName` must also be specified.
"readGroupIds": [ # The IDs of the read groups within which to search for reads. All specified read groups must belong to the same read group sets. Must specify one of `readGroupSetIds` or `readGroupIds`.
"A String",
],
"pageSize": 42, # The maximum number of results to return in a single page. If unspecified, defaults to 256. The maximum value is 2048.
"pageToken": "A String", # The continuation token, which is used to page through large result sets. To get the next page of results, set this parameter to the value of `nextPageToken` from the previous response.
"start": "A String", # The start position of the range on the reference, 0-based inclusive. If specified, `referenceName` must also be specified.
"referenceName": "A String", # The reference sequence name, for example `chr1`, `1`, or `chrX`. If set to `*`, only unmapped reads are returned. If unspecified, all reads (mapped and unmapped) are returned.
"readGroupSetIds": [ # The IDs of the read groups sets within which to search for reads. All specified read group sets must be aligned against a common set of reference sequences; this defines the genomic coordinates for the query. Must specify one of `readGroupSetIds` or `readGroupIds`.
"A String",
],
}
x__xgafv: string, V1 error format.
Returns:
An object of the form:
{ # The read search response.
"nextPageToken": "A String", # The continuation token, which is used to page through large result sets. Provide this value in a subsequent request to return the next page of results. This field will be empty if there aren't any additional results.
"alignments": [ # The list of matching alignments sorted by mapped genomic coordinate, if any, ascending in position within the same reference. Unmapped reads, which have no position, are returned contiguously and are sorted in ascending lexicographic order by fragment name.
{ # A read alignment describes a linear alignment of a string of DNA to a reference sequence, in addition to metadata about the fragment (the molecule of DNA sequenced) and the read (the bases which were read by the sequencer). A read is equivalent to a line in a SAM file. A read belongs to exactly one read group and exactly one read group set. For more genomics resource definitions, see [Fundamentals of Google Genomics](https://cloud.google.com/genomics/fundamentals-of-google-genomics) ### Reverse-stranded reads Mapped reads (reads having a non-null `alignment`) can be aligned to either the forward or the reverse strand of their associated reference. Strandedness of a mapped read is encoded by `alignment.position.reverseStrand`. If we consider the reference to be a forward-stranded coordinate space of `[0, reference.length)` with `0` as the left-most position and `reference.length` as the right-most position, reads are always aligned left to right. That is, `alignment.position.position` always refers to the left-most reference coordinate and `alignment.cigar` describes the alignment of this read to the reference from left to right. All per-base fields such as `alignedSequence` and `alignedQuality` share this same left-to-right orientation; this is true of reads which are aligned to either strand. For reverse-stranded reads, this means that `alignedSequence` is the reverse complement of the bases that were originally reported by the sequencing machine. ### Generating a reference-aligned sequence string When interacting with mapped reads, it's often useful to produce a string representing the local alignment of the read to reference. The following pseudocode demonstrates one way of doing this: out = "" offset = 0 for c in read.alignment.cigar { switch c.operation { case "ALIGNMENT_MATCH", "SEQUENCE_MATCH", "SEQUENCE_MISMATCH": out += read.alignedSequence[offset:offset+c.operationLength] offset += c.operationLength break case "CLIP_SOFT", "INSERT": offset += c.operationLength break case "PAD": out += repeat("*", c.operationLength) break case "DELETE": out += repeat("-", c.operationLength) break case "SKIP": out += repeat(" ", c.operationLength) break case "CLIP_HARD": break } } return out ### Converting to SAM's CIGAR string The following pseudocode generates a SAM CIGAR string from the `cigar` field. Note that this is a lossy conversion (`cigar.referenceSequence` is lost). cigarMap = { "ALIGNMENT_MATCH": "M", "INSERT": "I", "DELETE": "D", "SKIP": "N", "CLIP_SOFT": "S", "CLIP_HARD": "H", "PAD": "P", "SEQUENCE_MATCH": "=", "SEQUENCE_MISMATCH": "X", } cigarStr = "" for c in read.alignment.cigar { cigarStr += c.operationLength + cigarMap[c.operation] } return cigarStr
"info": { # A map of additional read alignment information. This must be of the form map (string key mapping to a list of string values).
"a_key": [
"",
],
},
"duplicateFragment": True or False, # The fragment is a PCR or optical duplicate (SAM flag 0x400).
"nextMatePosition": { # An abstraction for referring to a genomic position, in relation to some already known reference. For now, represents a genomic position as a reference name, a base number on that reference (0-based), and a determination of forward or reverse strand. # The mapping of the primary alignment of the `(readNumber+1)%numberReads` read in the fragment. It replaces mate position and mate strand in SAM.
"position": "A String", # The 0-based offset from the start of the forward strand for that reference.
"reverseStrand": True or False, # Whether this position is on the reverse strand, as opposed to the forward strand.
"referenceName": "A String", # The name of the reference in whatever reference set is being used.
},
"readGroupSetId": "A String", # The ID of the read group set this read belongs to. A read belongs to exactly one read group set.
"numberReads": 42, # The number of reads in the fragment (extension to SAM flag 0x1).
"failedVendorQualityChecks": True or False, # Whether this read did not pass filters, such as platform or vendor quality controls (SAM flag 0x200).
"fragmentName": "A String", # The fragment name. Equivalent to QNAME (query template name) in SAM.
"readNumber": 42, # The read number in sequencing. 0-based and less than numberReads. This field replaces SAM flag 0x40 and 0x80.
"properPlacement": True or False, # The orientation and the distance between reads from the fragment are consistent with the sequencing protocol (SAM flag 0x2).
"readGroupId": "A String", # The ID of the read group this read belongs to. A read belongs to exactly one read group. This is a server-generated ID which is distinct from SAM's RG tag (for that value, see ReadGroup.name).
"supplementaryAlignment": True or False, # Whether this alignment is supplementary. Equivalent to SAM flag 0x800. Supplementary alignments are used in the representation of a chimeric alignment. In a chimeric alignment, a read is split into multiple linear alignments that map to different reference contigs. The first linear alignment in the read will be designated as the representative alignment; the remaining linear alignments will be designated as supplementary alignments. These alignments may have different mapping quality scores. In each linear alignment in a chimeric alignment, the read will be hard clipped. The `alignedSequence` and `alignedQuality` fields in the alignment record will only represent the bases for its respective linear alignment.
"alignedQuality": [ # The quality of the read sequence contained in this alignment record (equivalent to QUAL in SAM). `alignedSequence` and `alignedQuality` may be shorter than the full read sequence and quality. This will occur if the alignment is part of a chimeric alignment, or if the read was trimmed. When this occurs, the CIGAR for this read will begin/end with a hard clip operator that will indicate the length of the excised sequence.
42,
],
"fragmentLength": 42, # The observed length of the fragment, equivalent to TLEN in SAM.
"alignedSequence": "A String", # The bases of the read sequence contained in this alignment record, **without CIGAR operations applied** (equivalent to SEQ in SAM). `alignedSequence` and `alignedQuality` may be shorter than the full read sequence and quality. This will occur if the alignment is part of a chimeric alignment, or if the read was trimmed. When this occurs, the CIGAR for this read will begin/end with a hard clip operator that will indicate the length of the excised sequence.
"id": "A String", # The server-generated read ID, unique across all reads. This is different from the `fragmentName`.
"alignment": { # A linear alignment can be represented by one CIGAR string. Describes the mapped position and local alignment of the read to the reference. # The linear alignment for this alignment record. This field is null for unmapped reads.
"position": { # An abstraction for referring to a genomic position, in relation to some already known reference. For now, represents a genomic position as a reference name, a base number on that reference (0-based), and a determination of forward or reverse strand. # The position of this alignment.
"position": "A String", # The 0-based offset from the start of the forward strand for that reference.
"reverseStrand": True or False, # Whether this position is on the reverse strand, as opposed to the forward strand.
"referenceName": "A String", # The name of the reference in whatever reference set is being used.
},
"cigar": [ # Represents the local alignment of this sequence (alignment matches, indels, etc) against the reference.
{ # A single CIGAR operation.
"referenceSequence": "A String", # `referenceSequence` is only used at mismatches (`SEQUENCE_MISMATCH`) and deletions (`DELETE`). Filling this field replaces SAM's MD tag. If the relevant information is not available, this field is unset.
"operation": "A String",
"operationLength": "A String", # The number of genomic bases that the operation runs for. Required.
},
],
"mappingQuality": 42, # The mapping quality of this alignment. Represents how likely the read maps to this position as opposed to other locations. Specifically, this is -10 log10 Pr(mapping position is wrong), rounded to the nearest integer.
},
"secondaryAlignment": True or False, # Whether this alignment is secondary. Equivalent to SAM flag 0x100. A secondary alignment represents an alternative to the primary alignment for this read. Aligners may return secondary alignments if a read can map ambiguously to multiple coordinates in the genome. By convention, each read has one and only one alignment where both `secondaryAlignment` and `supplementaryAlignment` are false.
},
],
}</pre>
</div>
<div class="method">
<code class="details" id="stream">stream(body, x__xgafv=None)</code>
<pre>Returns a stream of all the reads matching the search request, ordered by reference name, position, and ID.
Args:
body: object, The request body. (required)
The object takes the form of:
{ # The stream reads request.
"end": "A String", # The end position of the range on the reference, 0-based exclusive. If specified, `referenceName` must also be specified.
"totalShards": 42, # Specifying `totalShards` causes a disjoint subset of the normal response payload to be returned for each query with a unique `shard` parameter specified. A best effort is made to yield equally sized shards. Sharding can be used to distribute processing amongst workers, where each worker is assigned a unique `shard` number and all workers specify the same `totalShards` number. The union of reads returned for all sharded queries `[0, totalShards)` is equal to those returned by a single unsharded query. Queries for different values of `totalShards` with common divisors will share shard boundaries. For example, streaming `shard` 2 of 5 `totalShards` yields the same results as streaming `shard`s 4 and 5 of 10 `totalShards`. This property can be leveraged for adaptive retries.
"readGroupSetId": "A String", # The ID of the read group set from which to stream reads.
"projectId": "A String", # The Google Developers Console project ID or number which will be billed for this access. The caller must have WRITE access to this project. Required.
"shard": 42, # Restricts results to a shard containing approximately `1/totalShards` of the normal response payload for this query. Results from a sharded request are disjoint from those returned by all queries which differ only in their shard parameter. A shard may yield 0 results; this is especially likely for large values of `totalShards`. Valid values are `[0, totalShards)`.
"start": "A String", # The start position of the range on the reference, 0-based inclusive. If specified, `referenceName` must also be specified.
"referenceName": "A String", # The reference sequence name, for example `chr1`, `1`, or `chrX`. If set to *, only unmapped reads are returned.
}
x__xgafv: string, V1 error format.
Returns:
An object of the form:
{
"alignments": [
{ # A read alignment describes a linear alignment of a string of DNA to a reference sequence, in addition to metadata about the fragment (the molecule of DNA sequenced) and the read (the bases which were read by the sequencer). A read is equivalent to a line in a SAM file. A read belongs to exactly one read group and exactly one read group set. For more genomics resource definitions, see [Fundamentals of Google Genomics](https://cloud.google.com/genomics/fundamentals-of-google-genomics) ### Reverse-stranded reads Mapped reads (reads having a non-null `alignment`) can be aligned to either the forward or the reverse strand of their associated reference. Strandedness of a mapped read is encoded by `alignment.position.reverseStrand`. If we consider the reference to be a forward-stranded coordinate space of `[0, reference.length)` with `0` as the left-most position and `reference.length` as the right-most position, reads are always aligned left to right. That is, `alignment.position.position` always refers to the left-most reference coordinate and `alignment.cigar` describes the alignment of this read to the reference from left to right. All per-base fields such as `alignedSequence` and `alignedQuality` share this same left-to-right orientation; this is true of reads which are aligned to either strand. For reverse-stranded reads, this means that `alignedSequence` is the reverse complement of the bases that were originally reported by the sequencing machine. ### Generating a reference-aligned sequence string When interacting with mapped reads, it's often useful to produce a string representing the local alignment of the read to reference. The following pseudocode demonstrates one way of doing this: out = "" offset = 0 for c in read.alignment.cigar { switch c.operation { case "ALIGNMENT_MATCH", "SEQUENCE_MATCH", "SEQUENCE_MISMATCH": out += read.alignedSequence[offset:offset+c.operationLength] offset += c.operationLength break case "CLIP_SOFT", "INSERT": offset += c.operationLength break case "PAD": out += repeat("*", c.operationLength) break case "DELETE": out += repeat("-", c.operationLength) break case "SKIP": out += repeat(" ", c.operationLength) break case "CLIP_HARD": break } } return out ### Converting to SAM's CIGAR string The following pseudocode generates a SAM CIGAR string from the `cigar` field. Note that this is a lossy conversion (`cigar.referenceSequence` is lost). cigarMap = { "ALIGNMENT_MATCH": "M", "INSERT": "I", "DELETE": "D", "SKIP": "N", "CLIP_SOFT": "S", "CLIP_HARD": "H", "PAD": "P", "SEQUENCE_MATCH": "=", "SEQUENCE_MISMATCH": "X", } cigarStr = "" for c in read.alignment.cigar { cigarStr += c.operationLength + cigarMap[c.operation] } return cigarStr
"info": { # A map of additional read alignment information. This must be of the form map (string key mapping to a list of string values).
"a_key": [
"",
],
},
"duplicateFragment": True or False, # The fragment is a PCR or optical duplicate (SAM flag 0x400).
"nextMatePosition": { # An abstraction for referring to a genomic position, in relation to some already known reference. For now, represents a genomic position as a reference name, a base number on that reference (0-based), and a determination of forward or reverse strand. # The mapping of the primary alignment of the `(readNumber+1)%numberReads` read in the fragment. It replaces mate position and mate strand in SAM.
"position": "A String", # The 0-based offset from the start of the forward strand for that reference.
"reverseStrand": True or False, # Whether this position is on the reverse strand, as opposed to the forward strand.
"referenceName": "A String", # The name of the reference in whatever reference set is being used.
},
"readGroupSetId": "A String", # The ID of the read group set this read belongs to. A read belongs to exactly one read group set.
"numberReads": 42, # The number of reads in the fragment (extension to SAM flag 0x1).
"failedVendorQualityChecks": True or False, # Whether this read did not pass filters, such as platform or vendor quality controls (SAM flag 0x200).
"fragmentName": "A String", # The fragment name. Equivalent to QNAME (query template name) in SAM.
"readNumber": 42, # The read number in sequencing. 0-based and less than numberReads. This field replaces SAM flag 0x40 and 0x80.
"properPlacement": True or False, # The orientation and the distance between reads from the fragment are consistent with the sequencing protocol (SAM flag 0x2).
"readGroupId": "A String", # The ID of the read group this read belongs to. A read belongs to exactly one read group. This is a server-generated ID which is distinct from SAM's RG tag (for that value, see ReadGroup.name).
"supplementaryAlignment": True or False, # Whether this alignment is supplementary. Equivalent to SAM flag 0x800. Supplementary alignments are used in the representation of a chimeric alignment. In a chimeric alignment, a read is split into multiple linear alignments that map to different reference contigs. The first linear alignment in the read will be designated as the representative alignment; the remaining linear alignments will be designated as supplementary alignments. These alignments may have different mapping quality scores. In each linear alignment in a chimeric alignment, the read will be hard clipped. The `alignedSequence` and `alignedQuality` fields in the alignment record will only represent the bases for its respective linear alignment.
"alignedQuality": [ # The quality of the read sequence contained in this alignment record (equivalent to QUAL in SAM). `alignedSequence` and `alignedQuality` may be shorter than the full read sequence and quality. This will occur if the alignment is part of a chimeric alignment, or if the read was trimmed. When this occurs, the CIGAR for this read will begin/end with a hard clip operator that will indicate the length of the excised sequence.
42,
],
"fragmentLength": 42, # The observed length of the fragment, equivalent to TLEN in SAM.
"alignedSequence": "A String", # The bases of the read sequence contained in this alignment record, **without CIGAR operations applied** (equivalent to SEQ in SAM). `alignedSequence` and `alignedQuality` may be shorter than the full read sequence and quality. This will occur if the alignment is part of a chimeric alignment, or if the read was trimmed. When this occurs, the CIGAR for this read will begin/end with a hard clip operator that will indicate the length of the excised sequence.
"id": "A String", # The server-generated read ID, unique across all reads. This is different from the `fragmentName`.
"alignment": { # A linear alignment can be represented by one CIGAR string. Describes the mapped position and local alignment of the read to the reference. # The linear alignment for this alignment record. This field is null for unmapped reads.
"position": { # An abstraction for referring to a genomic position, in relation to some already known reference. For now, represents a genomic position as a reference name, a base number on that reference (0-based), and a determination of forward or reverse strand. # The position of this alignment.
"position": "A String", # The 0-based offset from the start of the forward strand for that reference.
"reverseStrand": True or False, # Whether this position is on the reverse strand, as opposed to the forward strand.
"referenceName": "A String", # The name of the reference in whatever reference set is being used.
},
"cigar": [ # Represents the local alignment of this sequence (alignment matches, indels, etc) against the reference.
{ # A single CIGAR operation.
"referenceSequence": "A String", # `referenceSequence` is only used at mismatches (`SEQUENCE_MISMATCH`) and deletions (`DELETE`). Filling this field replaces SAM's MD tag. If the relevant information is not available, this field is unset.
"operation": "A String",
"operationLength": "A String", # The number of genomic bases that the operation runs for. Required.
},
],
"mappingQuality": 42, # The mapping quality of this alignment. Represents how likely the read maps to this position as opposed to other locations. Specifically, this is -10 log10 Pr(mapping position is wrong), rounded to the nearest integer.
},
"secondaryAlignment": True or False, # Whether this alignment is secondary. Equivalent to SAM flag 0x100. A secondary alignment represents an alternative to the primary alignment for this read. Aligners may return secondary alignments if a read can map ambiguously to multiple coordinates in the genome. By convention, each read has one and only one alignment where both `secondaryAlignment` and `supplementaryAlignment` are false.
},
],
}</pre>
</div>
</body></html>