core/ndarray-1.0.0

An n-dimensional array.

Description

There are two ways to store the data in an ndarray.

  • Inline in the tree: This is recommended only for small arrays. In this case, the entire ndarray tag may be a nested list, in which case the type of the array is inferred from the content. (See the rules for type inference in the inline-data definition below.) The inline data may also be given in the data property, in which case it is possible to explicitly specify the datatype and other properties.

  • External to the tree: The data comes from a block within the same ASDF file or an external ASDF file referenced by a URI.

Outline

Schema Definitions

This node must validate against any of the following:

  • This type is an object with the following properties:
    • source
      object
      The source of the data.

      • If an integer: If positive, the zero-based index of the block within the same file. If negative, the index from the last block within the same file. For example, a source of -1 corresponds to the last block in the same file.

      • If a string, a URI to an external ASDF file containing the block data. Relative URIs and file: and http: protocols must be supported. Other protocols may be supported by specific library implementations.

      The ability to reference block data in an external ASDF file is intentionally limited to the first block in the external ASDF file, and is intended only to support the needs of exploded. For the more general case of referencing data in an external ASDF file, use tree references.

      This node must validate against any of the following:

      • integer
      • string
        No length restriction
    • data
      inline-data
      The data for the array inline.

      If datatype and/or shape are also provided, they must match the data here and can be used as a consistency check. strides, offset and byteorder are meaningless when data is provided.

    • shape
      array
      The shape of the array.

      The first entry may be the string *, indicating that the length of the first index of the array will be automatically determined from the size of the block. This is used for streaming support.

      No length restriction

      Items in the array must be any of the following types:

      • integer
        Minimum value: 0
      • This node has no type definition (unrestricted)
    • datatype
      datatype
      The data format of the array elements.

    • byteorder
      string
      The byte order (big- or little-endian) of the array data.

      No length restriction
      Only the following values are valid for this node:
      • big

      • little

    • offset
      integer
      The offset, in bytes, within the data for this start of this view.

      Minimum value: 0
      Default value: 0
    • strides
      array
      The number of bytes to skip in each dimension. If not provided, the array is assumed by be contiguous and in C order. If provided, must be the same length as the shape property.

      No length restriction

      Items in the array must be any of the following types:

      • integer
        Minimum value: 1
      • integer
        Maximum value: -1
    • mask
      object
      Describes how missing values in the array are stored. If a scalar number, that number is used to represent missing values. If an ndarray, the given array provides a mask, where non-zero values represent missing values in this array. The mask array must be broadcastable to the dimensions of this array.

      This node must validate against any of the following:

      • number
      • This node must validate against all of the following:

Examples

An inline array, with implicit data type:

!core/ndarray-1.0.0
  [[1, 0, 0],
   [0, 1, 0],
   [0, 0, 1]]

An inline array, with an explicit data type:

!core/ndarray-1.0.0
  datatype: float64
  data:
    [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

An inline structured array, where the types of each column are automatically detected:

!core/ndarray-1.0.0
  [[M110, 110, 205, And],
   [ M31,  31, 224, And],
   [ M32,  32, 221, And],
   [M103, 103, 581, Cas]]

An inline structured array, where the types of each column are explicitly specified:

!core/ndarray-1.0.0
  datatype: [['ascii', 4], uint16, uint16, ['ascii', 4]]
  data:
    [[M110, 110, 205, And],
     [ M31,  31, 224, And],
     [ M32,  32, 221, And],
     [M103, 103, 581, Cas]]

A double-precision array, in contiguous memory in a block within the same file:

!core/ndarray-1.0.0
  source: 0
  shape: [1024, 1024]
  datatype: float64
  byteorder: little

A view of a tile in that image:

!core/ndarray-1.0.0
  source: 0
  shape: [256, 256]
  datatype: float64
  byteorder: little
  strides: [8192, 8]
  offset: 2099200

A structured datatype, with nested columns for a coordinate in (ra, dec), and a 3x3 convolution kernel:

!core/ndarray-1.0.0
  source: 0
  shape: [64]
  datatype:
    - name: coordinate
      datatype:
        - name: ra
          datatype: float64
        - name: dec
          datatype: float64
    - name: kernel
      datatype: float32
      shape: [3, 3]
  byteorder: little

An array in Fortran order:

!core/ndarray-1.0.0
  source: 0
  shape: [1024, 1024]
  datatype: float64
  byteorder: little
  strides: [8192, 8]

An array where values of -999 are treated as missing:

!core/ndarray-1.0.0
  source: 0
  shape: [256, 256]
  datatype: float64
  byteorder: little
  mask: -999

An array where another array is used as a mask:

!core/ndarray-1.0.0
  source: 0
  shape: [256, 256]
  datatype: float64
  byteorder: little
  mask: !core/ndarray-1.0.0
    source: 1
    shape: [256, 256]
    datatype: bool8
    byteorder: little

An array where the data is stored in the first block in another ASDF file.:

!core/ndarray-1.0.0
  source: external.asdf
  shape: [256, 256]
  datatype: float64
  byteorder: little

Internal Definitions

  • scalar-datatype
    object
    Describes the type of a single element.

    There is a set of numeric types, each with a single identifier:

    • int8, int16, int32, int64: Signed integer types, with the given bit size.

    • uint8, uint16, uint32, uint64: Unsigned integer types, with the given bit size.

    • float32: Single-precision floating-point type or “binary32”, as defined in IEEE 754.

    • float64: Double-precision floating-point type or “binary64”, as defined in IEEE 754.

    • complex64: Complex number where the real and imaginary parts are each single-precision floating-point (“binary32”) numbers, as defined in IEEE 754.

    • complex128: Complex number where the real and imaginary parts are each double-precision floating-point (“binary64”) numbers, as defined in IEEE 754.

    There are two distinct fixed-length string types, which must be indicated with a 2-element array where the first element is an identifier for the string type, and the second is a length:

    • ascii: A string containing ASCII text (all codepoints < 128), where each character is 1 byte.

    • ucs4: A string containing unicode text in the UCS-4 encoding, where each character is always 4 bytes long. Here the number of bytes used is 4 times the given length.

    This node must validate against any of the following:

    • string
      No length restriction
      Only the following values are valid for this node:
      • int8

      • uint8

      • int16

      • uint16

      • int32

      • uint32

      • int64

      • uint64

      • float32

      • float64

      • complex64

      • complex128

      • bool8

    • array
      No length restriction
      The first 2 items in the list must be the following types:
        string
        No length restriction
        Only the following values are valid for this node:
        • ascii

        • ucs4

        integer
        Minimum value: 0
  • datatype
    object
    The data format of the array elements. May be a single scalar datatype, or may be a nested list of datatypes. When a list, each field may have a name.

    This node must validate against any of the following:

    • array
      No length restriction

      Items in the array must be any of the following types:

      • This type is an object with the following properties:
        • name
          string
          The name of the field

          No length restriction
          Must match the following pattern:
          [A-Za-z_][A-Za-z0-9_]*
          
        • datatype
          datatypeRequired
        • byteorder
          string
          The byteorder for the field. If not provided, the byteorder of the datatype as a whole will be used.

          No length restriction
          Only the following values are valid for this node:
          • big

          • little

        • shape
          array
          No length restriction
          Items in the array are restricted to the following types:
          integer
          Minimum value: 0
  • inline-data
    array
    Inline data is stored in YAML format directly in the tree, rather than referencing a binary block. It is made out of nested lists.

    If the datatype of the array is not specified, it is inferred from the array contents. Type inference is supported only for homogeneous arrays, not tables.

    • If any of the elements in the array are YAML strings, the datatype of the entire array is ucs4, with the width of the largest string in the column, otherwise…

    • If any of the elements in the array are complex numbers, the datatype of the entire column is complex128, otherwise…

    • If any of the types in the column are numbers with a decimal point, the datatype of the entire column is float64, otherwise..

    • If any of the types in the column are integers, the datatype of the entire column is int64, otherwise…

    • The datatype of the entire column is bool8.

    Masked values may be included in the array using null. If an explicit mask array is also provided, it takes precedence.

    No length restriction

    Items in the array must be any of the following types:

  • Original Schema

    %YAML 1.1
    ---
    $schema: "http://stsci.edu/schemas/yaml-schema/draft-01"
    id: "http://stsci.edu/schemas/asdf/core/ndarray-1.0.0"
    tag: "tag:stsci.edu:asdf/core/ndarray-1.0.0"
    
    title: >
      An *n*-dimensional array.
    
    description: |
      There are two ways to store the data in an ndarray.
    
      - Inline in the tree: This is recommended only for small arrays.  In
        this case, the entire ``ndarray`` tag may be a nested list, in
        which case the type of the array is inferred from the content.
        (See the rules for type inference in the ``inline-data``
        definition below.)  The inline data may also be given in the
        ``data`` property, in which case it is possible to explicitly
        specify the ``datatype`` and other properties.
    
      - External to the tree: The data comes from a [block](ref:block)
        within the same ASDF file or an external ASDF file referenced by a
        URI.
    
    examples:
      -
        - An inline array, with implicit data type
        - |
            !core/ndarray-1.0.0
              [[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1]]
    
      -
        - An inline array, with an explicit data type
        - |
            !core/ndarray-1.0.0
              datatype: float64
              data:
                [[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]]
    
      -
        - An inline structured array, where the types of each column are
          automatically detected
        - |
            !core/ndarray-1.0.0
              [[M110, 110, 205, And],
               [ M31,  31, 224, And],
               [ M32,  32, 221, And],
               [M103, 103, 581, Cas]]
    
      -
        - An inline structured array, where the types of each column are
          explicitly specified
        - |
            !core/ndarray-1.0.0
              datatype: [['ascii', 4], uint16, uint16, ['ascii', 4]]
              data:
                [[M110, 110, 205, And],
                 [ M31,  31, 224, And],
                 [ M32,  32, 221, And],
                 [M103, 103, 581, Cas]]
    
      -
        - A double-precision array, in contiguous memory in a block within
          the same file
        - |
            !core/ndarray-1.0.0
              source: 0
              shape: [1024, 1024]
              datatype: float64
              byteorder: little
    
      -
        - A view of a tile in that image
        - |
            !core/ndarray-1.0.0
              source: 0
              shape: [256, 256]
              datatype: float64
              byteorder: little
              strides: [8192, 8]
              offset: 2099200
    
      -
        - A structured datatype, with nested columns for a coordinate in
          (*ra*, *dec*), and a 3x3 convolution kernel
        - |
            !core/ndarray-1.0.0
              source: 0
              shape: [64]
              datatype:
                - name: coordinate
                  datatype:
                    - name: ra
                      datatype: float64
                    - name: dec
                      datatype: float64
                - name: kernel
                  datatype: float32
                  shape: [3, 3]
              byteorder: little
    
      -
        - An array in Fortran order
        - |
            !core/ndarray-1.0.0
              source: 0
              shape: [1024, 1024]
              datatype: float64
              byteorder: little
              strides: [8192, 8]
    
      -
        - An array where values of -999 are treated as missing
        - |
            !core/ndarray-1.0.0
              source: 0
              shape: [256, 256]
              datatype: float64
              byteorder: little
              mask: -999
    
      -
        - An array where another array is used as a mask
        - |
            !core/ndarray-1.0.0
              source: 0
              shape: [256, 256]
              datatype: float64
              byteorder: little
              mask: !core/ndarray-1.0.0
                source: 1
                shape: [256, 256]
                datatype: bool8
                byteorder: little
    
      -
        - An array where the data is stored in the first block in
          another ASDF file.
        - |
            !core/ndarray-1.0.0
              source: external.asdf
              shape: [256, 256]
              datatype: float64
              byteorder: little
    
    definitions:
      scalar-datatype:
        description: |
          Describes the type of a single element.
    
          There is a set of numeric types, each with a single identifier:
    
          - `int8`, `int16`, `int32`, `int64`: Signed integer types, with
            the given bit size.
    
          - `uint8`, `uint16`, `uint32`, `uint64`: Unsigned integer types,
            with the given bit size.
    
          - `float32`: Single-precision floating-point type or "binary32",
            as defined in IEEE 754.
    
          - `float64`: Double-precision floating-point type or "binary64",
            as defined in IEEE 754.
    
          - `complex64`: Complex number where the real and imaginary parts
            are each single-precision floating-point ("binary32") numbers,
            as defined in IEEE 754.
    
          - `complex128`: Complex number where the real and imaginary
            parts are each double-precision floating-point ("binary64")
            numbers, as defined in IEEE 754.
    
          There are two distinct fixed-length string types, which must
          be indicated with a 2-element array where the first element is an
          identifier for the string type, and the second is a length:
    
          - `ascii`: A string containing ASCII text (all codepoints <
            128), where each character is 1 byte.
    
          - `ucs4`: A string containing unicode text in the UCS-4
            encoding, where each character is always 4 bytes long.  Here
            the number of bytes used is 4 times the given length.
    
        anyOf:
          - type: string
            enum: [int8, uint8, int16, uint16, int32, uint32, int64, uint64,
                   float32, float64, complex64, complex128, bool8]
          - type: array
            items:
              - type: string
                enum: [ascii, ucs4]
              - type: integer
                minimum: 0
            minLength: 2
            maxLength: 2
    
      datatype:
        description: |
          The data format of the array elements.  May be a single scalar
          datatype, or may be a nested list of datatypes.  When a list, each field
          may have a name.
        anyOf:
          - $ref: "#/definitions/scalar-datatype"
          - type: array
            items:
              anyOf:
                - $ref: "#/definitions/scalar-datatype"
                - type: object
                  properties:
                    name:
                      type: string
                      pattern: "[A-Za-z_][A-Za-z0-9_]*"
                      description: The name of the field
                    datatype:
                      $ref: "#/definitions/datatype"
                    byteorder:
                      type: string
                      enum: [big, little]
                      description: |
                        The byteorder for the field.  If not provided, the
                        byteorder of the datatype as a whole will be used.
                    shape:
                      type: array
                      items:
                        type: integer
                        minimum: 0
                  required: [datatype]
    
      inline-data:
        description: |
          Inline data is stored in YAML format directly in the tree, rather than
          referencing a binary block.  It is made out of nested lists.
    
          If the datatype of the array is not specified, it is inferred from
          the array contents.  Type inference is supported only for
          homogeneous arrays, not tables.
    
          - If any of the elements in the array are YAML strings, the
            `datatype` of the entire array is `ucs4`, with the width of
            the largest string in the column, otherwise...
    
          - If any of the elements in the array are complex numbers, the
            `datatype` of the entire column is `complex128`, otherwise...
    
          - If any of the types in the column are numbers with a decimal
            point, the `datatype` of the entire column is `float64`,
            otherwise..
    
          - If any of the types in the column are integers, the `datatype`
            of the entire column is `int64`, otherwise...
    
          - The `datatype` of the entire column is `bool8`.
    
          Masked values may be included in the array using `null`.  If an
          explicit mask array is also provided, it takes precedence.
    
        type: array
        items:
          anyOf:
            - type: number
            - type: string
            - type: "null"
            - $ref: "complex-1.0.0"
            - $ref: "#/definitions/inline-data"
            - type: boolean
    
    anyOf:
      - $ref: "#/definitions/inline-data"
      - type: object
        properties:
          source:
            description: |
              The source of the data.
    
              - If an integer: If positive, the zero-based index of the
                block within the same file. If negative, the index from
                the last block within the same file.  For example, a
                source of `-1` corresponds to the last block in the same
                file.
    
              - If a string, a URI to an external ASDF file containing the
                block data.  Relative URIs and ``file:`` and ``http:``
                protocols must be supported.  Other protocols may be supported
                by specific library implementations.
    
              The ability to reference block data in an external ASDF file
              is intentionally limited to the first block in the external
              ASDF file, and is intended only to support the needs of
              [exploded](ref:exploded).  For the more general case of
              referencing data in an external ASDF file, use tree
              [references](ref:references).
    
            anyOf:
              - type: integer
              - type: string
                format: uri
    
          data:
            description: |
              The data for the array inline.
    
              If `datatype` and/or `shape` are also provided, they must
              match the data here and can be used as a consistency check.
              `strides`, `offset` and `byteorder` are meaningless when
              `data` is provided.
            $ref: "#/definitions/inline-data"
    
          shape:
            description: |
              The shape of the array.
    
              The first entry may be the string `*`, indicating that the
              length of the first index of the array will be automatically
              determined from the size of the block.  This is used for
              streaming support.
            type: array
            items:
              anyOf:
                - type: integer
                  minimum: 0
                - enum: ['*']
    
          datatype:
            description: |
              The data format of the array elements.
            $ref: "#/definitions/datatype"
    
          byteorder:
            description: >
              The byte order (big- or little-endian) of the array data.
            type: string
            enum: [big, little]
    
          offset:
            description: >
              The offset, in bytes, within the data for this start of this
              view.
            type: integer
            minimum: 0
            default: 0
    
          strides:
            description: >
              The number of bytes to skip in each dimension.  If not provided,
              the array is assumed by be contiguous and in C order.  If
              provided, must be the same length as the shape property.
            type: array
            items:
              anyOf:
                - type: integer
                  minimum: 1
                - type: integer
                  maximum: -1
    
          mask:
            description: >
              Describes how missing values in the array are stored.  If a
              scalar number, that number is used to represent missing values.
              If an ndarray, the given array provides a mask, where non-zero
              values represent missing values in this array.  The mask array
              must be broadcastable to the dimensions of this array.
            anyOf:
              - type: number
              - $ref: "complex-1.0.0"
              - allOf:
                - $ref: "ndarray-1.0.0"
                - datatype: bool8
    
        dependencies:
          source: [shape, datatype, byteorder]
    
        propertyOrder: [source, data, mask, datatype, byteorder, shape, offset, strides]
    ...