ASDF uses JSON Schema to perform validation of ASDF files. Schema validation of ASDF files serves the following purposes:
Ensures conformity with core data types defined by the ASDF Standard. ASDF readers can detect whether an ASDF file has been modified in a way that would render it unreadable or unrecognizable.
Enables interoperability between ASDF implementations. Implementations that recognize the same schema definitions should be able to interpret files containing instances of data types that conform to those schemas.
Allows for the definition of custom data types. External software packages can provide ASDF schemas that correspond to types provided by that package, and then serialize instances of those types in a way that is standardized and portable.
All ASDF implementations must implement the types defined by the core schemas and validate against them when reading files. 1 The ASDF Standard also defines two other categories of schemas, which are optional for ASDF implementations:
The ASDF Standard also defines two metaschemas which are used to validate the ASDF schemas themselves:
More information on the schemas defined by ASDF can be found in ASDF Standard Schema Definitions.
ASDF schemas are encoded in YAML and conform to a superset of JSON Schema called YAML Schema. The version of YAML supported by ASDF is 1.1. Accordingly, all schemas begin with the following YAML header:
%YAML 1.1 ---
The following top-level attributes are required for all ASDF schemas: 2
$schema: Indicates the metaschema definition used to validate this schema
id: A name that uniquely identifies the schema
tag: The YAML tag corresponding to the type described by this schema
Each of these attributes is described in more detail below.
ASDF schemas use the top-level
$schema attribute to declare the metaschema
that is used to validate the schema itself. Most custom ASDF schemas will
conform to YAML Schema defined by the ASDF Standard, and
so will have the following top-level attribute:
Some ASDF schemas use the ASDF metaschema instead (e.g. ndarray). It is also possible to create custom metaschemas, although these should always inherit from either YAML Schema or the ASDF metaschema. 3
Some ASDF implementations may choose to validate the schemas themselves (e.g.
as part of a regression testing suite). The
$schema keyword should be used
to determine the metaschema to be used for validation. All schemas should also
validate successfully against YAML Schema.
id represents the globally unique name of the schema. It must be a
valid URI and cannot be an empty
string or an empty fragment (e.g.
#). See Naming Conventions for
conventions to ensure global uniqueness.
id must be a valid URI, it does not have to describe a real
location on disk or on a network. For example, the
id values for all
schemas in the ASDF Standard begin with the prefix
http://stsci.edu/schemas/asdf/. However, as of this writing, none of the
schemas are actually hosted at that location.
id keyword is used for reference resolution both within a schema and
between schemas. Relative references within a schema are resolved against the
id of that schema. A reference to an external schema uses the
that schema. See References below for additional information.
Each ASDF implementation must define how to resolve a schema
id to a real
resource that contains the schema itself. This could be done in a variety of
ways, but the following seem like the most likely possibilities:
idto a real network location (assuming the schema is actually hosted at that location)
idto a file location on disk that contains the schema
Other mappings are possible in theory. For example, a schema could be stored in a string literal as part of a program.
tag attribute is used by ASDF to associate an instance of a data type
in an ASDF file with the appropriate schema to be used for validation. It is a
concept from YAML (see the documentation).
Libraries that provide custom schemas must ensure that the YAML tag that is
written for a particular data type must match the
tag attribute in the
schema that corresponds to the data type. Tags must conform to the tag URI
scheme which is defined in RFC 4151, but are otherwise perfectly arbitrary.
However, certain Naming Conventions are recommended in order to facilitate a
ASDF implementations must be able to map
tag attributes to the
id. The way that this mapping is defined is up to
individual implementations. However, if the Naming Conventions are followed,
most implementations will be able to perform prefix matching and replacement.
id attribute will almost certainly become required in a future
version of the ASDF Standard, the
tag attribute may remain optional. This
is because schemas can be referenced by
id without necessarily referring to
a particular tagged type in the YAML representation.
Each schema may optionally contain descriptive fields:
examples. These fields may contain core markdown
syntax (which will be used for the purposes of rendering schema documentation
by, for example, sphinx-asdf).
title: A one-line summary of the data type described by the schema
description: A lengthier prose description of the schema
examples: A list of example content that conforms to the schema, illustrating how to use it.
A particular ASDF schemas can contain references to other ASDF schemas.
References are encoded by using the
$ref attribute anywhere in the tree.
While JSON Schema references are purely based on
implementations must be able to resolve references using both
The resolution of
tag references to actual schema files is up to
individual implementations. It is recommended for ASDF implementations to
use a two-phase mapping: one from
id, and another from
an actual schema resource. In most cases, the
id will be resolved to a
location on disk (e.g. to a schema file that is installed in a known location).
However, other scenarios might involve schemas that are hosted on a network, or
schemas that are embedded in source files as string literals.
id attributes must be valid URIs. Schema
tag attributes must be
valid URIs that conform to the tag URI scheme defined in RFC 4151 Aside from
these requirements, assignment of these attributes is perfectly arbitrary.
However, certain conventions are strongly recommended in order to ensure
uniqueness and to enable a simple correspondence between the
attributes. These conventions are described below.
All schema ids should encode the following information:
organization: Indicates the organization that created the schema
standard: The “standard” this schema belongs to. This will usually correspond to the name of the software package that provides this schema.
name: The name of the data type corresponding to this schema.
version: The version of the schema. See Versioning Conventions for more details.
Consider the schemas from the ASDF Standard as an example. In this case, the
stsci.edu, which is the web address of the organization
that created the schemas. The standard is
asdf. Each individual schema
in the ASDF Standard has a different name field. In the case of the
ndarray data type, for example, the name is
core/ndarray. The version of ndarray is
Some other types in the ASDF Standard have multiple versions, such as
quantity-1.0.0 and quantity-1.1.0.
While schema ids can be any valid URI, under this convention they always begin
http://. The general format of the id attribute becomes:
Continuing with the example of ndarray, we have:
The idea behind the convention for
id is that it should be possible (in
principle if not in practice) for schemas to be hosted at the corresponding
URL. This motivates the choice of the organization’s web address as the
organization component. However, this is not a requirement. The primary
objective is to create a globally unique id.
Given the components defined above, the
tag definition follows in a
straightforward manner. The generic tag URI template is:
Considering ndarray once again, we have:
Following the naming convention for both
tag attributes enables
a simple mapping from
id. In this case, simply take the prefix
tag:stsci.edu: and replace it with
Designing a new tag and schema
This section will walk through the development of a new tag and schema. In the
example, suppose we work at the Example Organization, which can be
found on the world wide web at
example.org. We’re developing a new
foo, and we need a way to define the specialized metadata to
describe the exposures that it will be generating.
According to the Naming Conventions, our
id attributes will
consist of the following components:
1.0.0(by convention the starting version for all new schemas)
So, for our example instrument metadata, the tag is:
Each tag should be associated with a schema in order to validate it. Each
schema must also have a universally unique
id, which is in the form of
Note that this URI doesn’t actually have to resolve to anything. In fact,
visiting that URL in your web browser is likely to bring up a
All that’s necessary is that it is universally unique and that the tool reading
the ASDF file is able to map from a tag name to a schema URI, and then load the
Again following with our example, we will assign the following URI to refer to our schema:
Therefore, in our schema file, we have the following keys, one declaring the
name of the YAML
tag, and one defining the
id of the schema:
id: "http://example.org/schemas/foo/metadata-1.0.0" tag: "tag:example.org:foo/metadata-1.0.0"
Since our schema is just a basic ASDF schema, we will declare that it conforms to YAML Schema defined by the ASDF Standard:
Continuing our example, we include some descriptive metadata about the data type declared by the schema itself:
title: | Metadata for the foo instrument. description: | This stores some information about an exposure from the foo instrument. examples: - - A minimal description of an exposure. - | tag:example.org:foo/metadata-1.0.0 exposure_time: 0.001
The schema proper
The rest of the schema describes the acceptable data types and their structure. The format used for this description comes straight out of JSON Schema, and rather than documenting all of the things it can do here, please refer to Understanding JSON Schema, and the further resources available at json-schema.org.
In our example, we’ll define two metadata elements: the name of the investigator, and the exposure time, each of which also have a description:
type: object properties: investigator: type: string description: | The name of the principal investigator who requested the exposure. exposure_time: type: number description: | The time of the exposure, in nanoseconds.
We’ll also define an optional element for the exposure time unit. This is a somewhat contrived example to demonstrate how to include elements in your schema that are based on the custom types defined in the ASDF standard:
exposure_time_units: $ref: "http://stsci.edu/schemas/asdf/unit/unit-1.0.0" description: | The unit of the exposure time. default: s
Lastly, we’ll declare
exposure_time as being required, and allow
extra elements to be added:
required: [exposure_time] additionalProperties: true
The complete example
Here is our complete schema example:
%YAML 1.1 --- $schema: "http://stsci.edu/schemas/yaml-schema/draft-01" id: "http://example.org/schemas/foo/metadata-1.0.0" tag: "tag:example.org:foo/metadata-1.0.0" title: | Metadata for the foo instrument. description: | This stores some information about an exposure from the foo instrument. examples: - - A minimal description of an exposure. - | tag:example.org:foo/metadata-1.0.0 exposure_time: 0.001 type: object properties: investigator: type: string description: | The name of the principal investigator who requested the exposure. exposure_time: type: number description: | The time of the exposure, in nanoseconds. exposure_time_units: $ref: "http://stsci.edu/schemas/asdf/unit/unit-1.0.0" description: | The unit of the exposure time. default: s required: [exposure_time] additionalProperties: true
Extending an existing schema
JSON Schema does not support the concept of inheritance, which makes it somewhat awkward to express type hierarchies. However, it is possible to create a custom schema that adds attributes to an existing schema (e.g. one defined in the ASDF Standard). It is important to remember that it is not possible to override or remove any of the attributes from the existing schema.
The following important caveats apply when extending an existing schema:
It is not possible to redefine, override, or delete any attributes in the original schema.
It will not be possible to add attributes to any node where the original schema declares
Instances of the custom type will not be recognized as an instance of the original type when resolving schema references or processing YAML tags (i.e. there is no concept of polymorphism).
Here’s an example of extending a schema using the software schema defined by the ASDF Standard. Here’s the original schema, for reference:
%YAML 1.1 --- $schema: "http://stsci.edu/schemas/yaml-schema/draft-01" id: "http://stsci.edu/schemas/asdf/core/software-1.0.0" title: | Describes a software package. description: | General-purpose description of a software package. tag: "tag:stsci.edu:asdf/core/software-1.0.0" type: object properties: name: description: | The name of the application or library. type: string author: description: | The author (or institution) that produced the software package. type: string homepage: description: | A URI to the homepage of the software. type: string format: uri version: description: | The version of the software used. It is recommended, but not required, that this follows the (Semantic Versioning Specification)[http://semver.org/spec/v2.0.0.html]. type: string required: [name, version] additionalProperties: true ...
Since the software schema permits additional properties, we are free to extend it to include an email address for contacting the author:
%YAML 1.1 --- $schema: "http://stsci.edu/schemas/yaml-schema/draft-01" id: "http://somewhere.org/schemas/software_extended-1.0.0" title: | Describes a software package. description: | Extension of ASDF core software schema to include the software author's contact email. allOf: - $ref: http://stsci.edu/schemas/asdf/core/software-1.0.0 - properties: author_email: description: | The contact email of the software author. type: string required: [author_email] ...
The crucial portion of this schema definition is the way that the
operator is used to join a reference to the base software schema with the
definition of a new property called
allOf combiner means that any instance that is validated against
software_extended-1.0.0 will have to conform to both the base software schema
and the properties specific to the extended schema.
The JSON Schema spec includes a schema annotation attribute called
can be used to describe the default value of a data attribute when that attribute
is missing. Recent versions of the spec point out
that there is no single correct way to choose an annotation value when multiple
are available due to references and combiners. This presents a problem when
trying to fill in missing data in a file based on the schema
multiple conflicting values are available, the software does not know how to choose.
Previous versions of the ASDF Standard did not offer guidance on how
default. The Python reference implementation read the first default
that it encountered as a literal value and inserted that value into the tree when
the corresponding attribute was otherwise missing. Until version 2.8, it also
removed attributes on write whose values matched their schema defaults. The
resulting files would appear to the casual viewer to be missing data, and may in
fact be invalid against their schemas if the any of the removed attributes were required.
Implementations must not remove attributes with default values from the tree.
Beginning with ASDF Standard 1.6.0, implementations also must not fill default values
directly from the schema. This will avoid ambiguity when multiple schema defaults
are present, and also permit the
default attribute to contain a description
that is not appropriate to use as a literal default value. For example:
default: An array of zeros matching the dimensions of the data array.
For ASDF Standard < 1.6.0, filling default values from the schema is required. This is necessary to support files written by older versions of the Python implementation.
Implementations may expose the control of validation on reading to the user (e.g. to disable it on demand). However, validation on reading should be the default behavior.
The presence of
tagis not currently enforced by the YAML Schema but may be in a future version of the ASDF Standard. Authors of new schemas should assume that at the very least
idwill be required in a future version of the Standard.
For an example of how to inherit from another metaschema, look at the contents of the ASDF metaschema and see how there is a reference to the YAML schema in the top-level