DataUnit Reference

DataUnit

Warning

The Python representation of DataUnits is likely to change in the new preflight design. Specifically, instances of DataUnit are likely to represent Registry database tables as opposed to rows in such tables.

Each row in DataUnit tables will be represented as a dict in Python.

The parts of this section that are not specific to the Python representation should still hold true.

A DataUnit is a discrete abstract unit of data that can be associated with metadata and used to label Datasets.

Visit, tract, and filter are examples of types of DataUnits; individual visits, tracts, and filters would thus be DataUnit instances.

Each DataUnit type has both a concrete Python class (inheriting from the abstract DataUnit base class) and a SQL table in the common Registry schema.

A DataUnit type may depend on another. In SQL, this is expressed as a foreign key field in the table for the dependent DataUnit that points to the primary key field of the table for the DataUnit it depends on.
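As a rough illustration of the foreign key relationship described above, the following sketch builds simplified versions of the Camera and PhysicalFilter tables in an in-memory SQLite database. The column subset, the use of SQLite, and the filter name HSC-R are assumptions made for the example; they are not part of the common Registry schema definition.

```python
import sqlite3

# Simplified, hypothetical versions of two tables from the common schema.
SCHEMA = """
CREATE TABLE Camera (
    camera_name VARCHAR NOT NULL,
    PRIMARY KEY (camera_name)
);
CREATE TABLE PhysicalFilter (
    physical_filter_name VARCHAR NOT NULL,
    camera_name VARCHAR NOT NULL,
    PRIMARY KEY (camera_name, physical_filter_name),
    FOREIGN KEY (camera_name) REFERENCES Camera (camera_name)
);
"""

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO Camera VALUES ('HSC')")
conn.execute("INSERT INTO PhysicalFilter VALUES ('HSC-R', 'HSC')")

# A dependent row that points at a Camera with no row of its own is rejected.
try:
    conn.execute("INSERT INTO PhysicalFilter VALUES ('r', 'NoSuchCamera')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The dependent table carries the key of the table it depends on as part of its own compound primary key, which is the pattern the concrete DataUnit tables later in this section follow.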

Some DataUnits represent joins between other DataUnits. A join DataUnit depends on the two DataUnits it connects, but is also included automatically in any sequence, container, or machine-generated SQL query in which its dependencies are both present.

Every DataUnit type that is not a join has a “value”. This is a POD (usually a string or integer, but sometimes a tuple of these) that is both its default human-readable representation and a “semi-unique” identifier for the DataUnit: when combined with the “values” of the DataUnits it depends on, the full set of DataUnits is uniquely identified.

DataUnit tables in SQL typically have compound primary keys that include the primary keys of the DataUnits they depend on. These primary keys are also meaningful in Python; they can be accessed as tuples via the DataUnit.pkey attribute and are frequently used in dictionaries containing DataUnits.
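The sketch below shows how such tuple primary keys might be assembled and used as dictionary keys. The namedtuple classes and the helper functions are hypothetical stand-ins for concrete DataUnit classes and their pkey attributes, and the SkyMap name and Tract numbers are made up.

```python
from collections import namedtuple

# Hypothetical stand-ins for concrete DataUnit classes.
SkyMap = namedtuple("SkyMap", ["name"])
Tract = namedtuple("Tract", ["skymap", "number"])


def skymap_pkey(skymap):
    # Primary keys are always tuples, even when one value is enough.
    return (skymap.name,)


def tract_pkey(tract):
    # The Tract's own value first, then the key of the SkyMap it depends on.
    return (tract.number,) + skymap_pkey(tract.skymap)


rings = SkyMap(name="toy_rings")
tracts = [Tract(skymap=rings, number=n) for n in (8, 9)]

# Tuples are hashable, so pkeys can be used directly as dictionary keys.
by_pkey = {tract_pkey(t): t for t in tracts}
found = by_pkey[(9, "toy_rings")]
```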

Warning

DataUnitTypeSet and DataUnitMap are likely to go away in the new preflight design.

The DataUnitTypeSet class provides methods that enforce and utilize these rules, providing a centralized implementation to which all other objects that operate on groups of DataUnits can delegate.

The DataUnitMap class provides Python access to the more complex relationships between DataUnits, including many-to-many joins.

Transition

The string keys of data ID dictionaries passed to the v14 Butler are similar to DataUnit type names, and the values of data ID dictionaries are similar to DataUnit values.

A dictionary that maps DataUnit type names to DataUnit values is thus very similar to a v14 data ID dictionary, but most layers of the new design instead use a tuple of DataUnits, a strongly-typed analog that provides a bit more functionality and access to structured metadata.

Python API

class DataUnit

An abstract base class whose subclasses represent concrete DataUnits.

pkey

Read-only pure-virtual instance attribute (must be implemented by subclasses).

A tuple of POD values that uniquely identify the DataUnit, corresponding to the values in the SQL primary key.

Primary keys in Python are always tuples, even when only a single value is needed to identify the DataUnit type.

value

Read-only pure-virtual instance attribute (must be implemented by subclasses).

An integer or string that identifies the DataUnit when combined with any “foreign key” connections to other DataUnits. For example, a Visit’s number is its value, because it uniquely labels a Visit as long as its Camera (its only foreign key DataUnit) is also specified.

class DataUnitTypeSet

Warning

DataUnitTypeSet is likely to go away in the new preflight design.

An ordered tuple of unique DataUnit subclasses.

Unlike a regular Python tuple or set, a DataUnitTypeSet’s elements are sorted (the actual sort order is TBD, but it is deterministic). In addition, the inclusion of certain DataUnit types can automatically lead to the inclusion of others. This can happen because one DataUnit depends on another (most depend on either Camera or SkyMap, for instance), or because a DataUnit (such as ObservedSensor) represents a join between others (such as Visit and PhysicalSensor). For example, if any of the following combinations of DataUnit types are used to initialize a DataUnitTypeSet, its elements will be [Camera, ObservedSensor, PhysicalSensor, Visit]:

  • [Visit, PhysicalSensor]
  • [ObservedSensor]
  • [Visit, ObservedSensor, Camera]
  • [Visit, PhysicalSensor, ObservedSensor]

__init__(elements)

Initialize the DataUnitTypeSet with a reordered and augmented version of the given DataUnit types as described above.
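A minimal sketch of the reordering and augmentation, assuming type names are plain strings, that the sort order is alphabetical (the real order is TBD), and that the dependency and join tables below are simplified to the four types in the example above:

```python
# Hypothetical, simplified dependency table keyed by type name.
DEPENDENCIES = {
    "Camera": (),
    "PhysicalSensor": ("Camera",),
    "Visit": ("Camera",),
    "ObservedSensor": ("Camera", "Visit", "PhysicalSensor"),
}
# A join type is added automatically once both types it connects are present.
JOINS = {"ObservedSensor": ("Visit", "PhysicalSensor")}


def augment(names):
    """Return a sorted list of type names closed under dependencies and joins."""
    result = set()
    pending = list(names)
    while pending:
        name = pending.pop()
        if name in result:
            continue
        result.add(name)
        # Pull in everything this type depends on.
        pending.extend(DEPENDENCIES[name])
        # Pull in any join whose connected types are now all present.
        for join, connected in JOINS.items():
            if join not in result and all(c in result for c in connected):
                pending.append(join)
    return sorted(result)


expected = ["Camera", "ObservedSensor", "PhysicalSensor", "Visit"]
```

Under these assumptions, each of the four input combinations listed above produces the same expected list.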

__iter__()

Iterate over the DataUnit types in the set.

__len__()

Return the number of DataUnit types in the set.

__eq__(other)

Compare two DataUnitTypeSets for equality.

Also supports comparisons with other sequences by converting them to DataUnitTypeSets.

__ne__(other)

Compare two DataUnitTypeSets for inequality.

Also supports comparisons with other sequences by converting them to DataUnitTypeSets.

__contains__(k)

Return True if the DataUnitTypeSet contains either the given DataUnit type or DataUnit type name.

__getitem__(name)

Return the DataUnit type with the given name.

pack(values)

Compute a bytes string that uniquely identifies the given combination of DataUnit values.

Parameters:values (dict) – A dictionary that maps DataUnit type names to either the “values” of those units or actual DataUnit instances.
Returns:a bytes object that labels the given combination of units.

This method must be used to populate the unit_pack field in the Dataset table.

Note

We are currently using a sha512 hash (generated by the invariantHash method) instead of bit-packing, but will likely switch to bit-packing in the final design.
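The hash-based approach could look roughly like the following sketch. It is not the invariantHash implementation; the function name, its arguments, and the way names and values are encoded before hashing are all assumptions made for illustration.

```python
import hashlib


def pack_values(type_names, values):
    """Hash DataUnit values into a bytes label (hypothetical helper)."""
    hasher = hashlib.sha512()
    # Visit the types in sorted order so that the result does not depend on
    # the order in which the caller listed them.
    for name in sorted(type_names):
        # Include the type name and separators in the hashed input.
        hasher.update("{}={!r};".format(name, values[name]).encode("utf-8"))
    return hasher.digest()


label = pack_values(["Visit", "Camera"], {"Camera": "HSC", "Visit": 42})
same = pack_values(["Camera", "Visit"], {"Visit": 42, "Camera": "HSC"})
other = pack_values(["Camera", "Visit"], {"Visit": 43, "Camera": "HSC"})
```

A sha512 digest is always 64 bytes long, so a label produced this way has a fixed size regardless of how many DataUnit types are involved.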

expand(findfunc, values)

Construct a dictionary of DataUnit instances from a dictionary of DataUnit “values”.

Parameters:findfunc – a callable with the same signature and behavior as Registry.findDataUnit() or DataUnitMap.findDataUnit().

This can (and generally should) be used by concrete Registries to implement Registry.expand().
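One way the expansion could work is sketched below. It assumes a simplified find function that takes a type name rather than a class, a hypothetical dependency table listed so that each type follows the types it depends on, and a values dictionary that includes every dependency of the types it names.

```python
from collections import namedtuple

# Hypothetical unit record: a type name plus a primary key tuple.
Unit = namedtuple("Unit", ["type_name", "pkey"])

# Hypothetical dependency table; each type is listed after its dependencies.
DEPENDENCIES = {"SkyMap": (), "Tract": ("SkyMap",), "Patch": ("Tract",)}


def expand(findfunc, values):
    """Turn {type name: value} into {type name: unit} via findfunc(type_name, pkey)."""
    units = {}
    for type_name, deps in DEPENDENCIES.items():
        if type_name not in values:
            continue
        # Own value first, followed by the primary keys of the dependencies.
        pkey = (values[type_name],)
        for dep in deps:
            pkey += units[dep].pkey
        units[type_name] = findfunc(type_name, pkey)
    return units


# A toy lookup that only knows about three units.
known = {
    ("SkyMap", ("toy_rings",)),
    ("Tract", (8, "toy_rings")),
    ("Patch", (3, 8, "toy_rings")),
}


def find(type_name, pkey):
    return Unit(type_name, pkey) if (type_name, pkey) in known else None


units = expand(find, {"SkyMap": "toy_rings", "Tract": 8, "Patch": 3})
```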

class DataUnitMap

Warning

DataUnitMap is likely to go away in the new preflight design.

An object that holds a collection of related DataUnits.

types

A DataUnitTypeSet containing exactly the DataUnit types present in the map.

extract(types)

Iterate over tuples of DataUnit instances.

Parameters:types (DataUnitTypeSet) – the DataUnit types to iterate over. Must be a subset of self.types.
Returns:a sequence of tuples of DataUnits whose types correspond to the types argument (in the same order).

group(types)

Group the DataUnitMap according to a subset of its DataUnit types.

Parameters:types (DataUnitTypeSet) – the DataUnit types to group by. Must be a subset of self.types.
Returns:a sequence of tuples of (units, submap), where units is a tuple of DataUnits whose types correspond to the types argument (in the same order), and submap is a DataUnitMap containing only the DataUnits and DatasetRefs related to the ones in units. The types in submap are the same as those in self.

For example, the following code performs a nested iteration over the Tracts and Patches in a DataUnitMap:

assert map.types == (SkyMap, Tract, Patch)

for (skymap, tract), submap in map.group((SkyMap, Tract)):
    assert submap.types == (SkyMap, Tract, Patch)
    for patch in submap.extract(Patch):
        ...

findDataUnit(cls, pkey)

Return a DataUnit given the values of its primary key.

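Parameters:
  • cls – the DataUnit type (a subclass of DataUnit) of the unit to find.
  • pkey (tuple) – the primary key values that identify the unit (see DataUnit.pkey).
Returns:a DataUnit instance of type cls, or None if no matching unit is found.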

See also Registry.findDataUnit().

SQL Representation

There is one table for each DataUnit type, and a DataUnit instance is a row in one of those tables. Because DataUnit itself is abstract, there is no single table associated with DataUnits in general.

AbstractFilter

AbstractFilters are used to label Datasets that aggregate data from multiple Visits (and possibly multiple Cameras).

Having two different DataUnits for filters is necessary to make it possible to combine data from Visits taken with different PhysicalFilters.

Value:
abstract_filter_name
Dependencies:
None
Primary Key:
abstract_filter_name
Many-to-Many Joins:
None

Python API

class AbstractFilter
name

The name of the filter.

SQL Representation

AbstractFilter

AbstractFilter
abstract_filter_name varchar NOT NULL

Camera

Camera DataUnits are essentially just sources of raw data with a constant layout of PhysicalSensors and a self-consistent numbering system for Visits.

Different versions of the same camera (due to e.g. changes in hardware) should still correspond to a single Camera DataUnit. There are thus multiple afw.cameraGeom.Camera objects associated with a single Camera DataUnit; the most natural approach to relating them would be to store the afw.cameraGeom.Camera as a VisitRange Dataset.

The Collimated Beam Projector (CBP) state and the CBP spectrograph will be represented by distinct Camera DataUnits, allowing changes in state and spectrograph observations to be represented as Visits with those Cameras. The VisitSelfJoin table associates these Visits with the main-camera Visits that represent observations of the CBP.

Like SkyMap but unlike every other DataUnit, Cameras are represented by a polymorphic class hierarchy in Python rather than a single concrete class.

Value:
camera_name
Dependencies:
None
Primary Key:
camera_name
Many-to-Many Joins:
None

Transition

Camera subclasses take over many of the roles played by obs_ package Mapper subclasses in the v14 Butler (with StorageHint creation an important and intentional exception).

Python API

class Camera

An abstract base class whose subclasses are generally singletons.

instances

Concrete class attribute: provided by the base class.

A dictionary holding all Camera instances, keyed by their name attributes. Subclasses are responsible for adding an instance to this dictionary at module-import time.

name

Virtual instance attribute: must be implemented by subclasses.

A string name for the Camera that can be used as its primary key in SQL.

makePhysicalSensors()

Return the full list of PhysicalSensor instances associated with the Camera.

This virtual method will be called by a Registry when it adds a new Camera to populate its PhysicalSensors table.

makePhysicalFilters()

Return the full list of PhysicalFilter instances associated with the Camera.

This virtual method will be called by a Registry when it adds a new Camera to populate its PhysicalFilters table.
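The registration pattern described for the instances dictionary might look like the sketch below. ToyCamera, its name, and the use of (number, name) pairs in place of PhysicalSensor instances are hypothetical and only illustrate the pattern.

```python
class Camera:
    """Abstract base class; concrete subclasses are generally singletons."""

    # Shared by all subclasses: maps each Camera's name to its instance.
    instances = {}

    @property
    def name(self):
        raise NotImplementedError("must be implemented by subclasses")

    def makePhysicalSensors(self):
        raise NotImplementedError("must be implemented by subclasses")


class ToyCamera(Camera):
    """Hypothetical camera with four numbered sensors."""

    name = "toy"

    def makePhysicalSensors(self):
        # Plain (number, name) pairs stand in for PhysicalSensor instances;
        # each name here is just the stringified number.
        return [(n, str(n)) for n in range(4)]


# Registration runs when the module is imported, so importing the module
# is enough to make Camera.instances["toy"] available.
Camera.instances[ToyCamera.name] = ToyCamera()
```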

SQL Representation

Camera

Camera
camera_name varchar NOT NULL
module varchar NOT NULL

module is a string containing a fully-qualified Python module that can be imported to ensure that Camera.instances[name] returns a Camera instance.

PhysicalFilter

PhysicalFilters represent the bandpass filters that can be associated with a Visit.

A PhysicalFilter may or may not be associated with a particular AbstractFilter.

Value:
physical_filter_name
Dependencies:
  • (camera_name) -> Camera (camera_name)
  • (abstract_filter_name) -> AbstractFilter (abstract_filter_name) [optional]
Primary Key:
camera_name, physical_filter_name
Many-to-Many Joins:
None

Python API

class PhysicalFilter
camera

The Camera instance associated with the filter.

name

The name of the filter. Only guaranteed to be unique across PhysicalFilters associated with the same Camera.

abstract

The associated AbstractFilter, or None.

SQL Representation

PhysicalFilter

PhysicalFilter
physical_filter_name varchar NOT NULL
camera_name varchar NOT NULL
abstract_filter_name varchar  

PhysicalSensor

PhysicalSensors represent a sensor in a Camera, independent of any observations.

Because some cameras identify sensors with string names and others use numbers, we provide fields for both; the name may be a stringified integer, and the number may be an autoincrement field. Only the number is used as part of the primary key.

The group field may mean different things for different Cameras (such as rafts for LSST, or groups of sensors oriented the same way relative to the focal plane for HSC).

The purpose field indicates the role of the sensor (such as science, wavefront, or guiding). Valid choices should be standardized across Cameras, but are currently TBD.

Value:
physical_sensor_number
Dependencies:
  • (camera_name) -> Camera (camera_name)
Primary Key:
(physical_sensor_number, camera_name)
Many-to-Many Joins:

Python API

class PhysicalSensor
camera

The Camera instance associated with the sensor.

number

A number that identifies the sensor. Only guaranteed to be unique across PhysicalSensors associated with the same Camera.

name

The name of the sensor. Only guaranteed to be unique across PhysicalSensors associated with the same Camera.

group

A Camera-specific group the sensor belongs to.

purpose

A Camera-generic role for the sensor.

SQL Representation

PhysicalSensor

PhysicalSensor  
physical_sensor_number int NOT NULL
name varchar  
camera_name varchar NOT NULL
group varchar  
purpose varchar  

Visit

Visits correspond to observations with the full camera at a particular pointing, possibly composed of multiple exposures (Snaps).

Some DatasetTypes representing raw exposures may use Visits with no Snaps, while others may use Snaps. It may be useful to define a raw DatasetType with Snap even when only one Snap will exist if the data is to be processed with a pipeline that can operate on multi-Snap inputs.

Visits can represent observations taken for calibration purposes, such as flat field images.

A Visit’s region field holds an approximate but inclusive representation of its position on the sky that can be compared to the regions of other DataUnits.

Value:
visit_number
Dependencies:
  • (camera_name) -> Camera (camera_name)
  • (physical_filter_name) -> PhysicalFilter (physical_filter_name)
Primary Key:
(visit_number, camera_name)
Many-to-Many Joins:

Todo

Visit will need to have many more fields to hold metadata (in general, we want to include anything we might want to query on when selecting Datasets). We should consider adding everything in afw.image.VisitInfo. That may be true of some other concrete DataUnits as well.

It will probably be necessary to add per-Camera Visit tables for Camera-specific metadata as well. This is a rather significant change to the common schema, but it’s not actually problematic.

Python API

class Visit
camera

The Camera instance associated with the Visit.

number

A number that identifies the Visit. Only guaranteed to be unique across Visits associated with the same Camera.

filter

The PhysicalFilter the Visit was observed with.

obsBegin

The date and time of the beginning of the Visit.

exposureTime

The total exposure time of the Visit (in seconds).

region

An object (type TBD) that describes the spatial extent of the Visit on the sky.

sensors

A sequence of ObservedSensor instances associated with this Visit.

SQL Representation

Visit

Visit  
visit_number int NOT NULL
camera_name varchar NOT NULL
physical_filter_name varchar NOT NULL
obs_begin datetime  
exposure_time float  
region blob  

ObservedSensor

An ObservedSensor is a join between a Visit and a PhysicalSensor.

Unlike most other DataUnit join tables (which are not typically DataUnits themselves), this one is both ubiquitous and contains additional information: a region that represents the position of the observed sensor image on the sky. We may also add additional observational metadata in the future.

Value:
None
Dependencies:
  • (camera_name) -> Camera (camera_name)
  • (visit_number, camera_name) -> Visit (visit_number, camera_name)
  • (physical_sensor_number, camera_name) -> PhysicalSensor (physical_sensor_number, camera_name)
Primary Key:
(visit_number, physical_sensor_number, camera_name)
Many-to-Many Joins:

Python API

class ObservedSensor
camera

The Camera instance associated with the ObservedSensor.

visit

The Visit instance associated with the ObservedSensor.

physical

The PhysicalSensor instance associated with the ObservedSensor.

region

An object (type TBD) that describes the spatial extent of the ObservedSensor on the sky.

SQL Representation

ObservedSensor

ObservedSensor
visit_number int NOT NULL
physical_sensor_number int NOT NULL
camera_name varchar NOT NULL
region blob  

Snap

A Snap is a single-exposure subset of a Visit.

Most non-LSST Visits will have only a single Snap.

Value:
snap_index
Dependencies:
  • (camera_name) -> Camera (camera_name)
  • (visit_number, camera_name) -> Visit (visit_number, camera_name)
Primary Key:
(snap_index, visit_number, camera_name)
Many-to-Many Joins:
None

Python API

class Snap
camera

The Camera instance associated with the Snap.

visit

The Visit instance the Snap is a part of.

index

The index of the Snap within its Visit.

obsBegin

The date and time of the beginning of the Snap.

exposureTime

The exposure time of the Snap.

SQL Representation

Snap

Snap
visit_number int NOT NULL
snap_index int NOT NULL
camera_name varchar NOT NULL
obs_begin datetime NOT NULL
obs_end datetime NOT NULL

VisitRange

VisitRanges are DataUnits that label master calibration products, and are defined as a range of Visits from a given Camera.

The VisitRange associated with not-yet-observed Visits may be indicated by setting visit_end to -1 (we can’t use NULL for visit_end because it is part of the compound primary key). This is mapped to None in Python.
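The mapping between the SQL sentinel and Python could be handled by small helpers like the ones sketched here. The function names are hypothetical, and the membership check assumes that both ends of a closed range are inclusive.

```python
OPEN_RANGE_SENTINEL = -1  # value stored in visit_end for an open-ended range


def visit_end_from_sql(visit_end):
    """Map the SQL sentinel to None; pass other values through."""
    return None if visit_end == OPEN_RANGE_SENTINEL else visit_end


def visit_end_to_sql(visit_end):
    """Map None back to the SQL sentinel; pass other values through."""
    return OPEN_RANGE_SENTINEL if visit_end is None else visit_end


def contains(visit_begin, visit_end, visit_number):
    """Check whether a Visit number falls in a range given Python-side values."""
    if visit_number < visit_begin:
        return False
    # None means the range extends to Visits that have not been observed yet.
    return visit_end is None or visit_number <= visit_end
```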

Value:
visit_begin, visit_end
Dependencies:
  • (camera_name) -> Camera (camera_name)
  • (visit_begin) -> Visit (visit_number)
  • (visit_end) -> Visit (visit_number)
Primary Key:
(visit_begin, visit_end, camera_name)
Many-to-Many Joins:

Python API

class VisitRange
camera

The Camera instance associated with the VisitRange.

visitBegin

The number of the first Visit instance associated with the VisitRange.

visitEnd

The number of the last Visit instance associated with the VisitRange, or None for an open range (stored as -1 in SQL).

SQL Representation

VisitRange

VisitRange
visit_begin int NOT NULL
visit_end int NOT NULL
camera_name varchar NOT NULL

SkyMap

Each SkyMap entry represents a different way to subdivide the sky into tracts and patches, including any parameters involved in those definitions.

SkyMaps in Python are part of a polymorphic hierarchy, but unlike Cameras, their instances are not singletons, so we can’t just store them in a global dictionary in the software stack. Instead, we serialize SkyMap instances directly into the Registry as blobs.

Value:
skymap_name
Dependencies:
None
Primary Key:
skymap_name
Many-to-Many Joins:
None

Transition

Ultimately this SkyMap hierarchy should entirely replace those in the v14 lsst.skymap package, and we’ll store the SkyMap information directly in the Registry database rather than a separate pickle file. There’s no need for two parallel class hierarchies to represent the same concepts.

Python API

class SkyMap
name

A unique, human-readable name for the SkyMap that can be used as its primary key in SQL.

makeTracts()

Return the full list of Tract instances associated with the SkyMap.

This virtual method will be called by a Registry when it adds a new SkyMap to populate its Tract and Patch tables.

serialize()

Write the SkyMap to a blob.

classmethod deserialize(name, blob)

Reconstruct a SkyMap instance from a blob.
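A minimal sketch of the serialize()/deserialize() round trip follows, assuming a hypothetical SkyMap that is fully described by a dictionary of parameters and that uses JSON as the blob format. The actual blob format is not specified here, and the parameter names are made up.

```python
import json


class ToySkyMap:
    """Hypothetical SkyMap defined by a name and a dictionary of parameters."""

    def __init__(self, name, params):
        self.name = name
        self.params = params

    def serialize(self):
        # sort_keys makes the blob identical for equal parameter dictionaries.
        return json.dumps(self.params, sort_keys=True).encode("utf-8")

    @classmethod
    def deserialize(cls, name, blob):
        # The name is stored in its own column, so it is passed in separately.
        return cls(name, json.loads(blob.decode("utf-8")))


original = ToySkyMap("toy_rings", {"num_rings": 120, "patch_overlap": 100})
blob = original.serialize()
restored = ToySkyMap.deserialize("toy_rings", blob)
```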

Todo

  • Add other methods from lsst.skymap.BaseSkyMap, including iteration over Tracts. That may suggest removing makeTracts() if it becomes redundant, or adding arguments to deserialize() to provide Tracts and Patches from their tables instead of the blob.
  • What is the connection between serialize(), deserialize() and __reduce__? Can we just use pickle?

SQL Representation

SkyMap

SkyMap
skymap_name varchar NOT NULL
module varchar NOT NULL
serialized blob NOT NULL

Tract

A Tract is a contiguous, simple area on the sky with a 2-d Euclidean coordinate system related to spherical coordinates by a single map projection.

Todo

If the parameters of the sky projection and/or the Tract’s various bounding boxes can be standardized across all SkyMap implementations, it may be useful to include them in the table as well.

Value:
tract_number
Dependencies:
  • (skymap_name) -> SkyMap (skymap_name)
Primary Key:
(tract_number, skymap_name)
Many-to-Many Joins:

Transition

Should eventually fully replace v14’s lsst.skymap.TractInfo.

Python API

class Tract
skymap

The associated SkyMap instance.

number

An integer that identifies this Tract within its SkyMap.

region

An object (type TBD) that represents the Tract’s extent on the sky.

Todo

Add other methods from lsst.skymap.TractInfo.

SQL Representation

Tract

Tract
tract_number int NOT NULL
skymap_name varchar NOT NULL
region blob  

Patch

Tracts are subdivided into Patches, which share the Tract coordinate system and define similarly-sized regions that overlap by a configurable amount.

Todo

As with Tracts, we may want to include fields to describe Patch boundaries in this table in the future.

Value:
patch_index
Dependencies:
  • (skymap_name) -> SkyMap (skymap_name)
  • (tract_number, skymap_name) -> Tract (tract_number, skymap_name)
Primary Key:
(patch_index, tract_number, skymap_name)
Many-to-Many Joins:

Transition

Should eventually fully replace v14’s lsst.skymap.PatchInfo.

Python API

class Patch
skymap

The associated SkyMap instance.

tract

The associated Tract instance.

index

An integer that identifies this Patch within its Tract.

cellX

The column location of the cell represented by this Patch in the grid represented by its Tract.

cellY

The row location of the cell represented by this Patch in the grid represented by its Tract.

region

An object (type TBD) that represents the Patch’s extent on the sky.

Todo

Add other methods from lsst.skymap.PatchInfo.

SQL Representation

Patch

Patch
patch_index int NOT NULL
tract_number int NOT NULL
cell_x int NOT NULL
cell_y int NOT NULL
skymap_name varchar NOT NULL
region blob