DataUnit Reference¶
DataUnit¶
Warning
The Python representation of DataUnits is likely to change in the new preflight design. Specifically, instances of DataUnit are likely to represent Registry database tables rather than rows in such tables.
Each row in a DataUnit table will be represented as a dict in Python.
The parts of this section that are not specific to the Python representation should still hold true.
A DataUnit is a discrete abstract unit of data that can be associated with metadata and used to label Datasets.
Visit, tract, and filter are examples of types of DataUnits; individual visits, tracts, and filters would thus be DataUnit instances.
Each DataUnit type has both a concrete Python class (inheriting from the abstract DataUnit
base class) and a SQL table in the common Registry schema.
A DataUnit type may depend on another. In SQL, this is expressed as a foreign key field in the table for the dependent DataUnit that points to the primary key field of the table for the DataUnit it depends on.
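As a rough illustration of this foreign-key relationship, the sketch below builds two tables in an in-memory SQLite database. The table and column names follow the Camera and PhysicalSensor entries later in this reference; the column types and the DDL itself are assumptions, not the actual Registry schema.

```python
import sqlite3

# Hypothetical DDL: column names follow this reference, column types are assumed.
SCHEMA = """
CREATE TABLE Camera (
    camera_name TEXT PRIMARY KEY
);
CREATE TABLE PhysicalSensor (
    physical_sensor_number INTEGER NOT NULL,
    camera_name TEXT NOT NULL,
    PRIMARY KEY (physical_sensor_number, camera_name),
    FOREIGN KEY (camera_name) REFERENCES Camera (camera_name)
);
"""

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Camera VALUES ('HSC')")
conn.execute("INSERT INTO PhysicalSensor VALUES (50, 'HSC')")

# A sensor that points at an unknown Camera violates the dependency.
try:
    conn.execute("INSERT INTO PhysicalSensor VALUES (1, 'NoSuchCamera')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The dependent table repeats the key of the table it depends on, which is why the compound primary keys grow as dependencies are chained.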
Some DataUnits represent joins between other DataUnits. A join DataUnit depends on the two DataUnits it connects, but is also included automatically in any sequence, container, or machine-generated SQL query in which its dependencies are both present.
Every DataUnit type that is not a join has a “value”. This is a POD (usually a string or integer, but sometimes a tuple of these) that is both its default human-readable representation and a “semi-unique” identifier for the DataUnit: when combined with the “values” of the DataUnits it depends on, the full set of DataUnits is uniquely identified.
DataUnit tables in SQL typically have compound primary keys that include the primary keys of the DataUnits they depend on. These primary keys are also meaningful in Python; they can be accessed as tuples via the DataUnit.pkey
attribute and are frequently used in dictionaries containing DataUnits.
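A minimal sketch of how pkey tuples might serve as dictionary keys is shown below. The constructors and the stand-in base class are hypothetical; only the pkey and value attribute names come from this reference, and the key layout follows the PhysicalSensor entry.

```python
from abc import ABC, abstractmethod


class DataUnit(ABC):
    """Minimal stand-in for the abstract base class described here."""

    @property
    @abstractmethod
    def pkey(self):
        """Tuple of POD values matching the SQL primary key."""

    @property
    @abstractmethod
    def value(self):
        """POD that identifies the unit once its dependencies are known."""


class Camera(DataUnit):
    def __init__(self, name):  # hypothetical constructor
        self.name = name

    @property
    def pkey(self):
        return (self.name,)  # always a tuple, even for a single field

    @property
    def value(self):
        return self.name


class PhysicalSensor(DataUnit):
    def __init__(self, number, camera):  # hypothetical constructor
        self.number = number
        self.camera = camera

    @property
    def pkey(self):
        # Own value first, then the primary key of the dependency.
        return (self.number,) + self.camera.pkey

    @property
    def value(self):
        return self.number


hsc = Camera("HSC")
sensors = {s.pkey: s for s in (PhysicalSensor(49, hsc), PhysicalSensor(50, hsc))}
```

Because the compound key includes the Camera name, sensors with the same number from different Cameras would not collide in such a dictionary.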
Warning
DataUnitTypeSet
and DataUnitMap
are likely to go away in the new preflight design.
The DataUnitTypeSet
class provides methods that enforce and utilize these rules, providing a centralized implementation to which all other objects that operate on groups of DataUnits can delegate.
The DataUnitMap
class provides Python access to the more complex relationships between DataUnits, including many-to-many joins.
Transition¶
The string keys of data ID dictionaries passed to the v14 Butler are similar to DataUnit type names, and the values of data ID dictionaries are similar to DataUnit values.
A dictionary that maps DataUnit type names to DataUnit values is thus very similar to a v14 data ID dictionary, but most layers of the new design instead use a tuple of DataUnits, a strongly-typed analog that provides a bit more functionality and access to structured metadata.
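The relationship between the two forms can be sketched as follows. The namedtuple classes and the helper function are hypothetical stand-ins for illustration, not part of the design.

```python
from collections import namedtuple

# Hypothetical, minimal DataUnit stand-ins; the real classes carry more metadata.
Camera = namedtuple("Camera", ["name"])
Visit = namedtuple("Visit", ["number", "camera"])


def units_from_data_id(data_id):
    """Turn a {type name: value} dictionary into a tuple of unit objects."""
    camera = Camera(name=data_id["Camera"])
    visit = Visit(number=data_id["Visit"], camera=camera)
    return (camera, visit)


# A dictionary in the style of a v14 data ID, keyed by DataUnit type name.
data_id = {"Camera": "HSC", "Visit": 903334}
camera, visit = units_from_data_id(data_id)
```

The tuple form keeps the link from a Visit to its Camera explicit, which the flat dictionary only implies.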
Python API¶
-
class
DataUnit
¶ An abstract base class whose subclasses represent concrete DataUnits.
-
pkey
¶ Read-only pure-virtual instance attribute (must be implemented by subclasses).
A tuple of POD values that uniquely identify the DataUnit, corresponding to the values in the SQL primary key.
Primary keys in Python are always tuples, even when only a single value is needed to identify the DataUnit type.
-
value
¶ Read-only pure-virtual instance attribute (must be implemented by subclasses).
An integer or string that identifies the DataUnit when combined with any “foreign key” connections to other DataUnits. For example, a Visit’s number is its value, because it uniquely labels a Visit as long as its Camera (its only foreign key DataUnit) is also specified.
-
-
class
DataUnitTypeSet
¶ Warning
DataUnitTypeSet
is likely to go away in the new preflight design.
An ordered tuple of unique DataUnit subclasses.
Unlike a regular Python tuple or set, a DataUnitTypeSet’s elements are sorted (the actual sort order is TBD, but it is deterministic). In addition, the inclusion of certain DataUnit types can automatically lead to the inclusion of others. This can happen because one DataUnit depends on another (most depend on either Camera or SkyMap, for instance), or because a DataUnit (such as ObservedSensor) represents a join between others (such as Visit and PhysicalSensor). For example, if any of the following combinations of DataUnit types is used to initialize a DataUnitTypeSet, its elements will be [Camera, ObservedSensor, PhysicalSensor, Visit]:
- [Visit, PhysicalSensor]
- [ObservedSensor]
- [Visit, ObservedSensor, Camera]
- [Visit, PhysicalSensor, ObservedSensor]
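The augmentation rule can be sketched as a closure over dependencies and joins. The relationship tables below cover only the four types in the example, and the alphabetical ordering is an assumption that happens to reproduce the example; the real sort order is TBD.

```python
# Hypothetical relationship tables, restricted to the types in the example above.
DEPENDENCIES = {
    "Camera": (),
    "Visit": ("Camera",),
    "PhysicalSensor": ("Camera",),
    "ObservedSensor": ("Camera", "Visit", "PhysicalSensor"),
}
# Each join type maps to the two types it connects.
JOINS = {"ObservedSensor": ("Visit", "PhysicalSensor")}


def expand_types(names):
    """Return the sorted closure of `names` under dependencies and joins."""
    result = set(names)
    changed = True
    while changed:
        changed = False
        # Pull in everything each present type depends on.
        for name in list(result):
            for dep in DEPENDENCIES[name]:
                if dep not in result:
                    result.add(dep)
                    changed = True
        # Pull in a join once both of the types it connects are present.
        for join, ends in JOINS.items():
            if join not in result and all(end in result for end in ends):
                result.add(join)
                changed = True
    # Alphabetical order is an assumption; any deterministic order would do.
    return sorted(result)
```

Calling `expand_types` on any of the four combinations listed above yields the same four type names, which is what lets equality comparisons ignore how a set was originally spelled.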
-
__init__
(elements)¶ Initialize the DataUnitTypeSet with a reordered and augmented version of the given DataUnit types as described above.
-
__iter__
()¶ Iterate over the DataUnit types in the set.
-
__len__
()¶ Return the number of DataUnit types in the set.
-
__eq__
(other)¶ Compare two DataUnitTypeSets for equality.
Also supports comparisons with other sequences by converting them to DataUnitTypeSets.
-
__ne__
(other)¶ Compare two DataUnitTypeSets for inequality.
Also supports comparisons with other sequences by converting them to DataUnitTypeSets.
-
__contains__
(k)¶ Return True if the DataUnitTypeSet contains either the given DataUnit type or DataUnit type name.
-
__getitem__
(name)¶ Return the DataUnit type with the given name.
-
pack
(values)¶ Compute a bytes string that uniquely identifies the given combination of DataUnit values.
Parameters: values (dict) – A dictionary that maps DataUnit type names to either the “values” of those units or actual DataUnit instances.
Returns: a bytes object that labels the given combination of units.
This method must be used to populate the unit_pack field in the Dataset table.
Note
We are currently using a sha512 hash instead of bit-packing (with the hash generated by the invariantHash method), but will likely be packing in the final design.
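A hash-based label of this kind might look like the sketch below. It uses the standard-library hashlib module; the free function, its arguments, and the byte encoding are assumptions and do not reproduce whatever invariantHash actually does.

```python
import hashlib


def pack(type_names, values):
    """Hash DataUnit values in a fixed type order (illustrative only)."""
    hasher = hashlib.sha512()
    # type_names is assumed to already be in the set's deterministic order,
    # so the result does not depend on the ordering of the values dict.
    for name in type_names:
        # repr() keeps 1 and "1" distinct; the separator keeps fields apart.
        hasher.update(f"{name}={values[name]!r};".encode("utf-8"))
    return hasher.digest()


label = pack(["Camera", "Visit"], {"Visit": 903334, "Camera": "HSC"})
```

A hash only labels a combination; unlike bit-packing, it cannot be unpacked to recover the original values.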
-
expand
(findfunc, values)¶ Construct a dictionary of DataUnit instances from a dictionary of DataUnit “values”.
Parameters: findfunc – a callable with the same signature and behavior as Registry.findDataUnit() or DataUnitMap.findDataUnit().
This can (and generally should) be used by concrete Registries to implement Registry.expand().
-
class
DataUnitMap
¶ Warning
DataUnitMap
is likely to go away in the new preflight design.
An object that holds a collection of related DataUnits.
-
types
¶ A
DataUnitTypeSet
containing exactly the DataUnit types present in the map.
-
extract
(types)¶ Iterate over tuples of DataUnit instances.
Parameters: types (DataUnitTypeSet) – the DataUnit types to iterate over. Must be a subset of self.types.
Returns: a sequence of tuples of DataUnits whose types correspond to the types argument (in the same order).
-
group
(types)¶ Group the DataUnitMap according to a subset of its DataUnit types.
Parameters: types (DataUnitTypeSet) – the DataUnit types to group by. Must be a subset of self.types.
Returns: a sequence of tuples of (units, submap), where units is a tuple of DataUnits whose types correspond to the types argument (in the same order), and submap is a DataUnitMap containing only the DataUnits and DatasetRefs related to the ones in units. The types in submap are the same as those in self.
For example, the following code performs a nested iteration over the Tracts and Patches in a DataUnitMap:
    assert map.types == (SkyMap, Tract, Patch)
    for (skymap, tract), submap in map.group((SkyMap, Tract)):
        assert submap.types == (SkyMap, Tract, Patch)
        for patch in submap.extract(Patch):
            ...
-
findDataUnit
(cls, pkey)¶ Return a DataUnit given the values of its primary key.
Parameters:
- cls (type) – a class that inherits from DataUnit.
- pkey (tuple) – a tuple of primary key values that uniquely identify the DataUnit; see DataUnit.pkey.
Returns: a DataUnit instance of type cls, or None if no matching unit is found.
See also Registry.findDataUnit().
-
AbstractFilter¶
AbstractFilters are used to label Datasets that aggregate data from multiple Visits (and possibly multiple Cameras).
Having two different DataUnits for filters is necessary to make it possible to combine data from Visits taken with different PhysicalFilters.
- Value:
- abstract_filter_name
- Dependencies:
- None
- Primary Key:
- abstract_filter_name
- Many-to-Many Joins:
- None
Camera¶
Camera DataUnits are essentially just sources of raw data with a constant layout of PhysicalSensors and a self-consistent numbering system for Visits.
Different versions of the same camera (due to e.g. changes in hardware) should still correspond to a single Camera DataUnit.
There are thus multiple afw.cameraGeom.Camera
objects associated with a single Camera DataUnit; the most natural approach to relating them would be to store the afw.cameraGeom.Camera
as a VisitRange Dataset.
The Collimated Beam Projector (CBP) state and the CBP spectrograph will be represented by distinct Camera DataUnits, allowing changes in state and spectrograph observations to be represented as Visits with those Cameras. The VisitSelfJoin table associates these with the main-camera Visits that represent observations of the CBP.
Like SkyMap but unlike every other DataUnit, Cameras are represented by a polymorphic class hierarchy in Python rather than a single concrete class.
- Value:
- camera_name
- Dependencies:
- None
- Primary Key:
- camera_name
- Many-to-Many Joins:
- None
Transition¶
Camera subclasses take over many of the roles played by obs_
package Mapper
subclasses in the v14 Butler (with StorageHint creation an important and intentional exception).
Python API¶
-
class
Camera
¶ An abstract base class whose subclasses are generally singletons.
-
instances
¶ Concrete class attribute: provided by the base class.
A dictionary holding all Camera instances, keyed by their name attributes. Subclasses are responsible for adding an instance to this dictionary at module-import time.
-
name
¶ Virtual instance attribute: must be implemented by subclasses.
A string name for the Camera that can be used as its primary key in SQL.
-
makePhysicalSensors
()¶ Return the full list of PhysicalSensor instances associated with the Camera.
This virtual method will be called by a Registry when it adds a new Camera to populate its PhysicalSensors table.
-
makePhysicalFilters
()¶ Return the full list of PhysicalFilter instances associated with the Camera.
This virtual method will be called by a Registry when it adds a new Camera to populate its PhysicalFilters table.
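The registration pattern described for Camera subclasses could be sketched as follows. ToyCamera and its contents are hypothetical, and the sensors and filters are returned as plain dicts purely for brevity (the API above specifies PhysicalSensor and PhysicalFilter instances).

```python
from abc import ABC, abstractmethod


class Camera(ABC):
    # Shared registry of Camera instances, keyed by name.
    instances = {}

    @property
    @abstractmethod
    def name(self):
        """String name usable as the SQL primary key."""

    @abstractmethod
    def makePhysicalSensors(self):
        """Return all sensors belonging to this Camera."""

    @abstractmethod
    def makePhysicalFilters(self):
        """Return all filters belonging to this Camera."""


class ToyCamera(Camera):
    """Hypothetical camera with four sensors and two filters."""

    name = "Toy"  # a class attribute satisfies the abstract property

    def makePhysicalSensors(self):
        # Dicts stand in for PhysicalSensor instances in this sketch.
        return [{"number": n, "name": str(n), "camera": self.name} for n in range(4)]

    def makePhysicalFilters(self):
        # Dicts stand in for PhysicalFilter instances in this sketch.
        return [{"name": f, "camera": self.name} for f in ("g", "r")]


# Registration happens once, at module-import time.
Camera.instances[ToyCamera.name] = ToyCamera()
```

A Registry could then look a Camera up by the name stored in its table and call the two factory methods to populate the dependent tables.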
-
PhysicalFilter¶
PhysicalFilters represent the bandpass filters that can be associated with a Visit.
A PhysicalFilter may or may not be associated with a particular AbstractFilter.
- Value:
- physical_filter_name
- Dependencies:
- (camera_name) -> Camera (camera_name)
- (abstract_filter_name) -> AbstractFilter (abstract_filter_name) [optional]
- Primary Key:
- camera_name, physical_filter_name
- Many-to-Many Joins:
- None
Python API¶
PhysicalSensor¶
PhysicalSensors represent a sensor in a Camera, independent of any observations.
Because some cameras identify sensors with string names and others use numbers, we provide fields for both; the name may be a stringified integer, and the number may be an autoincrement value. Only the number is used as part of the primary key.
The group
field may mean different things for different Cameras (such as rafts for LSST, or groups of sensors oriented the same way relative to the focal plane for HSC).
The purpose
field indicates the role of the sensor (such as science, wavefront, or guiding).
Valid choices should be standardized across Cameras, but are currently TBD.
- Value:
- physical_sensor_number
- Dependencies:
- (camera_name) -> Camera (camera_name)
- Primary Key:
- (physical_sensor_number, camera_name)
- Many-to-Many Joins:
- Visit via ObservedSensor
Python API¶
-
class
PhysicalSensor
¶ -
-
number
¶ A number that identifies the sensor. Only guaranteed to be unique across PhysicalSensors associated with the same Camera.
-
name
¶ The name of the sensor. Only guaranteed to be unique across PhysicalSensors associated with the same Camera.
-
group
¶ A Camera-specific group the sensor belongs to.
-
purpose
¶ A Camera-generic role for the sensor.
-
Visit¶
Visits correspond to observations with the full camera at a particular pointing, possibly comprised of multiple exposures (Snaps).
Some DatasetTypes representing raw exposures may use Visits with no Snaps, while others may use Snaps. It may be useful to define a raw DatasetType with Snap even when only one Snap will exist if the data is to be processed with a pipeline that can operate on multi-Snap inputs.
Visits can represent observations taken for calibration purposes, such as flat field images.
A Visit’s region
field holds an approximate but inclusive representation of its position on the sky that can be compared to the regions
of other DataUnits.
- Value:
- visit_number
- Dependencies:
- (camera_name) -> Camera (camera_name)
- (physical_filter_name) -> PhysicalFilter (physical_filter_name)
- Primary Key:
- (visit_number, camera_name, physical_filter_name)
- Many-to-Many Joins:
- PhysicalSensor via ObservedSensor
- Tract via VisitTractJoin
- Patch via VisitPatchJoin
Todo
Visit will need to have many more fields to hold metadata (in general, we want to include anything we might want to query on when selecting Datasets).
We should consider adding everything in afw.image.VisitInfo
.
That may be true of some other concrete DataUnits as well.
It will probably be necessary to add per-Camera Visit tables for Camera-specific metadata as well. This is a rather significant change to the common schema, but it’s not actually problematic.
Python API¶
-
class
Visit
¶ -
-
number
¶ A number that identifies the Visit. Only guaranteed to be unique across Visits associated with the same Camera.
-
filter
¶ The
PhysicalFilter
the Visit was observed with.
-
obsBegin
¶ The date and time of the beginning of the Visit.
-
exposureTime
¶ The total exposure time of the Visit (in seconds).
-
region
¶ An object (type TBD) that describes the spatial extent of the Visit on the sky.
-
sensors
¶ A sequence of
ObservedSensor
instances associated with this Visit.
-
ObservedSensor¶
An ObservedSensor is a join between a Visit and a PhysicalSensor.
Unlike most other DataUnit join tables (which are not typically DataUnits themselves), this one is both ubiquitous and contains additional information: a region
that represents the position of the observed sensor image on the sky.
We may also add additional observational metadata in the future.
- Value:
- None
- Dependencies:
- (camera_name) -> Camera (camera_name)
- (visit_number, camera_name) -> Visit (visit_number, camera_name)
- (physical_sensor_number, camera_name) -> PhysicalSensor (physical_sensor_number, camera_name)
- Primary Key:
- (visit_number, physical_sensor_number, camera_name)
- Many-to-Many Joins:
- VisitRange via VisitRangeJoin
- Tract via SensorTractJoin
- Patch via SensorPatchJoin
Python API¶
-
class
ObservedSensor
¶ -
-
physical
¶ The
PhysicalSensor
instance associated with the ObservedSensor.
-
region
¶ An object (type TBD) that describes the spatial extent of the ObservedSensor on the sky.
-
Snap¶
A Snap is a single-exposure subset of a Visit.
Most non-LSST Visits will have only a single Snap.
- Value:
- snap_index
- Dependencies:
- Primary Key:
- (snap_index, visit_number, camera_name)
- Many-to-Many Joins:
- None
Python API¶
VisitRange¶
VisitRanges are DataUnits that label master calibration products, and are defined as a range of Visits from a given Camera.
The VisitRange associated with not-yet-observed Visits may be indicated by setting visit_end
to -1
(we can’t use NULL
for visit_end
because it is part of the compound primary key). This is mapped to None
in Python.
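The mapping between the SQL sentinel and None could be sketched as below. The function names are hypothetical, and the inclusive treatment of both bounds in the membership check is an assumption that this reference does not specify.

```python
OPEN_END = -1  # value stored in SQL for a range with no end yet


def visit_end_to_python(sql_value):
    """Map the SQL sentinel to None; pass other values through."""
    return None if sql_value == OPEN_END else sql_value


def visit_end_to_sql(py_value):
    """Map None back to the SQL sentinel; pass other values through."""
    return OPEN_END if py_value is None else py_value


def in_range(visit_number, visit_begin, visit_end):
    """Check membership, assuming inclusive bounds (not stated in this reference)."""
    if visit_number < visit_begin:
        return False
    # None means the range is still open towards future Visits.
    return visit_end is None or visit_number <= visit_end
```

Keeping the sentinel confined to the SQL boundary means Python code never has to treat -1 as a special visit number.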
- Value:
- visit_begin, visit_end
- Dependencies:
- Primary Key:
- (visit_begin, visit_end, camera_name)
- Many-to-Many Joins:
- Visit via VisitRangeJoin
Python API¶
SkyMap¶
Each SkyMap entry represents a different way to subdivide the sky into tracts and patches, including any parameters involved in those definitions.
SkyMaps in Python are part of a polymorphic hierarchy, but unlike Cameras, their instances are not singletons, so we can’t just store them in a global dictionary in the software stack. Instead, we serialize SkyMap instances directly into the Registry as blobs.
- Value:
- skymap_name
- Dependencies:
- None
- Primary Key:
- skymap_name
- Many-to-Many Joins:
- None
Transition¶
Ultimately this SkyMap hierarchy should entirely replace those in the v14 lsst.skymap
package, and we’ll store the SkyMap information directly in the Registry database rather than a separate pickle file.
There’s no need for two parallel class hierarchies to represent the same concepts.
Python API¶
-
class
SkyMap
¶ -
name
¶ A unique, human-readable name for the SkyMap that can be used as its primary key in SQL.
-
makeTracts
()¶ Return the full list of Tract instances associated with the SkyMap.
This virtual method will be called by a Registry when it adds a new SkyMap to populate its Tract and Patch tables.
-
serialize
()¶ Write the SkyMap to a blob.
-
classmethod
deserialize
(name, blob)¶ Reconstruct a SkyMap instance from a blob.
Todo
- Add other methods from lsst.skymap.BaseSkyMap, including iteration over Tracts. That may suggest removing makeTracts() if it becomes redundant, or adding arguments to deserialize() to provide Tracts and Patches from their tables instead of the blob.
- What is the connection between serialize(), deserialize() and __reduce__? Can we just use pickle?
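One possible shape for the serialize() and deserialize() round trip is sketched below. ToySkyMap, its single parameter, and the choice of JSON for the blob are all assumptions made for illustration; the blob format is not specified here.

```python
import json


class ToySkyMap:
    """Hypothetical SkyMap defined by a single parameter."""

    def __init__(self, name, n_tracts):
        self.name = name
        self.n_tracts = n_tracts

    def serialize(self):
        # The name lives in its own column, so only the parameters go in the blob.
        return json.dumps({"n_tracts": self.n_tracts}).encode("utf-8")

    @classmethod
    def deserialize(cls, name, blob):
        params = json.loads(blob.decode("utf-8"))
        return cls(name, **params)


original = ToySkyMap("toy", n_tracts=12)
restored = ToySkyMap.deserialize(original.name, original.serialize())
```

Passing the name separately mirrors the deserialize(name, blob) signature above, where the name comes from the primary key column rather than from the blob.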
-
Tract¶
A Tract is a contiguous, simple area on the sky with a 2-d Euclidean coordinate system related to spherical coordinates by a single map projection.
Todo
If the parameters of the sky projection and/or the Tract’s various bounding boxes can be standardized across all SkyMap implementations, it may be useful to include them in the table as well.
- Value:
- tract_number
- Dependencies:
- (skymap_name) -> SkyMap (skymap_name)
- Primary Key:
- (tract_number, skymap_name)
- Many-to-Many Joins:
Transition¶
Should eventually fully replace v14’s lsst.skymap.TractInfo
.
Python API¶
Patch¶
Tracts are subdivided into Patches, which share the Tract coordinate system and define similarly-sized regions that overlap by a configurable amount.
Todo
As with Tracts, we may want to include fields to describe Patch boundaries in this table in the future.
- Value:
- patch_index
- Dependencies:
- Primary Key:
- (patch_index, tract_number, skymap_name)
- Many-to-Many Joins:
Transition¶
Should eventually fully replace v14’s lsst.skymap.PatchInfo
.