Butler
A high-level object that provides read access to the Datasets in a single Collection and write access to a single Run.
Butler is a concrete, final Python class in the current design; all extensibility is provided by the Registry and Datastore instances it holds.
Transition
The new Butler plays essentially the same role as the v14 Butler.
Python API
class Butler

config
A ButlerConfiguration instance.
get(label, parameters=None)
Load a Dataset or a slice thereof from the Butler’s Collection.
Parameters:
- label (DatasetLabel) – a DatasetLabel that identifies the Dataset to retrieve.
- parameters (dict) – a dictionary of StorageClass-specific parameters that can be used to obtain a slice of the Dataset.
Returns: an InMemoryDataset.
Implemented as:
    handle = self.registry.find(self.config.collection, label)
    return self.getDirect(handle, parameters)
Todo
- Implementation requires all components to be able to handle (typically pass-through) parameters passed for the composite. Could we instead get away with only passing those when getting the parent from the Datastore?
- Recursive composites were broken by a minor update. Would probably not be hard to add back in if we decide we need them, but they’d make the logic a bit harder to follow so not worth doing now.
getDirect(handle, parameters=None)
Load a Dataset or a slice thereof from a DatasetHandle.
Unlike Butler.get(), this method allows Datasets outside the Butler’s Collection to be read as long as the DatasetHandle that identifies them can be obtained separately. This is needed to support the Comparison SuperTasks use case.
Parameters:
- handle (DatasetHandle) – a pointer to the Dataset to load.
- parameters (dict) – a dictionary of StorageClass-specific parameters that can be used to obtain a slice of the Dataset.
Returns: an InMemoryDataset.
Implemented as:
    parent = self.datastore.get(handle.uri, handle.type.storageClass, parameters) if handle.uri else None
    # Components are read with the same Datastore.get arguments as the parent.
    children = {
        name: self.datastore.get(childHandle.uri, childHandle.type.storageClass, parameters)
        for name, childHandle in handle.components.items()
    }
    return handle.type.storageClass.assemble(parent, children)
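The read path described above (a Collection lookup in get(), followed by parent and component retrieval and assembly in getDirect()) can be sketched with toy stand-ins. Everything prefixed with Toy is hypothetical and exists only for illustration; the sketch also assumes that the Datastore is called with the same (uri, storageClass, parameters) arguments for components as for the parent, and that assemble() simply merges components into a dict.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyStorageClass:
    """Hypothetical stand-in for a StorageClass."""
    name: str

    def assemble(self, parent: Any, children: Dict[str, Any]) -> Any:
        # Toy rule: with no components the parent is the Dataset; otherwise
        # return a dict of the components plus the parent, if there is one.
        if not children:
            return parent
        result = dict(children)
        if parent is not None:
            result["parent"] = parent
        return result


@dataclass
class ToyDatasetType:
    name: str
    storageClass: ToyStorageClass


@dataclass
class ToyHandle:
    """Hypothetical stand-in for a DatasetHandle."""
    type: ToyDatasetType
    uri: Optional[str] = None
    components: Dict[str, "ToyHandle"] = field(default_factory=dict)


class ToyDatastore:
    """Maps URIs to in-memory objects; slicing parameters are ignored."""
    def __init__(self, contents):
        self.contents = contents

    def get(self, uri, storageClass, parameters=None):
        return self.contents[uri]


class ToyRegistry:
    """Maps (collection, label) pairs to handles."""
    def __init__(self, handles):
        self.handles = handles

    def find(self, collection, label):
        return self.handles[(collection, label)]


class ToyButler:
    def __init__(self, collection, registry, datastore):
        self.collection = collection
        self.registry = registry
        self.datastore = datastore

    def get(self, label, parameters=None):
        # Resolve the label within this Butler's Collection, then delegate.
        handle = self.registry.find(self.collection, label)
        return self.getDirect(handle, parameters)

    def getDirect(self, handle, parameters=None):
        sc = handle.type.storageClass
        parent = self.datastore.get(handle.uri, sc, parameters) if handle.uri else None
        children = {
            name: self.datastore.get(child.uri, child.type.storageClass, parameters)
            for name, child in handle.components.items()
        }
        return sc.assemble(parent, children)


image = ToyHandle(ToyDatasetType("image", ToyStorageClass("Image")), uri="toy://image")
mask = ToyHandle(ToyDatasetType("mask", ToyStorageClass("Mask")), uri="toy://mask")
exposure = ToyHandle(
    ToyDatasetType("exposure", ToyStorageClass("Exposure")),
    components={"image": image, "mask": mask},
)
butler = ToyButler(
    "c1",
    ToyRegistry({("c1", "exposure"): exposure}),
    ToyDatastore({"toy://image": [1, 2], "toy://mask": [0, 1]}),
)
composite = butler.get("exposure")   # looked up through the Collection
direct = butler.getDirect(image)     # no Collection lookup at all
```

Note that the image handle is not registered in the toy Collection, yet getDirect() can still read it. This is the property the Comparison SuperTasks use case relies on.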
put(label, dataset, producer=None)
Write a Dataset.
Parameters:
- label (DatasetLabel) – a DatasetLabel that will identify the Dataset being stored.
- dataset – the InMemoryDataset to store.
- producer (Quantum) – the Quantum instance that produced the Dataset. May be None for some Registries. producer.run must match self.config.run.
Returns: the result of self.registry.addDataset() for the stored Dataset.
Implemented as:
    ref = self.registry.expand(label)
    run = self.config.run
    assert producer is None or run == producer.run
    template = self.config.templates.get(ref.type.name, None)
    path = ref.makePath(run, template)
    uri, components = self.datastore.put(dataset, ref.type.storageClass, path, ref.type.name)
    return self.registry.addDataset(ref, uri, components, producer=producer, run=run)
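A minimal sketch of the write path, under several assumptions that are not part of the design above: put() is written as a free function, labels and Runs are plain strings, the toy Datastore takes only (dataset, path), path templates are str.format-style strings, and a Butler without a Run refuses to write. The Toy classes and the default template layout are made up for illustration.

```python
class ToyRegistry:
    """Hypothetical Registry stub that records what was added."""
    def __init__(self):
        self.added = []

    def addDataset(self, label, uri, components, producer=None, run=None):
        record = {"label": label, "uri": uri, "components": components,
                  "producer": producer, "run": run}
        self.added.append(record)
        return record


class ToyDatastore:
    """Hypothetical Datastore stub that stores objects under a toy URI."""
    def __init__(self):
        self.contents = {}

    def put(self, dataset, path):
        uri = "toy://" + path
        self.contents[uri] = dataset
        return uri, {}  # no components are written separately in this sketch


class ToyQuantum:
    """Hypothetical Quantum stand-in that only knows its Run."""
    def __init__(self, run):
        self.run = run


def put(registry, datastore, run, templates, type_name, label, dataset, producer=None):
    # Assumption: a read-only Butler (run is None) cannot write.
    if run is None:
        raise RuntimeError("read-only Butler: no Run configured")
    # The producing Quantum, if given, must belong to the Butler's Run.
    if producer is not None and producer.run != run:
        raise ValueError("producer.run must match the Butler's Run")
    # Per-DatasetType override if present, else a made-up default layout.
    template = templates.get(type_name, "{run}/{type}/{label}")
    path = template.format(run=run, type=type_name, label=label)
    uri, components = datastore.put(dataset, path)
    return registry.addDataset(label, uri, components, producer=producer, run=run)


registry, datastore = ToyRegistry(), ToyDatastore()
overrides = {"calexp": "{run}/custom/{label}.fits"}
record = put(registry, datastore, "run1", overrides, "calexp", "visit42",
             [1, 2, 3], producer=ToyQuantum("run1"))
plain = put(registry, datastore, "run1", overrides, "src", "visit42", [4, 5])
```

The sketch raises exceptions where the original uses an assert, so that the check survives when Python runs with assertions disabled. That is a design choice of the sketch, not something the text prescribes.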
markInputUsed(quantum, ref)
Mark a Dataset as having been “actually” (not just predicted-to-be) used by a Quantum.
Parameters:
- quantum (Quantum) – the dependent Quantum.
- ref (DatasetRef) – the Dataset that is a true dependency of quantum.
Implemented as:
    handle = self.registry.find(self.config.collection, ref)
    self.registry.markInputUsed(handle, quantum)
unlink(*labels)
Remove the Datasets associated with the given DatasetLabels from the Butler’s Collection, and signal that they may be deleted from storage if they are not referenced by any other Collection.
Implemented as:
    handles = [self.registry.find(self.config.collection, label) for label in labels]
    for handle in self.registry.disassociate(self.config.collection, handles, remove=True):
        self.datastore.remove(handle.uri)
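The "delete only if no other Collection still references it" behaviour can be illustrated with a toy Registry. This is a sketch under assumptions: handles are reduced to bare URI strings, the Datastore is a plain dict, and ToyRegistry.associate() is a hypothetical helper used only to set up the example.

```python
class ToyRegistry:
    """Hypothetical Registry stub: collection tag -> {label: uri}."""
    def __init__(self):
        self.collections = {}

    def associate(self, collection, label, uri):
        self.collections.setdefault(collection, {})[label] = uri

    def find(self, collection, label):
        return self.collections[collection][label]

    def disassociate(self, collection, uris, remove=True):
        """Drop uris from collection; return those no Collection still uses."""
        members = self.collections[collection]
        for label in [l for l, u in members.items() if u in uris]:
            del members[label]
        if not remove:
            return []
        still_used = {u for m in self.collections.values() for u in m.values()}
        return [u for u in uris if u not in still_used]


def unlink(registry, datastore, collection, *labels):
    handles = [registry.find(collection, label) for label in labels]
    # Only Datasets that became unreferenced are removed from storage.
    for uri in registry.disassociate(collection, handles, remove=True):
        del datastore[uri]


registry = ToyRegistry()
registry.associate("a", "x", "u1")
registry.associate("b", "x", "u1")   # "u1" is shared with Collection "b"
registry.associate("a", "y", "u2")   # "u2" lives only in Collection "a"
datastore = {"u1": "shared data", "u2": "private data"}

unlink(registry, datastore, "a", "x", "y")
```

After the call, Collection "a" is empty, "u2" has been removed from the toy Datastore, and "u1" is kept because Collection "b" still references it.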
Todo
How much more of Registry’s API should Butler forward?
class ButlerConfiguration

collection
The CollectionTag of the input collection.
run
The Run instance used for all outputs.
May be None to construct a read-only Butler. The Run’s Collection is always used as the input collection when a Run is provided.
templates
A dict that maps DatasetType names to path templates. When an entry is present for a DatasetType, it overrides the DatasetType.template obtained from the Registry.
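The three configuration attributes and their stated relationships can be sketched as a small dataclass. This is an illustration, not the actual class: ToyRun, readOnly, and templateFor are hypothetical names, and the sketch silently overrides a conflicting collection when a Run is given, a case the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyRun:
    """Hypothetical Run stand-in that knows its own Collection tag."""
    collection: str


@dataclass
class ToyButlerConfiguration:
    collection: Optional[str] = None
    run: Optional[ToyRun] = None
    templates: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if self.run is not None:
            # When a Run is provided, its Collection is the input collection.
            self.collection = self.run.collection
        elif self.collection is None:
            raise ValueError("need a collection, a run, or both")

    @property
    def readOnly(self) -> bool:
        # No Run means there is nowhere to write outputs.
        return self.run is None

    def templateFor(self, typeName: str, registryTemplate: str) -> str:
        # A configured entry overrides the template known to the Registry.
        return self.templates.get(typeName, registryTemplate)


writable = ToyButlerConfiguration(
    collection="ignored",
    run=ToyRun("run1-outputs"),
    templates={"calexp": "custom/{visit}"},
)
reader = ToyButlerConfiguration(collection="shared-inputs")
```

Here writable.collection ends up as the Run's Collection rather than the value passed in, and reader has no Run and is therefore read-only.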