Butler

A high-level object that provides read access to the Datasets in a single Collection and write access to a single Run.

Butler is a concrete, final Python class in the current design; all extensibility is provided by the Registry and Datastore instances it holds.

digraph Butler {
node[shape=record]
edge[dir=back, arrowtail=empty]

Butler [label="{Butler|+ config\n+ datastore\n+ registry|+ get()\n+ put()}"];

Butler -> ButlerConfiguration [arrowtail=odiamond];
Butler -> Datastore [arrowtail=odiamond];
Butler -> Registry [arrowtail=odiamond];
}

Transition

The new Butler plays essentially the same role as the v14 Butler.

Python API

class Butler
config

a ButlerConfiguration instance

run

a Run instance that contains the Collection to use for output.

datastore

a Datastore instance

registry

a Registry instance

get(label, parameters=None)

Load a Dataset or a slice thereof from the Butler’s Collection.

Parameters:
Returns:

an InMemoryDataset.

Implemented as:

handle = self.registry.find(self.run.collection, label)
return self.getDirect(handle, parameters)

Todo

  • Implementation requires all components to be able to handle (typically pass-through) parameters passed for the composite. Could we instead get away with only passing those when getting the parent from the Datastore?
  • Recursive composites were broken by a minor update. Would probably not be hard to add back in if we decide we need them, but they’d make the logic a bit harder to follow so not worth doing now.
getDirect(handle, parameters=None)

Load a Dataset or a slice thereof from a DatasetHandle.

Unlike Butler.get(), this method allows Datasets outside the Butler’s Collection to be read, as long as the DatasetHandle that identifies them can be obtained separately. This is needed to support the Comparison SuperTasks use case.

Parameters:
Returns:

an InMemoryDataset.

Implemented as:

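# Load the parent only if the handle has a URI of its own.
parent = self.datastore.get(handle.uri, handle.type.storageClass, parameters) if handle.uri else None
# Load each component with the same (uri, storageClass, parameters) call used for the parent.
children = {name: self.datastore.get(childHandle.uri, childHandle.type.storageClass, parameters)
            for name, childHandle in handle.components.items()}
return handle.type.storageClass.assemble(parent, children)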
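
The read path above can be illustrated with in-memory stand-ins. This is only a sketch: `ToyStorageClass`, `ToyDatastore`, `DatasetType`, `Handle`, and the free function `get_direct` are hypothetical names, not the real Registry or Datastore interfaces, and the sketch assumes that each component handle carries its own `uri` and `type`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class ToyStorageClass:
    """Hypothetical storage class that combines a parent and its components."""

    @staticmethod
    def assemble(parent, children):
        if not children:
            return parent
        composite = dict(children)
        if parent is not None:
            composite["parent"] = parent
        return composite


@dataclass
class DatasetType:
    name: str
    storageClass: type = ToyStorageClass


@dataclass
class Handle:
    type: DatasetType
    uri: Optional[str] = None
    components: Dict[str, "Handle"] = field(default_factory=dict)


class ToyDatastore:
    """Hypothetical Datastore backed by a dict keyed on URI."""

    def __init__(self, contents):
        self._contents = dict(contents)

    def get(self, uri, storageClass, parameters=None):
        # Parameters are accepted but ignored in this sketch.
        return self._contents[uri]


def get_direct(datastore, handle, parameters=None):
    parent = (datastore.get(handle.uri, handle.type.storageClass, parameters)
              if handle.uri else None)
    children = {
        name: datastore.get(child.uri, child.type.storageClass, parameters)
        for name, child in handle.components.items()
    }
    return handle.type.storageClass.assemble(parent, children)


datastore = ToyDatastore({"mem://image": [1, 2, 3], "mem://mask": [0, 1, 0]})
image = Handle(DatasetType("image"), uri="mem://image")
mask = Handle(DatasetType("mask"), uri="mem://mask")
# A composite with no URI of its own, built purely from its components.
exposure = Handle(DatasetType("exposure"), components={"image": image, "mask": mask})

single = get_direct(datastore, image)
composite = get_direct(datastore, exposure)
```

In this toy setup a handle without components returns the stored object unchanged, while a handle without a URI is assembled entirely from its components.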
put(label, dataset, producer=None)

Write a Dataset.

Parameters:
Returns:

a DatasetHandle

Implemented as:

ref = self.registry.expand(label)
run = self.run
assert producer is None or run == producer.run
storageHint = ref.makeStorageHint(run)
uri, components = self.datastore.put(dataset, ref.type.storageClass, storageHint, ref.type.name)
return self.registry.addDataset(ref, uri, components, producer=producer, run=run)
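
The write path can be traced end to end with a few stand-in classes. Everything below is a sketch under stated assumptions: `ToyRegistry`, `ToyDatastore`, `Ref`, the `(name, units)` label shape, the storage-hint layout, and the `mem://` URIs are all hypothetical and chosen only to make the sequence of calls visible.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Run:
    collection: str


@dataclass(frozen=True)
class DatasetType:
    name: str
    storageClass: str


@dataclass(frozen=True)
class Ref:
    type: DatasetType
    units: Tuple[Tuple[str, Any], ...]

    def makeStorageHint(self, run):
        # Hypothetical layout: <collection>/<type name>/<key=value>_...
        parts = "_".join(f"{key}={value}" for key, value in self.units)
        return f"{run.collection}/{self.type.name}/{parts}"


class ToyDatastore:
    def __init__(self):
        self.files = {}

    def put(self, dataset, storageClass, storageHint, typeName):
        uri = f"mem://{storageHint}"
        self.files[uri] = dataset
        return uri, {}  # this sketch never splits a dataset into components


class ToyRegistry:
    def __init__(self, types):
        self.types = types
        self.datasets = []

    def expand(self, label):
        name, units = label
        return Ref(self.types[name], tuple(sorted(units.items())))

    def addDataset(self, ref, uri, components, producer=None, run=None):
        handle = {"ref": ref, "uri": uri, "components": components, "run": run}
        self.datasets.append(handle)
        return handle


def put(registry, datastore, run, label, dataset, producer=None):
    ref = registry.expand(label)
    assert producer is None or run == producer.run
    storageHint = ref.makeStorageHint(run)
    uri, components = datastore.put(dataset, ref.type.storageClass,
                                    storageHint, ref.type.name)
    return registry.addDataset(ref, uri, components, producer=producer, run=run)


registry = ToyRegistry({"calexp": DatasetType("calexp", "Exposure")})
datastore = ToyDatastore()
handle = put(registry, datastore, Run("demo"), ("calexp", {"visit": 42}), b"pixels")
```

The point of the ordering is that the Datastore writes first and the Registry records the resulting URI afterwards, so the Registry never refers to a Dataset that was not stored.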
markInputUsed(quantum, ref)

Mark a Dataset as having been “actually” (not just predicted-to-be) used by a Quantum.

Parameters:

Implemented as:

handle = self.registry.find(self.run.collection, ref)
self.registry.markInputUsed(handle, quantum)

Remove the Datasets associated with the given DatasetLabels from the Butler’s Collection, and signal that they may be deleted from storage if they are not referenced by any other Collection.

Implemented as:

handles = [self.registry.find(self.run.collection, label)
           for label in labels]
for handle in self.registry.disassociate(self.run.collection, handles, remove=True):
    self.datastore.remove(handle.uri)
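
The removal rule (delete from storage only when no other Collection still references the Dataset) can be sketched with a toy registry. `ToyRegistry`, its `associate` helper, and the use of bare URIs in place of DatasetHandles are assumptions made for illustration only.

```python
class ToyRegistry:
    """Hypothetical registry that tracks which URIs each Collection holds."""

    def __init__(self):
        self.collections = {}  # collection name -> set of URIs

    def associate(self, collection, uris):
        self.collections.setdefault(collection, set()).update(uris)

    def disassociate(self, collection, uris, remove=True):
        self.collections[collection].difference_update(uris)
        if not remove:
            return []
        # Only URIs that no Collection references any more may be deleted.
        still_used = set().union(*self.collections.values())
        return [uri for uri in uris if uri not in still_used]


registry = ToyRegistry()
storage = {"mem://a": 1, "mem://b": 2}
registry.associate("run1", ["mem://a", "mem://b"])
registry.associate("shared", ["mem://b"])

# Mirrors the loop above: drop from storage whatever the registry releases.
for uri in registry.disassociate("run1", ["mem://a", "mem://b"]):
    del storage[uri]
```

Here `mem://b` survives because the `shared` Collection still references it, while `mem://a` is released and deleted.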

Todo

How much more of Registry’s API should Butler forward?

class ButlerConfiguration

Note

This is currently a class that maps directly onto a YAML file.

  • Configuration options are accessed through dictionary keys separated by dots (e.g. config['datastore.root']).
  • Configuration for Datastore and Registry, including which classes to instantiate, is nested under config['datastore'] and config['registry'] respectively.

But this is an implementation detail that is likely to change significantly.
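
Dotted-key access of the kind described above might look like the following sketch. `DottedConfig` is a hypothetical stand-in for ButlerConfiguration, a plain dict stands in for the parsed YAML file, and the `cls` keys and class names are invented for the example; only `datastore.root` comes from the text above.

```python
class DottedConfig:
    """Hypothetical wrapper giving dotted-key access to nested dicts."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        node = self._data
        for part in key.split("."):
            node = node[part]
        # Nested sections stay wrapped so they accept dotted keys too.
        return DottedConfig(node) if isinstance(node, dict) else node


config = DottedConfig({
    "datastore": {"cls": "PosixDatastore", "root": "/data/repo"},
    "registry": {"cls": "SqlRegistry", "db": {"url": "sqlite:///repo.db"}},
})

root = config["datastore.root"]
same_root = config["datastore"]["root"]
db_url = config["registry"]["db.url"]
```

A missing key raises `KeyError` from the underlying dict, which is usually the desired behaviour for required configuration entries.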