lamindb.Transform

class lamindb.Transform(name: str, key: str | None = None, type: TransformType | None = None, revises: Transform | None = None)

Bases: DBRecord, IsVersioned

Data transformations such as scripts, notebooks, functions, or pipelines.

A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.

A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.

Transforms are versioned so that a given transform version maps to a given source code version.

Can I sync transforms to git?

If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().

>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
>>> ln.track()

The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.

Parameters:
  • name (str): A name or title.

  • key (str | None = None): A short name or path-like semantic key.

  • type (TransformType | None = "pipeline"): See TransformType.

  • revises (Transform | None = None): An old version of the transform.

See also

track()

Globally track a script, notebook or pipeline run.

Run

Executions of transforms.

Examples

Create a transform for a pipeline:

>>> transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()

Create a transform from a notebook:

>>> ln.track()

View predecessors of a transform:

>>> transform.view_lineage()

Attributes

DoesNotExist = <class 'lamindb.models.transform.Transform.DoesNotExist'>
Meta = <class 'lamindb.models.dbrecord.DBRecord.Meta'>
MultipleObjectsReturned = <class 'lamindb.models.transform.Transform.MultipleObjectsReturned'>
created_by: User

Creator of record.

created_by_id
property latest_run: Run

The latest run of this transform.

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.

property name: str

Name of the transform.

Splits key on / and returns the last element.
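
The rule described above can be sketched in plain Python. This is only an illustration: `name_from_key` is a hypothetical helper and not part of lamindb, and the example keys are made up.

```python
def name_from_key(key: str) -> str:
    """Hypothetical helper: return the last '/'-separated element of a key."""
    return key.split("/")[-1]


# A path-like key yields its final component.
print(name_from_key("pipelines/cell-ranger/count.py"))
# A key without any '/' is returned unchanged.
print(name_from_key("Cell Ranger"))
```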

objects = <lamindb.models.query_manager.QueryManager object>
property pk
predecessors: Transform

Preceding transforms.

These are auto-populated whenever an artifact or collection serves as a run input, e.g., artifact.run and artifact.transform get populated & saved.

The table offers a more convenient way to query predecessors without going through Run.

It also lets you manually add predecessors whose outputs are not tracked in a run.
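
To make the idea concrete, here is a toy model of how predecessor links could be derived from run inputs and outputs. It does not use lamindb: `ToyRun`, `derive_predecessors`, and the file names are all hypothetical, and the real bookkeeping in lamindb may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRun:
    """Hypothetical stand-in for a run: one transform, its inputs and outputs."""
    transform: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def derive_predecessors(runs: list[ToyRun]) -> dict[str, set[str]]:
    # Map each artifact to the transform whose run produced it.
    produced_by = {}
    for run in runs:
        for artifact in run.outputs:
            produced_by[artifact] = run.transform

    # A transform's predecessors are the producers of its run inputs.
    predecessors = {}
    for run in runs:
        preds = predecessors.setdefault(run.transform, set())
        for artifact in run.inputs:
            producer = produced_by.get(artifact)
            if producer is not None and producer != run.transform:
                preds.add(producer)
    return predecessors


runs = [
    ToyRun("ingest.py", outputs=["raw.parquet"]),
    ToyRun("qc.ipynb", inputs=["raw.parquet"], outputs=["clean.parquet"]),
    ToyRun("train.py", inputs=["clean.parquet"]),
]
predecessors = derive_predecessors(runs)
print(predecessors["train.py"])
```

In this sketch, a predecessor whose outputs were never recorded in a run would have to be added to the mapping by hand, which mirrors the manual case mentioned above.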

projects: Project

Linked projects.

references: Reference

Linked references.

runs: Run

Runs of this transform.

space: Space

The space in which the record lives.

space_id
property stem_uid: str

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
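
A runnable version of the pseudocode above might look as follows. The alphabet ordering and the use of `secrets` are assumptions made for this sketch, and `make_uid` is a hypothetical helper; lamindb's own `random_base62` may be implemented differently.

```python
import secrets
import string

# Assumed base62 alphabet: digits, then lowercase, then uppercase letters.
BASE62 = string.digits + string.ascii_letters


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def make_uid(stem_length: int, version_uid: str = "0000") -> str:
    """Hypothetical helper: concatenate a fresh stem uid and a version uid."""
    return f"{random_base62(stem_length)}{version_uid}"


transform_uid = make_uid(12)  # 12-character stem + 4-character version
artifact_uid = make_uid(16)   # 16-character stem + 4-character version
print(transform_uid, artifact_uid)
```
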
successors: Transform

Subsequent transforms.

See predecessors.

ulabels: ULabel

ULabel annotations of this transform.

property versions: QuerySet

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions()
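
Since records of one version family share a stem uid, the grouping can be illustrated without a database. This is a sketch under the assumption that the last 4 characters of a uid carry the version; `same_version_family` and the sample uids are hypothetical.

```python
def same_version_family(uid: str, all_uids: list[str], version_len: int = 4) -> list[str]:
    """Hypothetical helper: return the uids that share the stem of `uid`."""
    stem = uid[:-version_len]
    return [u for u in all_uids if u[:-version_len] == stem]


uids = [
    "aBcD1234eFgH0000",  # version family A, first version
    "aBcD1234eFgH0001",  # version family A, second version
    "zYxW9876vUtS0000",  # unrelated record
]
family = same_version_family("aBcD1234eFgH0001", uids)
print(family)
```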

Methods

async adelete(using=None, keep_parents=False)
async arefresh_from_db(using=None, fields=None, from_queryset=None)
async asave(*args, force_insert=False, force_update=False, using=None, update_fields=None)
clean()

Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.

clean_fields(exclude=None)

Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.

date_error_message(lookup_type, field_name, unique_for)
delete()

Delete.

Return type:

None

get_constraints()
get_deferred_fields()

Return a set containing names of deferred fields for this instance.

prepare_database_save(field)
refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

DBRecord

save_base(raw=False, force_insert=False, force_update=False, using=None, update_fields=None)

Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending.

The ‘raw’ argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading.

serializable_value(field_name)

Return the value of the field name for this instance. If the field is a foreign key, return the id value instead of the object. If there’s no Field object with this name on the model, return the model attribute’s value.

Used to serialize a field’s value (in the serializer, or form output, for example). Normally, you would just access the attribute directly and not use this method.

unique_error_message(model_class, unique_check)
validate_constraints(exclude=None)
validate_unique(exclude=None)

Check unique constraints on the model and raise ValidationError if any failed.

view_lineage(with_successors=False, distance=5)

View lineage of transforms.
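
The following sketch shows one way a bounded lineage traversal could work, assuming that distance caps the number of hops followed from the starting transform. That reading of distance is an assumption, and `lineage_within` and the toy graph are hypothetical; this is not how lamindb renders lineage.

```python
from collections import deque


def lineage_within(start: str, predecessors: dict[str, list[str]], distance: int = 5) -> dict[str, int]:
    """Breadth-first walk over predecessors, stopping after `distance` hops.

    Returns a mapping from each reached transform to its hop count.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == distance:
            continue  # do not expand beyond the hop limit
        for pred in predecessors.get(node, ()):
            if pred not in hops:
                hops[pred] = hops[node] + 1
                queue.append(pred)
    del hops[start]  # report only the ancestors, not the start itself
    return hops


graph = {"train.py": ["qc.ipynb"], "qc.ipynb": ["ingest.py"], "ingest.py": []}
print(lineage_within("train.py", graph, distance=1))
print(lineage_within("train.py", graph))
```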