Pattern: Hybrid Pipelines¶
The problem¶
Real pipelines rarely live in one place.
Some stages are best expressed as Python functions. Some stages are better as config. Some need a little pre-filled configuration. Some need to be swappable.
The common result is a split brain:
- part of the workflow in Python
- part of it in config
- one more wrapper function to connect the two
Dracon lets you keep the pipeline itself in YAML while still using ordinary Python functions for the stages.
The pattern¶
Use !pipe to compose stages. A stage can be:
- a Python callable from context
- an !fntemplate
- an !fn:path partial
- another pipe
```yaml
!define vit_pipeline: !pipe
  - load_data
  - validate: { minimum: 2 }
  - train_vit

report: ${vit_pipeline(source='s3://raw')}
```
With Python functions like:
```python
def load_data(source):
    return {"records": [1, 2, 3, 4], "source": source}

def validate(records, source, minimum=0):
    return {"records": [x for x in records if x >= minimum], "source": source}

def train_vit(records, source):
    return {"model": "vit", "count": len(records), "source": source}
```
the pipeline stays fully declared in YAML, while the heavy lifting still happens in normal Python.
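Dracon does the wiring when the config is loaded, but the semantics are easy to emulate. A minimal sketch in plain Python (the `run_pipe` helper is hypothetical, written only to illustrate the threading behavior, not Dracon's actual machinery):

```python
def load_data(source):
    return {"records": [1, 2, 3, 4], "source": source}

def validate(records, source, minimum=0):
    return {"records": [x for x in records if x >= minimum], "source": source}

def train_vit(records, source):
    return {"model": "vit", "count": len(records), "source": source}

def run_pipe(stages, **initial):
    # Illustration only: each stage's mapping output is unpacked into
    # keyword arguments for the next stage, as !pipe does in YAML.
    out = initial
    for fn, preset in stages:
        out = fn(**out, **preset)
    return out

report = run_pipe(
    [(load_data, {}), (validate, {"minimum": 2}), (train_vit, {})],
    source="s3://raw",
)
# report == {"model": "vit", "count": 3, "source": "s3://raw"}
```

The `(stage, preset)` pairs play the role of the pre-filled mapping entries like `validate: { minimum: 2 }` in the YAML above.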
Why this works well¶
!pipe threads outputs into later stages automatically:
- mapping outputs are unpacked into keyword arguments
- non-mapping outputs go into the next stage's remaining required input
That means the pipeline wiring lives in config instead of in hand-written orchestration code.
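The two rules can be shown with hypothetical stage functions (made up for this example, not from the document's pipeline):

```python
def fetch():
    return [3, 1, 2]  # non-mapping output: a plain list

def sort_values(values, reverse=False):
    return {"values": sorted(values, reverse=reverse)}  # mapping output

def head(values, n=1):
    return values[:n]

# A non-mapping output fills the next stage's remaining required input:
step1 = sort_values(fetch())   # fetch() -> sort_values(values=[3, 1, 2])

# A mapping output is unpacked into keyword arguments:
step2 = head(**step1, n=2)     # {"values": [...]} -> head(values=[...], n=2)
# step2 == [1, 2]
```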
Stage families¶
Once a pipeline is just another callable value, you can keep several of them in the same config:
```yaml
!define pipelines:
  resnet: !pipe
    - load_data
    - validate: { minimum: 2 }
    - train_resnet
  vit: !pipe
    - load_data
    - validate: { minimum: 2 }
    - train_vit

!set_default pipeline_kind: vit

chosen: ${pipelines[pipeline_kind](source='s3://raw')}
```
Now the config is choosing between whole workflow shapes, not just scalar values.
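The Python analogue of that selection is a dict of composed callables. A sketch using functools.partial (the `compose` helper is illustrative, not a Dracon API):

```python
from functools import partial

def load_data(source):
    return {"records": [1, 2, 3, 4], "source": source}

def validate(records, source, minimum=0):
    return {"records": [x for x in records if x >= minimum], "source": source}

def train_resnet(records, source):
    return {"model": "resnet", "count": len(records), "source": source}

def train_vit(records, source):
    return {"model": "vit", "count": len(records), "source": source}

def compose(*stages):
    # Chain stages, unpacking each mapping output into the next stage's kwargs.
    def run(**kwargs):
        out = kwargs
        for stage in stages:
            out = stage(**out)
        return out
    return run

pipelines = {
    "resnet": compose(load_data, partial(validate, minimum=2), train_resnet),
    "vit": compose(load_data, partial(validate, minimum=2), train_vit),
}

pipeline_kind = "vit"  # the value !set_default supplies in the YAML version
chosen = pipelines[pipeline_kind](source="s3://raw")
# chosen == {"model": "vit", "count": 3, "source": "s3://raw"}
```

What the YAML version adds over this is that the selection and the pre-filled parameters live in config, not in code.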
Mixing YAML and Python stages¶
You do not have to choose one style.
A pipeline can combine:
- plain Python functions from the loader context
- !fntemplate stages for lightweight YAML-side transforms
- !fn:path partials for configured Python callables
That is why "hybrid pipeline" is a better description than just "function composition". The point is the mix.
Good use cases¶
- ETL and data validation chains
- preprocessing plus model training
- evaluation workflows
- report-generation pipelines
- small orchestration layers around ordinary Python code
A useful boundary¶
Keep stage logic in Python when it is real code.
Keep pipeline shape in YAML when what varies is:
- ordering
- pre-filled stage parameters
- which backend or stage family to use
That split tends to stay readable.