Pattern: Hybrid Pipelines¶
The problem¶
Real pipelines rarely live in one place.
Some stages are best expressed as Python functions. Some stages are better as config. Some need a little pre-filled configuration. Some need to be swappable.
The common result is a split brain:
- part of the workflow in Python
- part of it in config
- one more wrapper function to connect the two
Dracon lets you keep the pipeline itself in YAML while still using ordinary Python functions for the stages.
The pattern¶
Use !pipe to compose stages. A stage can be:
- a Python callable from context
- an !fntemplate
- an !fn:path partial
- another pipe
```yaml
!define vit_pipeline: !pipe
  - load_data
  - validate: { minimum: 2 }
  - train_vit

report: ${vit_pipeline(source='s3://raw')}
```
With Python functions like:
```python
def load_data(source):
    return {"records": [1, 2, 3, 4], "source": source}

def validate(records, source, minimum=0):
    return {"records": [x for x in records if x >= minimum], "source": source}

def train_vit(records, source):
    return {"model": "vit", "count": len(records), "source": source}
```
the pipeline stays fully declared in YAML, while the heavy lifting still happens in normal Python.
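Dracon does the wiring when the config is loaded, but the semantics are easy to emulate. A minimal sketch in plain Python (the `run_pipe` helper is hypothetical, written only to illustrate the threading behavior, not Dracon's actual machinery):

```python
def load_data(source):
    return {"records": [1, 2, 3, 4], "source": source}

def validate(records, source, minimum=0):
    return {"records": [x for x in records if x >= minimum], "source": source}

def train_vit(records, source):
    return {"model": "vit", "count": len(records), "source": source}

def run_pipe(stages, **initial):
    # Illustration only: each stage's mapping output is unpacked into
    # keyword arguments for the next stage, as !pipe does in YAML.
    out = initial
    for fn, preset in stages:
        out = fn(**out, **preset)
    return out

report = run_pipe(
    [(load_data, {}), (validate, {"minimum": 2}), (train_vit, {})],
    source="s3://raw",
)
# report == {"model": "vit", "count": 3, "source": "s3://raw"}
```

The `(stage, preset)` pairs play the role of the pre-filled mapping entries like `validate: { minimum: 2 }` in the YAML above.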
Why this works well¶
!pipe threads outputs into later stages automatically:
- mapping outputs are unpacked into keyword arguments
- non-mapping outputs go into the next stage's remaining required input
That means the pipeline wiring lives in config instead of in hand-written orchestration code.
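The two rules can be shown with hypothetical stage functions (made up for this example, not from the document's pipeline):

```python
def fetch():
    return [3, 1, 2]  # non-mapping output: a plain list

def sort_values(values, reverse=False):
    return {"values": sorted(values, reverse=reverse)}  # mapping output

def head(values, n=1):
    return values[:n]

# A non-mapping output fills the next stage's remaining required input:
step1 = sort_values(fetch())   # fetch() -> sort_values(values=[3, 1, 2])

# A mapping output is unpacked into keyword arguments:
step2 = head(**step1, n=2)     # {"values": [...]} -> head(values=[...], n=2)
# step2 == [1, 2]
```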
Stage families¶
Once a pipeline is just another callable value, you can keep several of them in the same config:
```yaml
!define pipelines:
  resnet: !pipe
    - load_data
    - validate: { minimum: 2 }
    - train_resnet
  vit: !pipe
    - load_data
    - validate: { minimum: 2 }
    - train_vit

!set_default pipeline_kind: vit

chosen: ${pipelines[pipeline_kind](source='s3://raw')}
```
Now the config is choosing between whole workflow shapes, not just scalar values.
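The Python analogue of that selection is a dict of composed callables. A sketch using functools.partial (the `compose` helper is illustrative, not a Dracon API):

```python
from functools import partial

def load_data(source):
    return {"records": [1, 2, 3, 4], "source": source}

def validate(records, source, minimum=0):
    return {"records": [x for x in records if x >= minimum], "source": source}

def train_resnet(records, source):
    return {"model": "resnet", "count": len(records), "source": source}

def train_vit(records, source):
    return {"model": "vit", "count": len(records), "source": source}

def compose(*stages):
    # Chain stages, unpacking each mapping output into the next stage's kwargs.
    def run(**kwargs):
        out = kwargs
        for stage in stages:
            out = stage(**out)
        return out
    return run

pipelines = {
    "resnet": compose(load_data, partial(validate, minimum=2), train_resnet),
    "vit": compose(load_data, partial(validate, minimum=2), train_vit),
}

pipeline_kind = "vit"  # the value !set_default supplies in the YAML version
chosen = pipelines[pipeline_kind](source="s3://raw")
# chosen == {"model": "vit", "count": 3, "source": "s3://raw"}
```

What the YAML version adds over this is that the selection and the pre-filled parameters live in config, not in code.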
Mixing YAML and Python stages¶
You do not have to choose one style.
A pipeline can combine:
- plain Python functions from the loader context
- !fntemplate stages for lightweight YAML-side transforms
- !fn:path partials for configured Python callables
That is why "hybrid pipeline" is a better description than just "function composition". The point is the mix.
Good use cases¶
- ETL and data validation chains
- preprocessing plus model training
- evaluation workflows
- report-generation pipelines
- small orchestration layers around ordinary Python code
A useful boundary¶
Keep stage logic in Python when it is real code.
Keep pipeline shape in YAML when what varies is:
- ordering
- pre-filled stage parameters
- which backend or stage family to use
That split tends to stay readable.