-
Notifications
You must be signed in to change notification settings - Fork 221
Description
DAG Factory version
1.0.1
airflow version
3.1.3
Python version
3.12
Deployment
Docker-Compose
Deployment details
No response
What happened?
According to the Airflow docs, an Asset can contain ASCII alphanumeric characters, plus %, -, _, ., and ~.
Currently, dag-factory doesn't allow using - in dataset conditions.
Looking at the code, I believe the problem lies in utils.extract_dataset_names, where the regex should be \b[a-zA-Z_][a-zA-Z0-9_-]*\b :
Current implementation:
>>> extract_dataset_names("(asset-1 & asset2)")
['asset', 'asset2']
My regex:
>>> extract_dataset_names("(asset-1 & asset2)")
['asset-1', 'asset2']
I believe that we should also modify the utils.make_valid_variable_name to allow - with the regex r"[^\w-]|^(?=\d)":
Current implementation:
>>> make_valid_variable_name("asset-1")
'asset_1'
My regex:
>>> make_valid_variable_name("asset-1")
'asset-1'
@ErickSeo since you implemented the dataset conditions (and thanks for that!), am I missing something? When not using dataset conditions I am able to set a dataset named "asset-1" and Airflow accepts it.
Relevant log output
How to reproduce
Try to create a Dag with a dataset condition where some datasets include -
Anything else :)?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Contact Details
No response