Skip to content

[Bug] Handle all valid dataset names in dataset conditions #629

@ecodina

Description

@ecodina

DAG Factory version

1.0.1

airflow version

3.1.3

Python version

3.12

Deployment

Docker-Compose

Deployment details

No response

What happened?

According to the Airflow docs, an Asset can contain ASCII alphanumeric characters, plus %, -, _, ., and ~.

Currently, dag-factory doesn't allow using - in dataset conditions.

Looking at the code, I believe the problem lies in utils.extract_dataset_names, where the regex should be \b[a-zA-Z_][a-zA-Z0-9_-]*\b :

Current implementation:

>>> extract_dataset_names("(asset-1 & asset2)")
['asset', 'asset2']

My regex:

>>> extract_dataset_names("(asset-1 & asset2)")
['asset-1', 'asset2']

I believe that we should also modify the utils.make_valid_variable_name to allow - with the regex r"[^\w-]|^(?=\d)":

Current implementation:

>>> make_valid_variable_name("asset-1")
'asset_1'

My regex:

>>> make_valid_variable_name("asset-1")
'asset-1'

@ErickSeo since you implemented the dataset conditions (and thanks for that!), am I missing something? When not using dataset conditions I am able to set a dataset named "asset-1" and Airflow accepts it.

Relevant log output

How to reproduce

Try to create a Dag with a dataset condition where some datasets include -

Anything else :)?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Contact Details

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions