Skip to content

Conversation

@quentinblampey
Copy link
Contributor

Closes #134

I updated the dtype behavior for dask to fix #134.

I also added support for DiskArray in mean_var - I think we just needed to always np.power instead of the ** notation. Except if you had a specific reason to use ** @flying-sheep?
I think it's very inefficient though, since it will move the result of the power operation directly in memory (at least, this is what I understand, but it may be wrong). We would like to have it in memory only after the mean reduction, but maybe there is no other way to do that - I'm not familiar enough with h5.Datasets.

I wanted to add some tests but I don't understand all the details of the tests, is there any instructions or CONTRIBUTING.md file I could use to run and update the tests?

@codecov
Copy link

codecov bot commented Oct 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.15%. Comparing base (729d35f) to head (c7325e3).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   96.76%   99.15%   +2.38%     
==========================================
  Files          19       19              
  Lines         464      473       +9     
==========================================
+ Hits          449      469      +20     
+ Misses         15        4      -11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link

codspeed-hq bot commented Oct 29, 2025

CodSpeed Performance Report

Merging #135 will not alter performance

Comparing quentinblampey:mean_var_h5 (c7325e3) with main (729d35f)

Summary

✅ 232 untouched

@flying-sheep flying-sheep changed the title Support DiskArray on mean_var and support Dask in min/max Support Dask in min/max Oct 30, 2025
@flying-sheep
Copy link
Member

flying-sheep commented Oct 30, 2025

OK, as the failing benchmarks show, the ** had a reason.

Also you’re right: just creating a copy of the full-size Dataset isn’t a good idea, we’d have to do something smarter.

Therefore let’s just keep this PR single-purpose for min-max Dask support, and do the disk stuff in a separate PR. I reverted the disk stuff in here and fixed mypy

image

@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Oct 30, 2025
@flying-sheep
Copy link
Member

flying-sheep commented Oct 30, 2025

OK, looks like there are more issues with min/max, I’m sorry that I didn’t catch that before making the release.

@quentinblampey
Copy link
Contributor Author

quentinblampey commented Oct 30, 2025

OK, as the failing benchmarks show, the ** had a reason.
Also you’re right: just creating a copy of the full-size Dataset isn’t a good idea, we’d have to do something smarter.

Yes, it seems so...
No problem, we can handle that in a future PR! I'll think more about it later

@quentinblampey
Copy link
Contributor Author

Hi @flying-sheep, is there something I can do to help fix the CI? Do you know what's going wrong?

@flying-sheep
Copy link
Member

flying-sheep commented Nov 4, 2025

We didn’t test min/max, and merged it in a broken state. It doesn’t seem to support sparse data.
Looks pretty clear from the CI logs:

  • min(some_sparse_array_or_matrix, axis=0|1) returns a coo_array, but the tests expect a dense array.
  • min(some_cupy_sparse_matrix, axis=0|1) and min(..., axis=None, keep_cupy_as_array=False) tries to call .squeeze() on a cupy matrix which doesn’t exist
  • … more stuff, with sparse-in-dask and so on

I’m working on a scanpy release, so if you want to leave this to me, you’ll need to wait a little 😉

@quentinblampey
Copy link
Contributor Author

quentinblampey commented Nov 4, 2025

Okay no problem, I may try to fix it

Good luck with the Scanpy release!

@flying-sheep flying-sheep changed the title Support Dask in min/max fix: support Dask and (cupy/scipy) sparse matrices in min/max Nov 18, 2025
@flying-sheep flying-sheep changed the title fix: support Dask and (cupy/scipy) sparse matrices in min/max fix: support Dask and cupy/scipy sparse matrices in min/max Nov 18, 2025
@flying-sheep flying-sheep merged commit da8e85a into scverse:main Nov 18, 2025
21 checks passed
@flying-sheep
Copy link
Member

Here we go!

@quentinblampey
Copy link
Contributor Author

Very nice, thanks @flying-sheep!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-gpu-ci Apply this label to run GPU CI once

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dask support for min/max

2 participants