Chunkwise image loader #279
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #279       +/-   ##
===========================================
+ Coverage   39.16%   50.41%   +11.25%
===========================================
  Files          26       27        +1
  Lines        2663     2759       +96
===========================================
+ Hits         1043     1391      +348
+ Misses       1620     1368      -252
Thanks for your contribution! I have 2 minor suggestions. I also saw that you use the width-by-height convention. Personally, I don't have a strong opinion here, though we could also stick to the array API conventions. @LucaMarconato WDYT? Pre-approving for now.
Sorry, I had to change this after rethinking memmap. This does not always work, for example when dealing with compressed TIFFs, as far as I am aware.
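For context, one possible way to handle that case is to detect compression and fall back to tifffile's zarr interface instead of memmap. This is just a sketch, not the code in this PR, and `lazy_tiff` is a hypothetical helper name:

```python
import dask.array as da
import tifffile
import zarr


def lazy_tiff(path: str) -> da.Array:
    """Return a lazy dask array, memory-mapping only when the TIFF is uncompressed."""
    with tifffile.TiffFile(path) as tif:
        uncompressed = tif.pages[0].compression == 1  # 1 == COMPRESSION.NONE
    if uncompressed:
        # uncompressed, contiguous data can usually be memory-mapped directly
        return da.from_array(tifffile.memmap(path), chunks="auto")
    # compressed TIFFs: expose the file as a zarr store and let dask read tiles on demand
    # (single-scale assumed; pyramidal files would yield a zarr group instead of an array)
    store = tifffile.imread(path, aszarr=True)
    return da.from_zarr(zarr.open(store, mode="r"))
```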
Co-authored-by: Wouter-Michiel Vierdag <w-mv@hotmail.com>
Description
This PR addresses the challenge that the currently implemented and planned image loaders require loading imaging data entirely into memory, typically as NumPy arrays. Given the large size of microscopy datasets, this is not always feasible.
To mitigate this issue, and as discussed with @LucaMarconato, this PR aims to introduce a generalizable approach for reading large microscopy files in chunks, enabling efficient handling of data that does not fit into memory.
Some related discussions.
Strategy
In this PR, we focus on `.tiff` images, as implemented in the `_tiff_to_chunks` function. The approach is:

- memory-map the image file (`tifffile.memmap`)
- compute the chunk coordinates (`_compute_chunks`)
- read each chunk into a `dask.array`, which is memory-mapped and avoids memory overflow (`_read_chunks`)
- assemble the chunks into a single `dask.array` (via `dask.array.block`)

The strategy is implemented in `src/spatialdata_io/readers/generic.py` and `src/spatialdata_io/readers/_utils/_image.py`. A simplified sketch of the approach is shown below.
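For illustration, here is a minimal sketch of this chunking strategy, assuming a single uncompressed 2D plane. The helper names mirror those in this PR, but the bodies below are simplified assumptions rather than the actual implementation:

```python
from __future__ import annotations

import dask.array as da
import numpy as np
import tifffile


def _compute_chunks(shape: tuple[int, int], chunk_size: tuple[int, int]) -> list[list[tuple[int, int, int, int]]]:
    """Tile the (y, x) plane into (y0, x0, height, width) tiles, row by row."""
    tiles = []
    for y0 in range(0, shape[0], chunk_size[0]):
        row = []
        for x0 in range(0, shape[1], chunk_size[1]):
            h = min(chunk_size[0], shape[0] - y0)
            w = min(chunk_size[1], shape[1] - x0)
            row.append((y0, x0, h, w))
        tiles.append(row)
    return tiles


def _read_chunk(mmap: np.memmap, y0: int, x0: int, h: int, w: int) -> da.Array:
    """Wrap one memory-mapped tile as a lazy dask array; nothing is read eagerly."""
    return da.from_array(mmap[y0 : y0 + h, x0 : x0 + w], chunks=-1)


def tiff_to_dask(path: str, chunk_size: tuple[int, int] = (4096, 4096)) -> da.Array:
    """Assemble a lazy dask array backed by a memory-mapped, uncompressed TIFF."""
    mmap = tifffile.memmap(path)  # raises if the file cannot be memory-mapped
    if mmap.ndim != 2:
        raise NotImplementedError("this sketch assumes a single 2D plane")
    blocks = [
        [_read_chunk(mmap, *tile) for tile in row]
        for row in _compute_chunks(mmap.shape, chunk_size)
    ]
    return da.block(blocks)  # rows concatenate along y, columns along x
```

A real implementation would additionally handle channel/z axes and fall back to a non-memmap path for compressed files, as discussed above.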
Future extensions
The strategy can be applied to any image type, as long as it is possible to read arbitrary chunks of the image without loading the full file into memory.
We have implemented similar readers for openslide-compatible whole slide images and the Carl-Zeiss microscopy format.
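To sketch how the same pattern might generalize beyond TIFF, here is a hedged example of chunkwise reading for openslide-compatible slides using delayed tile reads. `wsi_to_dask` and `_read_region` are illustrative names, not the readers referred to above:

```python
import dask
import dask.array as da
import numpy as np
import openslide


def _read_region(path: str, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Read one tile of the full-resolution level; only this region is loaded into memory."""
    slide = openslide.OpenSlide(path)
    try:
        tile = slide.read_region((x, y), 0, (w, h))  # PIL image, RGBA
    finally:
        slide.close()
    # drop the alpha channel and move to channel-first (c, y, x)
    return np.asarray(tile)[..., :3].transpose(2, 0, 1)


def wsi_to_dask(path: str, tile: int = 4096) -> da.Array:
    """Assemble a lazy (c, y, x) dask array from lazily read whole-slide tiles."""
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions  # openslide reports (width, height)
    slide.close()
    rows = []
    for y in range(0, height, tile):
        row = []
        for x in range(0, width, tile):
            w, h = min(tile, width - x), min(tile, height - y)
            lazy = dask.delayed(_read_region)(path, x, y, w, h)
            row.append(da.from_delayed(lazy, shape=(3, h, w), dtype=np.uint8))
        rows.append(row)
    # with channel-first tiles, the nested list maps rows -> y and columns -> x
    return da.block(rows)
```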