Memory Robustness code refactoring to increase capacity and reduce memory footprint #5449
base: master
Conversation
To trigger regression tests:
Is there a way to verify the following statement in the unit test?
    offset to be used when assigning edge global ids in the current partition
return_orig_ids : bool, optional
    Indicates whether to return original node/edge IDs.
schema : json dictionary
Can you follow the DGL convention, e.g.
dgl/python/dgl/data/graph_serialize.py
Line 100 in f5ddb11
labels: dict[str, Tensor]
Same for others.
and other types, such as numpy array -> numpy.array
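A minimal sketch of what the requested convention looks like in a numpydoc-style docstring. The function name, parameters, and types below are hypothetical illustrations, not code from the PR:

```python
def load_partition_data(schema, part_id):
    """Hypothetical function illustrating the type convention only.

    Parameters
    ----------
    schema : dict[str, object]
        Dictionary created by reading the metadata.json file for the
        current dataset.
    part_id : int
        Id of the partition to load.

    Returns
    -------
    dict[str, numpy.ndarray]
        Node data for the requested partition.
    """
    raise NotImplementedError  # illustration only
```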
schema : json dictionary
    dictionary object created by reading the metadata.json file for the
    current dataset
part_id : integer
int
return_orig_ids : bool, optional
    Indicates whether to return original node/edge IDs.
schema : json dictionary
    dictionary object created by reading the metadata.json file for the
Capitalize the first char.
Same for others.
numpy array :
    shuffle_global_nids, assigned after the data-shuffling phase, for the
    nodes in the current partition
tuple :
tuple of what? follow the convention: tuple[blablabla]
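For example, a parameterized tuple type in a Returns section might read as follows; the helper is hypothetical and only illustrates the `tuple[...]` convention:

```python
def split_ids(ids, pivot):
    """Hypothetical helper illustrating the tuple[...] convention.

    Returns
    -------
    tuple[list[int], list[int]]
        Ids below the pivot, and ids at or above the pivot.
    """
    below = [i for i in ids if i < pivot]
    at_or_above = [i for i in ids if i >= pivot]
    return below, at_or_above
```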
gc.collect()
logging.info(
    f"There are {len(shuffle_global_src_id)} edges in partition {part_id}"
    f"[Rank: {part_id}] There are {len(shuffle_global_src_id)} edges in partition {part_id}"
too long, over 80 chars.
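One way to keep such a message under the limit is implicit concatenation of adjacent f-string literals. The variable values below are placeholders standing in for the real ones in the PR:

```python
import logging

# Placeholder values standing in for the real variables in the PR.
part_id = 3
shuffle_global_src_id = [10, 11, 12]

# Adjacent string literals inside the parentheses are concatenated at
# compile time, so each source line stays under 80 characters.
message = (
    f"[Rank: {part_id}] There are {len(shuffle_global_src_id)} "
    f"edges in partition {part_id}"
)
logging.info(message)
```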
shuffle_global_dst_id = None
global_src_id = None
global_dst_id = None
# global_edge_id = None
ditto.
return_orig_eids=False,
):
"""
unnecessary blank line.
Returns:
--------
dgl object
which object?
Please fix the types above and below according to the comment.
dictionary
    map between edge type(string) and edge_type_id(int)
dict of tensors
    If `return_orig_nids=True`, return a dict of 1D tensors whose key is the node type
over 80 chars.
dstids = dstids - nids[0] + offset

# return the values
return uniques, idxes, srcids, dstids
[Optional] Since you changed almost 80% of the file, I'd suggest you also update the other pieces to follow the convention.
)
memory_snapshot("ShuffleGlobalID_Lookup_Complete: ", rank)

def prepare_local_data(src_data, local_part_id):
Add a unit test for this? Make sure the src_data entries are properly popped?
REV_DATA_TYPE_ID,
)

DATA_TYPE_ID = {
What's the difference between this DATA_TYPE_ID and utils.DATA_TYPE_ID? Why do you need to create a new one?
... as part of improving the unit test coverage of the data preprocessing pipeline.
Description
Benchmarking the felarge/webgraph/searchctr graphs shows that the memory bottleneck is DGL graph creation in the convert_partition.py module. Currently the felarge graph can be partitioned into no fewer than 12 partitions, and the webgraph graph into no fewer than 32 partitions.
To overcome this issue and increase the size of the graph partition per node, this PR makes the following code changes:
No additional unit test cases are added in this PR because the existing end-to-end test cases in the unit test framework are sufficient to cover this functional refactoring of the code.
Checklist
Changes