@dsnet dsnet commented Jan 3, 2025

When operating under the Deterministic option,
sort names according to UTF-8 instead of UTF-16.

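The practical difference is negligible
(the two orders disagree only when the first differing code points
are one above U+FFFF and one in the range U+E000 through U+FFFF),
but the JSONv2 discussion revealed that many
objected to sorting according to UTF-16,
which is what RFC 8785 specifies.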
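To make the difference concrete, here is a minimal sketch comparing the two orderings on a pair of names where they disagree. It assumes Go 1.21+ for the `slices` package; `compareUTF16` is a hypothetical helper written for illustration and is not part of the json package. Go's built-in string comparison is already bytewise, which is UTF-8 order.

```go
package main

import (
	"fmt"
	"slices"
	"unicode/utf16"
)

// compareUTF16 (hypothetical helper) orders strings by their UTF-16 code
// units, the ordering RFC 8785 prescribes for object member names.
func compareUTF16(a, b string) int {
	return slices.Compare(utf16.Encode([]rune(a)), utf16.Encode([]rune(b)))
}

func main() {
	privateUse := "\uE000" // U+E000: a BMP code point above the surrogate range
	emoji := "\U0001F600"  // U+1F600: a surrogate pair (D83D DE00) in UTF-16

	// UTF-8 order: EE 80 80 < F0 9F 98 80, so U+E000 sorts first.
	fmt.Println(privateUse < emoji) // true

	// UTF-16 order: 0xE000 > 0xD83D, so U+1F600 sorts first.
	fmt.Println(compareUTF16(privateUse, emoji) < 0) // false
}
```

For names made only of code points below U+E000, or only of BMP code points, both functions agree, which is why the change has little practical impact.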

Given that Deterministic does not say we comply with RFC 8785, we can choose any arbitrary ordering.

@dsnet dsnet requested review from johanbrandhorst and mvdan January 3, 2025 20:36
@dsnet dsnet changed the title Sort by UTF-8 instead of UTF-16 Deterministically sort by UTF-8 instead of UTF-16 Jan 3, 2025
@mvdan mvdan left a comment

We can always provide an alternative deterministic option/API in the future that's strictly aligned with RFC 8785 if we wish to.

@dsnet dsnet merged commit 9eeb134 into master Jan 3, 2025
8 checks passed
@dsnet dsnet deleted the deterministic-utf8 branch January 3, 2025 23:13
gopherbot pushed a commit to golang/go that referenced this pull request Oct 24, 2025
This is semantically identical and just a cleanup.

Prior to #63397, JSON object names were sorted according to UTF-16
to match the semantics of RFC 8785, but there were a number of
objections in the discussion to using that as the sorting order.

In go-json-experiment/json#121,
we switched to sorting by UTF-8, which matches the behavior
of v1 and avoids an option to toggle the behavior.
However, we should have deleted the stringSlice.Sort method
and just directly called slices.Sort.

From a principled perspective, both UTF-16 and UTF-8 are
reasonable ways to sort JSON object names.
RFC 8259 specifies that the entire JSON text is encoded as UTF-8.
However, JSON string escapes (\uXXXX) are defined in terms of
UTF-16 code units, so an escaped code point above U+FFFF must be
written as a pair of UTF-16 surrogate halves
(a quirk of JavaScript inherited by JSON).
Thus, JSON is inconsistently both UTF-8 and UTF-16.
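A small sketch of the inconsistency: the same code point appears as four UTF-8 bytes when written raw in the JSON text, but as a UTF-16 surrogate pair when escaped. `escapeRune` is a hypothetical, illustrative helper, not the encoder's actual implementation; it uses the standard `unicode/utf16.EncodeRune` to split the code point.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// escapeRune (hypothetical helper) renders r as a JSON \u escape.
// BMP code points take one \uXXXX; code points above U+FFFF are
// split into a UTF-16 surrogate pair.
func escapeRune(r rune) string {
	if r <= 0xFFFF {
		return fmt.Sprintf(`\u%04X`, r)
	}
	hi, lo := utf16.EncodeRune(r)
	return fmt.Sprintf(`\u%04X\u%04X`, hi, lo)
}

func main() {
	r := '\U0001F600'
	fmt.Printf("% X\n", []byte(string(r))) // F0 9F 98 80 (raw UTF-8)
	fmt.Println(escapeRune(r))              // \uD83D\uDE00 (escaped form)
}
```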

Change-Id: Id92b5cc20efe4201827e9d3fccf24ccf894d3e60
Reviewed-on: https://go-review.googlesource.com/c/go/+/713522
Reviewed-by: Johan Brandhorst-Satzkorn <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
TryBot-Bypass: Damien Neil <[email protected]>
Auto-Submit: Damien Neil <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
Reviewed-by: Damien Neil <[email protected]>