Skip to content

Equality via pattern-matching compiles to slow code #3914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jvanburen opened this issue Apr 22, 2025 · 3 comments
Open

Equality via pattern-matching compiles to slow code #3914

jvanburen opened this issue Apr 22, 2025 · 3 comments
Labels
flambda2 Prerequisite for, or part of, flambda2

Comments

@jvanburen
Copy link
Contributor

jvanburen commented Apr 22, 2025

example:

  type t =
    | Naked_immediate
    | Naked_float32
    | Naked_float
    | Naked_int8
    | Naked_int16
    | Naked_int32
    | Naked_int64
    | Naked_nativeint
    | Naked_vec128

  let equal t1 t2 =
    match t1, t2 with
    | Naked_immediate, Naked_immediate -> true
    | Naked_float32, Naked_float32 -> true
    | Naked_float, Naked_float -> true
    | Naked_int8, Naked_int8 -> true
    | Naked_int16, Naked_int16 -> true
    | Naked_int32, Naked_int32 -> true
    | Naked_int64, Naked_int64 -> true
    | Naked_nativeint, Naked_nativeint -> true
    | Naked_vec128, Naked_vec128 -> true
    | ( ( Naked_immediate | Naked_float32 | Naked_float | Naked_int8
        | Naked_int16 | Naked_int32 | Naked_int64 | Naked_nativeint
        | Naked_vec128 ),
        _ ) ->
      false

leads to

camlExample.equal_1:
        .quad   caml_curry2
        .quad   0x280000000000007
        .quad   camlExample.equal_0_1_code
        .quad   0xffc
        .ascii  "unreachable switch case"
        .byte   0
camlExample.equal_0_1_code:
        sar     rax
        lea     rdx, [rip + .L185]
        movsxd  rax, dword ptr [rdx + 4*rax]
        add     rdx, rax
        jmp     rdx
.L117:
        cmp     rbx, 1
        je      .L121
        mov     eax, 1
        ret
.L121:
        mov     eax, 3
        ret
.L124:
        cmp     rbx, 3
        je      .L128
        mov     eax, 1
        ret
.L128:
        mov     eax, 3
        ret
.L131:
        cmp     rbx, 5
        je      .L135
        mov     eax, 1
        ret
.L135:
        mov     eax, 3
        ret
.L138:
        cmp     rbx, 7
        je      .L142
        mov     eax, 1
        ret
.L142:
        mov     eax, 3
        ret
.L145:
        cmp     rbx, 9
        je      .L149
        mov     eax, 1
        ret
.L149:
        mov     eax, 3
        ret
.L152:
        cmp     rbx, 11
        je      .L156
        mov     eax, 1
        ret
.L156:
        mov     eax, 3
        ret
.L159:
        cmp     rbx, 13
        je      .L163
        mov     eax, 1
        ret
.L163:
        mov     eax, 3
        ret
.L166:
        cmp     rbx, 15
        je      .L170
        mov     eax, 1
        ret
.L170:
        mov     eax, 3
        ret
.L173:
        cmp     rbx, 17
        jl      .L177
        mov     eax, 3
        ret
.L177:
        mov     eax, 1
        ret
@jvanburen jvanburen added the flambda2 Prerequisite for, or part of, flambda2 label Apr 22, 2025
@bclement-ocp
Copy link
Contributor

bclement-ocp commented Apr 30, 2025

Related: #2271

I believe #3219 and the follow-up work for the match-in-match heuristics (should materialize as a PR sometime in May) will provide tools to tackle this robustly, FWIW (if we want to do it at the flambda2 level).

@Gbury
Copy link
Contributor

Gbury commented Apr 30, 2025

I agree that we should try and optimize such patterns for equality function.

That being said, and while waiting for the necessary optimizations to be implemented, here is a pattern that works just as well (if not better, since it also gives you a comparison function), and that happens to be correctly optimized:

type t =
  | A
  | B
  | C
  | D
  | E
  | F

(* the badly-optimized code *)
let equal_unoptimized t t' =
  match t, t' with
  | A, A
  | B, B
  | C, C
  | D, D
  | E, E
  | F, F -> true
  | _ -> false

(* let's define an auxiliary function that returns a "discriminant" that
    distinguishes the various constructors. *)
let discr = function
  | A -> 0
  | B -> 1
  | C -> 2
  | D -> 3
  | E -> 4
  | F -> 5

let compare t t' =
  (discr t) - (discr t')

let equal_optimized t t' = compare t t' = 0

(* an even better one, though we should try and optimize things so that it's not that different from the one before *)
let equal_best t t' = (discr t) = (discr t')

That being said, this pattern is currently quite fragile and only works really well if the discr function actually is the identity (which flambda2 is capable of recognizing). In theory it should work almost as well as long as the discr function is affine (rather than the identity), but there are a few caveats in the current code which prevent full optimization, notably:

  • flambda2 optimizes some switch into array reads
  • this prevents the cmm optimization from detecting discr as an affine function
  • the cmm optimization for affine function does not correctly handle jumps (i.e. Cexit) to the same handler, and therefore does not trigger on the inlined version of discr in compare.

@bclement-ocp I wonder (or rather do not recall) if your upcoming work on the typing env and match-in-match related heuristics would actually see that the affine relation in these switches ?

@bclement-ocp
Copy link
Contributor

@bclement-ocp I wonder (or rather do not recall) if your upcoming work on the typing env and match-in-match related heuristics would actually see that the affine relation in these switches ?

In the current plan we'll see an array, but it shouldn't be too hard to convert to an affine relation (in fact we could do this in simplify_switch already if we wanted). As I mentioned with this work we should also be very close to being able to handle the original match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flambda2 Prerequisite for, or part of, flambda2
Projects
None yet
Development

No branches or pull requests

3 participants