[Feature]: Allow usage of continue_final_message in /embeddings endpoint #31440

@kevin-pw

🚀 The feature, motivation and pitch

Problem Statement

When generating embeddings with chat-based models, the /embeddings endpoint does not currently support a prefilled assistant message. This limits the usefulness of the endpoint, because we have found that prefilling the assistant message improves performance on select retrieval tasks.

Feature Description

Mirroring the parameter of the same name accepted by the /chat/completions endpoint, the requested feature would add a continue_final_message parameter to the /embeddings endpoint. With it, the messages object could end with a partially filled assistant message, and the prompt would render to an open assistant turn with no end-of-turn token appended, for example: <|im_start|>assistant\nBased on the evidence provided, I conclude that
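To make the intended rendering concrete, here is a minimal sketch of how a ChatML-style template could treat the final assistant message. The `render_chatml` helper is hypothetical and written only for illustration; it is not vLLM's or any tokenizer's actual implementation, and the example user message is made up.

```python
def render_chatml(messages, continue_final_message=False):
    """Render messages in a ChatML-style format (illustrative helper only)."""
    if continue_final_message and (
        not messages or messages[-1]["role"] != "assistant"
    ):
        raise ValueError(
            "continue_final_message requires a final assistant message"
        )

    parts = []
    last_index = len(messages) - 1
    for i, msg in enumerate(messages):
        turn = f"<|im_start|>{msg['role']}\n{msg['content']}"
        # Leave the final assistant turn open so the prefill is the last
        # text the model sees; every other turn is closed as usual.
        if not (continue_final_message and i == last_index):
            turn += "<|im_end|>\n"
        parts.append(turn)
    return "".join(parts)


messages = [
    {"role": "user", "content": "Does the report support the claim?"},
    {
        "role": "assistant",
        "content": "Based on the evidence provided, I conclude that",
    },
]

open_prompt = render_chatml(messages, continue_final_message=True)
closed_prompt = render_chatml(messages, continue_final_message=False)
print(open_prompt)
```

With `continue_final_message=True` the prompt ends on the prefilled text, so the pooled embedding is computed over the unfinished assistant turn rather than over a closed one.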

Feature Benefits

  • Better control over embedding behaviour, resulting in improved retrieval task performance.
  • More consistent API design that matches the /chat/completions endpoint.

Alternatives

We currently hardcode the prefilled assistant message into custom Jinja chat templates, but this workaround requires a separate custom chat template for each model. Enabling the continue_final_message parameter would remove the need to maintain those custom chat templates.
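With the parameter in place, the prefill would travel in the request body instead of in a per-model template. The sketch below builds such a request body; it assumes the /embeddings endpoint would accept `continue_final_message` exactly as requested here (it does not today), and the model name and the `build_embeddings_request` helper are placeholders. Setting `add_generation_prompt` to false alongside it is also an assumption, based on the two options not being combined in /chat/completions.

```python
import json


def build_embeddings_request(model, messages, continue_final_message=True):
    """Build a JSON-serializable body for a chat-style /embeddings request.

    Hypothetical: continue_final_message is the parameter requested in this
    issue and is not currently accepted by the /embeddings endpoint.
    """
    if continue_final_message and (
        not messages or messages[-1]["role"] != "assistant"
    ):
        raise ValueError(
            "continue_final_message requires a final assistant message"
        )
    return {
        "model": model,
        "messages": messages,
        "continue_final_message": continue_final_message,
        # Assumption: as in /chat/completions, a generation prompt is not
        # added when the final assistant message is being continued.
        "add_generation_prompt": False,
    }


payload = build_embeddings_request(
    model="my-embedding-model",  # placeholder model name
    messages=[
        {"role": "user", "content": "Does the report support the claim?"},
        {
            "role": "assistant",
            "content": "Based on the evidence provided, I conclude that",
        },
    ],
)
print(json.dumps(payload, indent=2))
```

The same messages list could then be sent to any chat model without editing its chat template, which is the maintenance cost the workaround above incurs.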

Additional context

This feature request was inspired by feature request #23923, which has already led to support for the add_generation_prompt parameter in the /embeddings endpoint. The request to support chat_template_kwargs is still open.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
