Description
🚀 The feature, motivation and pitch
Problem Statement
When generating embeddings with chat-based models, the /embeddings endpoint does not currently support prefilling the final assistant message. This limits the usefulness of the /embeddings endpoint, since we have found that prefilling the assistant message improves performance on select retrieval tasks.
Feature Description
Analogous to the parameter accepted by the /chat/completions endpoint, the requested feature would add a continue_final_message parameter to the parameters accepted by the /embeddings endpoint. The messages object could then end with a partially filled assistant message, which would render to, for example: <|im_start|>assistant\nBased on the evidence provided, I conclude that .
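To make the proposal concrete, here is a sketch of what a request to the /embeddings endpoint might look like with the proposed parameter. The model name and message contents are placeholders, and continue_final_message is not yet accepted by this endpoint; the intended semantics mirror the existing /chat/completions option.

```python
# Hypothetical /embeddings request body using the proposed
# continue_final_message parameter (illustrative only).
payload = {
    "model": "my-embedding-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarize the evidence."},
        # Partially filled assistant turn used to steer the embedding.
        {
            "role": "assistant",
            "content": "Based on the evidence provided, I conclude that ",
        },
    ],
    # Proposed flag: render the final assistant message without closing
    # it, mirroring the /chat/completions semantics.
    "continue_final_message": True,
    "add_generation_prompt": False,
}

# With a ChatML-style template, this would render roughly as:
#   <|im_start|>user\nSummarize the evidence.<|im_end|>
#   <|im_start|>assistant\nBased on the evidence provided, I conclude that
```

The key point is the final messages entry: its content is left open-ended, and continue_final_message tells the template not to append an end-of-turn token after it.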
Feature Benefits
- Better control over embedding behaviour resulting in improved retrieval task performance.
- More consistent API design that matches the /chat/completions endpoint.
Alternatives
We currently hardcode the prefilled assistant message into custom Jinja chat templates, but this workaround requires maintaining a separate custom chat template for each model. Supporting the continue_final_message parameter would eliminate the complexity of maintaining those custom chat templates.
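For reference, the workaround above amounts to a chat template whose assistant prefix is baked in. The sketch below uses plain Python to show what such a hardcoded template renders to; the special tokens are ChatML-style and the prefix text is an example, so the exact strings vary per model.

```python
# In the workaround, a custom Jinja chat template ends with a hardcoded,
# unclosed assistant turn, roughly:
#
#   {% for m in messages %}<|im_start|>{{ m.role }}
#   {{ m.content }}<|im_end|>
#   {% endfor %}<|im_start|>assistant
#   Based on the evidence provided, I conclude that
#
# Equivalent rendering logic in plain Python for illustration:
def render_with_hardcoded_prefix(messages, prefix):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # The assistant turn is left open (no <|im_end|>), so the model
    # embeds the conversation as a continuation of this message.
    parts.append(f"<|im_start|>assistant\n{prefix}")
    return "".join(parts)

prompt = render_with_hardcoded_prefix(
    [{"role": "user", "content": "Summarize the evidence."}],
    "Based on the evidence provided, I conclude that ",
)
```

Because the prefix is fixed inside the template, changing it (or supporting a second model with different special tokens) means editing and redeploying a template file, which is exactly the maintenance burden continue_final_message would remove.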
Additional context
This feature request was inspired by feature request #23923, which has already led to the add_generation_prompt parameter being implemented for the /embeddings endpoint. The request to support chat_template_kwargs is still open.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.