Description
🚀 The feature, motivation and pitch
Problem Statement
When generating embeddings with chat-based models, the /embeddings endpoint does not currently support prefilling the final assistant message. This limits the usefulness of the /embeddings endpoint, since we have found that prefilling the assistant message improves performance on select retrieval tasks.
Feature Description
Analogous to the parameter accepted by the /chat/completions endpoint, the requested feature would add a continue_final_message parameter to the parameters accepted by the /embeddings endpoint. The messages object could then end with a partially filled assistant message, which would render to, for example: <|im_start|>assistant\nBased on the evidence provided, I conclude that .
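To make the proposal concrete, here is a sketch of what a request to the /embeddings endpoint might look like with the proposed parameter. The model name and message contents are placeholders, and continue_final_message is not yet accepted by this endpoint; the intended semantics mirror the existing /chat/completions option.

```python
# Hypothetical /embeddings request body using the proposed
# continue_final_message parameter (illustrative only).
payload = {
    "model": "my-embedding-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarize the evidence."},
        # Partially filled assistant turn used to steer the embedding.
        {
            "role": "assistant",
            "content": "Based on the evidence provided, I conclude that ",
        },
    ],
    # Proposed flag: render the final assistant message without closing
    # it, mirroring the /chat/completions semantics.
    "continue_final_message": True,
    "add_generation_prompt": False,
}

# With a ChatML-style template, this would render roughly as:
#   <|im_start|>user\nSummarize the evidence.<|im_end|>
#   <|im_start|>assistant\nBased on the evidence provided, I conclude that
```

The key point is the final messages entry: its content is left open-ended, and continue_final_message tells the template not to append an end-of-turn token after it.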
Feature Benefits
- Better control over embedding behaviour resulting in improved retrieval task performance.
- More consistent API design that matches the /chat/completions endpoint.
Alternatives
We currently hardcode the prefilled assistant message into custom Jinja chat templates, but this workaround requires maintaining a separate custom chat template for each model. Supporting the continue_final_message parameter would eliminate the complexity of maintaining those custom chat templates.
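For reference, the workaround above amounts to a chat template whose assistant prefix is baked in. The sketch below uses plain Python to show what such a hardcoded template renders to; the special tokens are ChatML-style and the prefix text is an example, so the exact strings vary per model.

```python
# In the workaround, a custom Jinja chat template ends with a hardcoded,
# unclosed assistant turn, roughly:
#
#   {% for m in messages %}<|im_start|>{{ m.role }}
#   {{ m.content }}<|im_end|>
#   {% endfor %}<|im_start|>assistant
#   Based on the evidence provided, I conclude that
#
# Equivalent rendering logic in plain Python for illustration:
def render_with_hardcoded_prefix(messages, prefix):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # The assistant turn is left open (no <|im_end|>), so the model
    # embeds the conversation as a continuation of this message.
    parts.append(f"<|im_start|>assistant\n{prefix}")
    return "".join(parts)

prompt = render_with_hardcoded_prefix(
    [{"role": "user", "content": "Summarize the evidence."}],
    "Based on the evidence provided, I conclude that ",
)
```

Because the prefix is fixed inside the template, changing it (or supporting a second model with different special tokens) means editing and redeploying a template file, which is exactly the maintenance burden continue_final_message would remove.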
Additional context
This feature request was inspired by feature request #23923, which has already led to the add_generation_prompt parameter being implemented for the /embeddings endpoint. The request to support chat_template_kwargs is still open.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.