mteb
vespa.evaluation._mteb
VespaMTEBApp(previous_results=None, port=8080, **kwargs)
Bases: SearchProtocol
Vespa search using pyvespa with support for Docker and Vespa Cloud deployments.
cleanup()
Stop and remove the Vespa deployment (Docker container or Cloud instance).
ensure_clean_state()
Ensure a clean state before starting a new task.
For Docker: destroys the container and creates a new one. For Cloud: deletes all documents but keeps the deployment running (faster).
is_already_fed(task_name)
Check whether the index has already been fed for the given model config(s) and task.
VespaMTEBEvaluator(model_configs, task_name=None, benchmark_name=None, results_dir='results', overwrite=False, deployment_target='cloud', port=8080, tenant=None, application=None, instance='default', key_content=None, key_location=None, auto_cleanup=True)
Evaluator class for running MTEB benchmarks with Vespa.
This class handles the orchestration of MTEB evaluation tasks using Vespa as the search backend. It supports single tasks or full benchmarks, with incremental result saving and optional overwrite control.
Supports both Docker (local) and Vespa Cloud deployments via the deployment_target parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_configs` | `ModelConfig \| str \| List[ModelConfig \| str]` | One or more ModelConfig instances or model name strings. | required |
| `task_name` | `str \| None` | Name of a single MTEB task to evaluate (mutually exclusive with benchmark_name). | `None` |
| `benchmark_name` | `str \| None` | Name of an MTEB benchmark to evaluate (mutually exclusive with task_name). | `None` |
| `results_dir` | `str \| Path` | Directory to save results. Defaults to "results". | `'results'` |
| `overwrite` | `bool` | If False, skip evaluations where results already exist. Defaults to False. | `False` |
| `deployment_target` | `Literal['docker', 'cloud']` | Where to deploy Vespa. Either "docker" or "cloud". Defaults to "cloud". | `'cloud'` |
| `port` | `int` | Vespa application port (Docker only). Defaults to 8080. | `8080` |
| `tenant` | `str \| None` | Vespa Cloud tenant name. Required when deployment_target="cloud". | `None` |
| `application` | `str \| None` | Vespa Cloud application name. Defaults to benchmark/task name if not specified. | `None` |
| `instance` | `str` | Vespa Cloud instance name. Defaults to "default". | `'default'` |
| `key_content` | `str \| None` | Vespa Cloud API key content (string). | `None` |
| `key_location` | `str \| None` | Path to Vespa Cloud API key file. | `None` |
| `auto_cleanup` | `bool` | Whether to delete cloud deployment after evaluation. Defaults to True. | `True` |
Example (Docker):

    >>> evaluator = VespaMTEBEvaluator(
    ...     model_configs="e5-small-v2",
    ...     benchmark_name="NanoBEIR",
    ...     deployment_target="docker",
    ... )
    >>> evaluator.evaluate()
Example (Vespa Cloud):

    >>> import os
    >>> evaluator = VespaMTEBEvaluator(
    ...     model_configs="e5-small-v2",
    ...     benchmark_name="NanoBEIR",
    ...     deployment_target="cloud",
    ...     tenant="my-tenant",
    ...     key_content=os.getenv("VESPA_API_KEY"),
    ... )
    >>> evaluator.evaluate()
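The parameter descriptions state two constraints: task_name and benchmark_name are mutually exclusive, and tenant is required when deployment_target="cloud". A minimal sketch of a pre-flight check for those constraints follows. The helper `check_evaluator_args` is hypothetical and not part of pyvespa. Whether VespaMTEBEvaluator itself raises on these conditions, and with which exception type, is not documented here.

```python
from typing import Optional


def check_evaluator_args(
    task_name: Optional[str] = None,
    benchmark_name: Optional[str] = None,
    deployment_target: str = "cloud",
    tenant: Optional[str] = None,
) -> None:
    """Hypothetical pre-flight check mirroring the documented constraints."""
    # task_name and benchmark_name are documented as mutually exclusive.
    if task_name is not None and benchmark_name is not None:
        raise ValueError("task_name and benchmark_name are mutually exclusive")
    # deployment_target is documented as Literal['docker', 'cloud'].
    if deployment_target not in ("docker", "cloud"):
        raise ValueError('deployment_target must be "docker" or "cloud"')
    # tenant is documented as required for Vespa Cloud deployments.
    if deployment_target == "cloud" and tenant is None:
        raise ValueError('tenant is required when deployment_target="cloud"')


# Passes silently: one benchmark, local Docker deployment, no tenant needed.
check_evaluator_args(benchmark_name="NanoBEIR", deployment_target="docker")
```

Running the check before constructing the evaluator surfaces configuration mistakes before any deployment work begins.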
cleanup()
Clean up the Vespa app instance if one exists.
get_model_meta()
Get the MTEB ModelMeta for this evaluator's configuration.
Returns:
| Type | Description |
|---|---|
| `ModelMeta` | ModelMeta instance configured for Vespa search. |
evaluate()
Run the MTEB evaluation.
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | The benchmark results including metadata and scores for all task/query_function combinations. |