mteb
vespa.evaluation._mteb
VespaMTEBApp(previous_results=None, port=8080, **kwargs)
Bases: SearchProtocol
Vespa search backend for MTEB evaluation, implemented with pyvespa.
cleanup()
Stop and remove the Vespa Docker container.
ensure_clean_state()
Ensure a clean state by removing any existing container.
This should be called before starting a new task to ensure no stale data from previous tasks remains in the index.
is_already_fed(task_name)
Check whether the index has already been fed for the given model config(s) and task.
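A minimal sketch of how these lifecycle methods might fit together when driving VespaMTEBApp directly (normally VespaMTEBEvaluator orchestrates this); the "NFCorpus" task name and the feeding step are assumptions for illustration only.

```python
from vespa.evaluation._mteb import VespaMTEBApp  # module path as documented above

# Hypothetical manual lifecycle; VespaMTEBEvaluator normally handles this for you.
app = VespaMTEBApp(port=8080)
app.ensure_clean_state()          # remove any stale container before starting a new task

task_name = "NFCorpus"            # assumed task name, for illustration only
if not app.is_already_fed(task_name):
    ...                           # feed documents for the task here

app.cleanup()                     # stop and remove the Vespa Docker container
```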
VespaMTEBEvaluator(model_configs, task_name=None, benchmark_name=None, results_dir='results', overwrite=False, url='http://localhost', port=8080)
Evaluator class for running MTEB benchmarks with Vespa.
This class handles the orchestration of MTEB evaluation tasks using Vespa as the search backend. It supports single tasks or full benchmarks, with incremental result saving and optional overwrite control.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_configs` | `ModelConfig \| str \| List[ModelConfig \| str]` | One or more ModelConfig instances or model name strings. | *required* |
| `task_name` | `str \| None` | Name of a single MTEB task to evaluate (mutually exclusive with `benchmark_name`). | `None` |
| `benchmark_name` | `str \| None` | Name of an MTEB benchmark to evaluate (mutually exclusive with `task_name`). | `None` |
| `results_dir` | `str \| Path` | Directory to save results. Defaults to "results". | `'results'` |
| `overwrite` | `bool` | If False, skip evaluations where results already exist. Defaults to False. | `False` |
| `url` | `str` | Vespa application URL. Defaults to "http://localhost". | `'http://localhost'` |
| `port` | `int` | Vespa application port. Defaults to 8080. | `8080` |
Example

```python
evaluator = VespaMTEBEvaluator(
    model_configs="e5-small-v2",
    benchmark_name="NanoBEIR",
    overwrite=False,
)
evaluator.evaluate()
```
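A variant sketch for evaluating a single task with several models instead of a full benchmark; the model name strings and the "NFCorpus" task name are assumptions, not fixed choices.

```python
# Single-task evaluation with multiple model name strings (names are illustrative).
evaluator = VespaMTEBEvaluator(
    model_configs=["e5-small-v2", "e5-base-v2"],
    task_name="NFCorpus",        # mutually exclusive with benchmark_name
    results_dir="results",
    overwrite=True,              # re-run even if results already exist
)
results = evaluator.evaluate()
evaluator.cleanup()
```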
cleanup()
Clean up the Vespa app instance if one exists.
get_model_meta()
Get the MTEB ModelMeta for this evaluator's configuration.
Returns:

| Type | Description |
|---|---|
| `ModelMeta` | ModelMeta instance configured for Vespa search. |
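A short sketch of inspecting the returned ModelMeta; the `name` attribute is assumed to exist on mteb's ModelMeta as in recent mteb releases.

```python
meta = evaluator.get_model_meta()
print(meta.name)   # assumed ModelMeta field; identifies the model backing the Vespa search
```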
evaluate()
Run the MTEB evaluation.
Returns:

| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | The benchmark results, including metadata and scores for all task/query_function combinations. |
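A sketch of persisting the returned dict; the exact result keys are not documented here, so this example only serializes the whole structure to a hypothetical file alongside the evaluator's own incremental saving.

```python
import json
from pathlib import Path

results = evaluator.evaluate()   # dict with metadata and scores per task/query_function

# Persist the raw results; "results_summary.json" is an illustrative filename.
Path("results_summary.json").write_text(json.dumps(results, indent=2, default=str))
```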