mteb

vespa.evaluation._mteb

VespaMTEBApp(previous_results=None, port=8080, **kwargs)

Bases: SearchProtocol

Vespa search using pyvespa

cleanup()

Stop and remove the Vespa Docker container.

ensure_clean_state()

Ensure a clean state by removing any existing container.

This should be called before starting a new task to ensure no stale data from previous tasks remains in the index.
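The call order described above can be sketched as follows. `FakeApp` and `run_tasks` are hypothetical names used only for illustration: `FakeApp` records method calls and does not start or remove any container, so this shows the ordering only, not the behavior of the real class. The sketch assumes you want a fresh index per task and a final cleanup even when a task fails.

```python
class FakeApp:
    """Hypothetical stand-in for VespaMTEBApp that only records method calls."""

    def __init__(self):
        self.calls = []

    def ensure_clean_state(self):
        self.calls.append("ensure_clean_state")

    def cleanup(self):
        self.calls.append("cleanup")


def run_tasks(app, task_names, run_one):
    """Reset state before every task and always clean up at the end."""
    try:
        for name in task_names:
            app.ensure_clean_state()  # drop stale data left by the previous task
            run_one(name)
    finally:
        app.cleanup()  # runs even if a task raised


app = FakeApp()
run_tasks(app, ["task_a", "task_b"], run_one=lambda name: app.calls.append(f"run:{name}"))
print(app.calls)
```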

is_already_fed(task_name)

Check whether the index has already been fed for the given model config(s) and task.

VespaMTEBEvaluator(model_configs, task_name=None, benchmark_name=None, results_dir='results', overwrite=False, url='http://localhost', port=8080)

Evaluator class for running MTEB benchmarks with Vespa.

This class handles the orchestration of MTEB evaluation tasks using Vespa as the search backend. It supports single tasks or full benchmarks, with incremental result saving and optional overwrite control.

Parameters:

model_configs (ModelConfig | str | List[ModelConfig | str], required): One or more ModelConfig instances or model name strings.

task_name (str | None, default None): Name of a single MTEB task to evaluate (mutually exclusive with benchmark_name).

benchmark_name (str | None, default None): Name of an MTEB benchmark to evaluate (mutually exclusive with task_name).

results_dir (str | Path, default 'results'): Directory to save results.

overwrite (bool, default False): If False, skip evaluations where results already exist.

url (str, default 'http://localhost'): Vespa application URL.

port (int, default 8080): Vespa application port.
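Because task_name and benchmark_name are mutually exclusive, a caller may want to validate the pair before constructing the evaluator. The helper below is a minimal sketch of such a check; `resolve_target` is a hypothetical name, not part of the library, and the sketch assumes that passing neither argument is also an error, which the documentation above does not state.

```python
from typing import Optional, Tuple


def resolve_target(
    task_name: Optional[str] = None,
    benchmark_name: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return ("task", name) or ("benchmark", name)."""
    # Both set or both missing: the two comparisons agree, so reject.
    if (task_name is None) == (benchmark_name is None):
        raise ValueError("Specify exactly one of task_name or benchmark_name")
    if task_name is not None:
        return ("task", task_name)
    return ("benchmark", benchmark_name)


print(resolve_target(benchmark_name="NanoBEIR"))
```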
Example

    evaluator = VespaMTEBEvaluator(
        model_configs="e5-small-v2",
        benchmark_name="NanoBEIR",
        overwrite=False,
    )
    evaluator.evaluate()
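To illustrate what incremental saving combined with overwrite=False can look like, here is a minimal, library-independent sketch. The function `evaluate_incrementally`, the one-JSON-file-per-task layout, and the `fake_run` callback are all assumptions made for this example; the evaluator's actual file layout and result format are not described here.

```python
import json
import tempfile
from pathlib import Path


def evaluate_incrementally(task_names, run_task, results_dir, overwrite=False):
    """Run each task, saving its result immediately; skip tasks already on disk."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for name in task_names:
        path = results_dir / f"{name}.json"  # hypothetical one-file-per-task layout
        if path.exists() and not overwrite:
            results[name] = json.loads(path.read_text())  # reuse the saved result
            continue
        results[name] = run_task(name)
        path.write_text(json.dumps(results[name]))  # save before the next task starts
    return results


calls = []


def fake_run(name):
    """Stand-in for a real evaluation; records which tasks were actually run."""
    calls.append(name)
    return {"score": len(name)}


with tempfile.TemporaryDirectory() as tmp:
    evaluate_incrementally(["a", "bb"], fake_run, tmp)
    evaluate_incrementally(["a", "bb", "ccc"], fake_run, tmp)  # only "ccc" runs
    evaluate_incrementally(["a"], fake_run, tmp, overwrite=True)  # "a" runs again

print(calls)
```

Saving after every task means an interrupted run loses at most the task in progress, and a rerun with overwrite=False picks up where the previous one stopped.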

cleanup()

Clean up the Vespa app instance if one exists.

get_model_meta()

Get the MTEB ModelMeta for this evaluator's configuration.

Returns:

ModelMeta: ModelMeta instance configured for Vespa search.

evaluate()

Run the MTEB evaluation.

Returns:

dict: The benchmark results, including metadata and scores for all task/query_function combinations.