vespa.evaluation._mteb

VespaMTEBApp(previous_results=None, port=8080, **kwargs)

Bases: SearchProtocol

Vespa search using pyvespa with support for Docker and Vespa Cloud deployments.

cleanup()

Stop and remove the Vespa deployment (Docker container or Cloud instance).

ensure_clean_state()

Ensure a clean state before starting a new task.

For Docker: destroys the container and creates a new one. For Cloud: deletes all documents but keeps the deployment running (faster).

is_already_fed(task_name)

Check whether the index has already been fed for the given model configuration(s) and task.
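
The combination of `is_already_fed` and an overwrite flag suggests a feed-skipping pattern. A minimal illustrative sketch — the `should_refeed` helper and its logic are assumptions for illustration, not part of the API:

```python
def should_refeed(already_fed: bool, overwrite: bool) -> bool:
    """Decide whether a task's corpus must be fed again.

    Hypothetical helper: re-feed when the index has not been fed yet,
    or when the caller explicitly asked to overwrite existing results.
    """
    return overwrite or not already_fed


# An already-fed index is reused unless overwrite is requested.
print(should_refeed(already_fed=True, overwrite=False))   # False
print(should_refeed(already_fed=False, overwrite=False))  # True
```

Skipping the feed step when the index is already populated avoids re-indexing the corpus between repeated runs of the same task.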

VespaMTEBEvaluator(model_configs, task_name=None, benchmark_name=None, results_dir='results', overwrite=False, deployment_target='cloud', port=8080, tenant=None, application=None, instance='default', key_content=None, key_location=None, auto_cleanup=True)

Evaluator class for running MTEB benchmarks with Vespa.

This class handles the orchestration of MTEB evaluation tasks using Vespa as the search backend. It supports single tasks or full benchmarks, with incremental result saving and optional overwrite control.

Supports both Docker (local) and Vespa Cloud deployments via the deployment_target parameter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_configs` | `ModelConfig \| str \| List[ModelConfig \| str]` | One or more ModelConfig instances or model name strings. | *required* |
| `task_name` | `str \| None` | Name of a single MTEB task to evaluate (mutually exclusive with `benchmark_name`). | `None` |
| `benchmark_name` | `str \| None` | Name of an MTEB benchmark to evaluate (mutually exclusive with `task_name`). | `None` |
| `results_dir` | `str \| Path` | Directory in which to save results. | `'results'` |
| `overwrite` | `bool` | If False, skip evaluations whose results already exist. | `False` |
| `deployment_target` | `Literal['docker', 'cloud']` | Where to deploy Vespa: either "docker" or "cloud". | `'cloud'` |
| `port` | `int` | Vespa application port (Docker only). | `8080` |
| `tenant` | `str \| None` | Vespa Cloud tenant name. Required when `deployment_target="cloud"`. | `None` |
| `application` | `str \| None` | Vespa Cloud application name. Defaults to the benchmark/task name if not specified. | `None` |
| `instance` | `str` | Vespa Cloud instance name. | `'default'` |
| `key_content` | `str \| None` | Vespa Cloud API key content (string). | `None` |
| `key_location` | `str \| None` | Path to a Vespa Cloud API key file. | `None` |
| `auto_cleanup` | `bool` | Whether to delete the cloud deployment after evaluation. | `True` |

Example (Docker):

```python
>>> evaluator = VespaMTEBEvaluator(
...     model_configs="e5-small-v2",
...     benchmark_name="NanoBEIR",
...     deployment_target="docker",
... )
>>> evaluator.evaluate()
```

Example (Vespa Cloud):

```python
>>> evaluator = VespaMTEBEvaluator(
...     model_configs="e5-small-v2",
...     benchmark_name="NanoBEIR",
...     deployment_target="cloud",
...     tenant="my-tenant",
...     key_content=os.getenv("VESPA_API_KEY"),
... )
>>> evaluator.evaluate()
```
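
Because the Docker and Cloud targets require different parameters (`port` for Docker; `tenant` and a key for Cloud), it can help to assemble the constructor kwargs in one place. A sketch of such a helper — the function itself and the `VESPA_TENANT`/`VESPA_API_KEY` environment-variable names are assumptions; only the kwarg names mirror the documented signature:

```python
import os


def evaluator_kwargs(deployment_target: str = "cloud") -> dict:
    """Assemble constructor kwargs for VespaMTEBEvaluator.

    Illustrative sketch: enforces that a tenant is present for cloud
    deployments and only passes `port` for Docker deployments.
    """
    kwargs = {
        "model_configs": "e5-small-v2",
        "benchmark_name": "NanoBEIR",
        "deployment_target": deployment_target,
    }
    if deployment_target == "cloud":
        tenant = os.getenv("VESPA_TENANT")
        if tenant is None:
            raise ValueError('tenant is required when deployment_target="cloud"')
        kwargs["tenant"] = tenant
        kwargs["key_content"] = os.getenv("VESPA_API_KEY")
    else:
        kwargs["port"] = 8080  # Docker only; ignored by cloud deployments
    return kwargs


print(evaluator_kwargs("docker"))
```

The resulting dict can then be splatted into the constructor, e.g. `VespaMTEBEvaluator(**evaluator_kwargs("docker"))`, failing fast on a missing tenant rather than mid-deployment.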

cleanup()

Clean up the Vespa app instance if one exists.

get_model_meta()

Get the MTEB ModelMeta for this evaluator's configuration.

Returns:

| Type | Description |
| --- | --- |
| `ModelMeta` | ModelMeta instance configured for Vespa search. |

evaluate()

Run the MTEB evaluation.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `dict` | `dict` | The benchmark results, including metadata and scores for all task/query_function combinations. |
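
Since `evaluate()` returns scores keyed by task/query_function combination, a common post-processing step is to pick the best query function per task. A minimal sketch — the result schema and metric name below are assumptions for illustration, not the library's documented format:

```python
# Hypothetical shape of an evaluate() result: metadata plus per-
# (task, query_function) score dicts. Values here are made up.
results = {
    "metadata": {"benchmark_name": "NanoBEIR"},
    "scores": {
        ("NanoMSMARCORetrieval", "semantic"): {"ndcg_at_10": 0.41},
        ("NanoMSMARCORetrieval", "bm25"): {"ndcg_at_10": 0.38},
    },
}

# For each task, keep the query function with the highest ndcg_at_10.
best: dict = {}
for (task, query_fn), metrics in results["scores"].items():
    score = metrics["ndcg_at_10"]
    if task not in best or score > best[task][1]:
        best[task] = (query_fn, score)

print(best)  # {'NanoMSMARCORetrieval': ('semantic', 0.41)}
```

Inspect the actual dict returned by `evaluate()` (or the files written to `results_dir`) to confirm the real key and metric names before relying on this shape.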