query

Reference API for QueryModel-related code.

Match Filters

MatchFilter

 MatchFilter ()

Abstract class for match filters.

AND

 AND ()

Filter that matches documents containing all the query terms.

Usage: The AND filter is usually used when specifying query models.

and_filter = AND()

OR

 OR ()

Filter that matches any document containing at least one query term.

Usage: The OR filter is usually used when specifying query models.

or_filter = OR()

WeakAnd

 WeakAnd (hits:int, field:str='default')

Match documents according to the weakAND algorithm.

Reference: https://docs.vespa.ai/en/using-wand-with-vespa.html

	Type	Default	Details
hits	int		Lower bound on the number of hits to be retrieved.
field	str	default	Which Vespa field to search.
Returns	None

Usage: The WeakAnd filter is usually used when specifying query models.

weakand_filter = WeakAnd(hits=10, field="default")
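To make the weakAND match phase more concrete, here is a minimal sketch of the kind of YQL clause such a filter could produce. The helper `weak_and_clause` is hypothetical and not part of this library. It assumes the query is split on whitespace and that each token becomes one `contains` term on the chosen field, with `hits` passed as the `targetHits` annotation of Vespa's `weakAnd` operator.

```python
def weak_and_clause(query: str, hits: int, field: str = "default") -> str:
    """Build an illustrative weakAnd YQL clause (hypothetical helper)."""
    # One `contains` term per whitespace-separated token (an assumption).
    terms = ", ".join(f'{field} contains "{token}"' for token in query.split())
    # `targetHits` mirrors the `hits` argument of WeakAnd.
    return f"({{targetHits: {hits}}}weakAnd({terms}))"


clause = weak_and_clause("this is a test", hits=10)
print(clause)
```

The exact string generated by the library may differ; the sketch only shows how `hits` and `field` could shape the clause.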

Tokenize

 Tokenize (hits:int, field:str='default')

Match documents according to the weakAND algorithm without parsing special characters.

Reference: https://docs.vespa.ai/en/reference/simple-query-language-reference.html

	Type	Default	Details
hits	int		Lower bound on the number of hits to be retrieved.
field	str	default	Which Vespa field to search.
Returns	None

Usage: The Tokenize filter is usually used when specifying query models.

tokenize_filter = Tokenize(hits=10, field="default")

ANN

 ANN (doc_vector:str, query_vector:str, hits:int, label:str,
      approximate:bool=True)

Match documents according to the nearest neighbor operator.

Reference: https://docs.vespa.ai/en/reference/query-language-reference.html

	Type	Default	Details
doc_vector	str		Name of the document field to be used in the distance calculation.
query_vector	str		Name of the query field to be used in the distance calculation.
hits	int		Lower bound on the number of hits to return.
label	str		A label to identify this specific operator instance.
approximate	bool	True	True to use approximate nearest neighbor, False to use brute force.
Returns	None

Usage: The ANN filter is usually used when specifying query models.

By default, the ANN operator uses approximate nearest neighbor:

match_filter = ANN(
    doc_vector="doc_vector",
    query_vector="query_vector",
    hits=10,
    label="label",
)

Brute-force can be used by specifying approximate=False:

ann_filter = ANN(
    doc_vector="doc_vector",
    query_vector="query_vector",
    hits=10,
    label="label",
    approximate=False,
)
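As a rough illustration of how the ANN arguments relate to Vespa's nearest neighbor operator, the sketch below assembles a `nearestNeighbor` YQL clause by hand. The helper `nearest_neighbor_clause` is hypothetical and not part of this library. It assumes `hits` maps to the `targetHits` annotation and that `label` and `approximate` are passed through as annotations of the same name.

```python
def nearest_neighbor_clause(
    doc_vector: str,
    query_vector: str,
    hits: int,
    label: str,
    approximate: bool = True,
) -> str:
    """Build an illustrative nearestNeighbor YQL clause (hypothetical helper)."""
    # YQL booleans are lowercase, unlike Python's True/False.
    approx = "true" if approximate else "false"
    return (
        f'({{targetHits: {hits}, label: "{label}", approximate: {approx}}}'
        f"nearestNeighbor({doc_vector}, {query_vector}))"
    )


approximate_clause = nearest_neighbor_clause("doc_vector", "query_vector", 10, "label")
brute_force_clause = nearest_neighbor_clause(
    "doc_vector", "query_vector", 10, "label", approximate=False
)
print(approximate_clause)
print(brute_force_clause)
```

The only difference between the two clauses is the `approximate` annotation, which matches the two usage examples above.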

Union

 Union (*args:__main__.MatchFilter)

Match documents that belong to the union of several match filters.

	Type	Details
args	MatchFilter	Match filters to be taken the union of.
Returns	None

Usage: The Union filter is usually used when specifying query models.

union_filter = Union(
    WeakAnd(hits=10, field="field_name"),
    ANN(
        doc_vector="doc_vector",
        query_vector="query_vector",
        hits=10,
        label="label",
    ),
)
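Conceptually, a union of match filters retrieves every document matched by at least one of them. A minimal sketch, assuming each filter has already been rendered as a YQL clause and that the union is expressed with the YQL `or` operator; `union_clause` is a hypothetical helper, not part of this library.

```python
def union_clause(*clauses: str) -> str:
    """Join already-rendered YQL clauses with `or` (hypothetical helper)."""
    if not clauses:
        raise ValueError("At least one clause is required.")
    return " or ".join(clauses)


combined = union_clause(
    '({targetHits: 10}weakAnd(field_name contains "test"))',
    '({targetHits: 10, label: "label"}nearestNeighbor(doc_vector, query_vector))',
)
print(combined)
```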

Ranking

Ranking

 Ranking (name:str='default', list_features:bool=False)

Define the rank profile to be used during ranking.

	Type	Default	Details
name	str	default	Name of the rank profile as defined in a Vespa search definition.
list_features	bool	False	Whether the ranking features should be returned. Sent to Vespa as ‘true’ or ‘false’.
Returns	None

Usage: Ranking is usually used when specifying query models.

ranking = Ranking(name="bm25", list_features=True)

Query properties

QueryProperty

 QueryProperty ()

Abstract class for query property.

QueryRankingFeature

 QueryRankingFeature (name:str, mapping:Callable[[str],List[float]])

Include ranking.feature.query into a Vespa query.

	Type	Details
name	str	Name of the feature.
mapping	typing.Callable[[str], typing.List[float]]	Function mapping a string to a list of floats.
Returns	None

Usage: QueryRankingFeature is usually used when specifying query models.

query_property = QueryRankingFeature(
    name="query_vector", mapping=lambda x: [1, 2, 3]
)
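The `mapping` argument is any function that turns the query string into a list of floats, typically an embedding model. The sketch below uses a toy mapping so that it runs without a model. Both `toy_embedding` and `ranking_feature_entry` are hypothetical names, and the request key `ranking.features.query(<name>)` is an assumption about how the feature could be attached to the request body.

```python
from typing import Callable, Dict, List


def toy_embedding(query: str) -> List[float]:
    """Hypothetical mapping: token count, character count, mean token length."""
    tokens = query.split()
    n_chars = sum(len(token) for token in tokens)
    return [float(len(tokens)), float(n_chars), n_chars / max(len(tokens), 1)]


def ranking_feature_entry(
    name: str, mapping: Callable[[str], List[float]], query: str
) -> Dict[str, str]:
    """Render the mapped query as a request-body entry (assumed key format)."""
    return {f"ranking.features.query({name})": str(mapping(query))}


entry = ranking_feature_entry("query_vector", toy_embedding, "this is a test")
print(entry)
```

In practice the mapping would call a real embedding model whose output dimension matches the tensor declared in the Vespa application.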

Query model

QueryModel

 QueryModel (name:str='default_name',
             query_properties:Optional[List[__main__.QueryProperty]]=None,
             match_phase:__main__.MatchFilter=<__main__.AND object at
             0x7fe734343a30>, ranking:__main__.Ranking=<__main__.Ranking
             object at 0x7fe73305ba60>,
             body_function:Optional[Callable[[str],Dict]]=None)

Define a query model.

A QueryModel is an abstraction that encapsulates all the relevant information controlling how a Vespa app matches and ranks documents.

	Type	Default	Details
name	str	default_name	Name of the query model. Used to tag model-related quantities, like evaluation metrics.
query_properties	typing.Optional[typing.List[__main__.QueryProperty]]	None	Query properties to be included in the queries.
match_phase	MatchFilter	<__main__.AND object at 0x7fe734343a30>	Define the match criteria.
ranking	Ranking	<__main__.Ranking object at 0x7fe73305ba60>	Define the rank criteria.
body_function	typing.Optional[typing.Callable[[str], typing.Dict]]	None	Function that takes the query as a parameter and returns the body of a Vespa query.
Returns	None

Usage:

Specify a query model with default configurations:

query_model = QueryModel()

Specify the match phase, the ranking phase, and the query properties they use:

query_model = QueryModel(
    query_properties=[
        QueryRankingFeature(name="query_embedding", mapping=lambda x: [1, 2, 3])
    ],
    match_phase=ANN(
        doc_vector="document_embedding",
        query_vector="query_embedding",
        hits=10,
        label="label",
    ),
    ranking=Ranking(name="bm25_plus_embeddings", list_features=True),
)

Specify a query model based on a function that outputs a Vespa query body containing YQL:

def body_function(query):
    body = {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "type": "any",
        "ranking": {"profile": "bm25", "listFeatures": "true"},
    }
    return body

query_model = QueryModel(body_function=body_function)
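When several rank profiles need to be compared, one body function per profile can be generated from a small factory. This is only a sketch: `make_body_function` is a hypothetical helper, and the request body simply reuses the fields from the example above.

```python
from typing import Callable, Dict


def make_body_function(profile: str) -> Callable[[str], Dict]:
    """Return a body function bound to one rank profile (hypothetical helper)."""

    def body_function(query: str) -> Dict:
        # Same request body as the example above, with the profile parameterized.
        return {
            "yql": "select * from sources * where userQuery();",
            "query": query,
            "type": "any",
            "ranking": {"profile": profile, "listFeatures": "true"},
        }

    return body_function


bm25_body_function = make_body_function("bm25")
body = bm25_body_function("this is a test")
print(body)
```

Each returned function has the `Callable[[str], Dict]` shape expected by the `body_function` argument, and its output can be inspected without a Vespa connection.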

Send query with QueryModel

send_query

 send_query (app:vespa.application.Vespa, body:Optional[Dict]=None,
             query:Optional[str]=None,
             query_model:Optional[__main__.QueryModel]=None,
             debug_request:bool=False, recall:Optional[Tuple]=None,
             **kwargs)

Send a query request to a Vespa application.

Either send ‘body’ containing all the request parameters or specify ‘query’ and ‘query_model’.

	Type	Default	Details
app	Vespa		Connection to a Vespa application
body	typing.Optional[typing.Dict]	None	Contains all the request parameters. None when using `query_model`.
query	typing.Optional[str]	None	Query string. None when using `body`.
query_model	typing.Optional[__main__.QueryModel]	None	Query model. None when using `body`.
debug_request	bool	False	Return request body for debugging instead of sending the request.
recall	typing.Optional[typing.Tuple]	None	Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.
kwargs
Returns	VespaQueryResponse		Either the request body if debug_request is True or the result from the Vespa application.

Usage: Assume app is a Vespa connection.

Send a request body:

body = {"yql": "select * from sources * where test"}
result = send_query(app=app, body=body)

Use query and query_model:

result = send_query(
    app=app,
    query="this is a test",
    query_model=QueryModel(
        match_phase=OR(), 
        ranking=Ranking()
    ),
    hits=10,
)

Debug the output of the QueryModel by setting debug_request=True:

send_query(
    app=app,
    query="this is a test",
    query_model=QueryModel(match_phase=OR(), ranking=Ranking()),
    debug_request=True,
    hits=10,
).request_body

{'yql': 'select * from sources * where ({grammar: "any"}userInput("this is a test"));',
 'ranking': {'profile': 'default', 'listFeatures': 'false'},
 'hits': 10}
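The request body shown above can be reproduced by hand, which may help when reading debug output. The sketch below is not the library's implementation. `build_request_body` is a hypothetical helper that assumes the OR filter corresponds to the `any` grammar, that the ranking settings are copied into the `ranking` entry, and that extra keyword arguments such as `hits` are merged into the body unchanged.

```python
from typing import Any, Dict


def build_request_body(
    query: str,
    grammar: str = "any",
    profile: str = "default",
    list_features: bool = False,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Assemble a request body shaped like the debug output (hypothetical helper)."""
    body: Dict[str, Any] = {
        "yql": (
            "select * from sources * where "
            f'({{grammar: "{grammar}"}}userInput("{query}"));'
        ),
        # Vespa expects lowercase 'true'/'false' strings here.
        "ranking": {"profile": profile, "listFeatures": str(list_features).lower()},
    }
    # Extra request parameters (for example hits=10) are added as-is.
    body.update(kwargs)
    return body


debug_body = build_request_body("this is a test", hits=10)
print(debug_body)
```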

Recall documents using the id field:

result = send_query(
    app=app,
    query="this is a test",
    query_model=QueryModel(match_phase=OR(), ranking=Ranking()),
    hits=10,
    recall=("id", [1, 5]),
)
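The `recall` tuple restricts the result set to documents whose field takes one of the listed values. A minimal sketch of how such a tuple could be rendered as the value of Vespa's `recall` request parameter; `recall_clause` is a hypothetical helper, and the `+(field:value ...)` format is an assumption about the string that ends up in the request.

```python
from typing import List, Tuple


def recall_clause(recall: Tuple[str, List]) -> str:
    """Render a (field, values) tuple as a recall string (assumed format)."""
    field, values = recall
    terms = " ".join(f"{field}:{value}" for value in values)
    return f"+({terms})"


clause = recall_clause(("id", [1, 5]))
print(clause)
```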

Use a body_function to specify a QueryModel:

def body_function(query):
    body = {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "type": "any",
        "ranking": {"profile": "bm25", "listFeatures": "true"},
    }
    return body

query_model = QueryModel(body_function=body_function)

result = send_query(
    app=app,
    query="this is a test",
    query_model=query_model,
    hits=10,
)

send_query_batch

 send_query_batch (app, body_batch:Optional[List[Dict]]=None,
                   query_batch:Optional[List[str]]=None,
                   query_model:Optional[__main__.QueryModel]=None,
                   recall_batch:Optional[List[Tuple]]=None,
                   asynchronous=True, connections:Optional[int]=100,
                   total_timeout:int=100, **kwargs)

Send queries in batch to a Vespa app.

	Type	Default	Details
app			Connection to a Vespa application
body_batch	typing.Optional[typing.List[typing.Dict]]	None	Contains all the request parameters. Set to None if using ‘query_batch’.
query_batch	typing.Optional[typing.List[str]]	None	Query strings. Set to None if using ‘body_batch’.
query_model	typing.Optional[__main__.QueryModel]	None	Query model to use when sending query strings. Set to None if using ‘body_batch’.
recall_batch	typing.Optional[typing.List[typing.Tuple]]	None	One tuple for each query. Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.
asynchronous	bool	True	Set to True to send the queries in async mode.
connections	typing.Optional[int]	100	Number of allowed concurrent connections, valid only if `asynchronous=True`.
total_timeout	int	100	Total timeout in secs for each of the concurrent requests when using `asynchronous=True`.
kwargs
Returns	typing.List[vespa.io.VespaQueryResponse]		HTTP POST responses.

Use body_batch to send a batch of body requests.

body_batch = [
    {"yql": "select * from sources * where test"},
    {"yql": "select * from sources * where test2"}
]
result = send_query_batch(app=app, body_batch=body_batch)

Use query_batch to send a batch of query strings to be ranked according to a QueryModel.

result = send_query_batch(
    app=app,
    query_batch=["this is a test", "this is a test 2"],
    query_model=QueryModel(
        match_phase=OR(), 
        ranking=Ranking()
    ),
    hits=10,
)

Use recall_batch to send one tuple for each query in query_batch.

result = send_query_batch(
    app=app,
    query_batch=["this is a test", "this is a test 2"],
    query_model=QueryModel(match_phase=OR(), ranking=Ranking()),
    hits=10,
    recall_batch=[("doc_id", [2, 7]), ("doc_id", [0, 5])],
)
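The `connections` and `total_timeout` parameters control how the asynchronous mode behaves. The sketch below illustrates the general pattern with `asyncio` only and does not contact Vespa. `fake_send` stands in for the HTTP request and `send_batch` is a hypothetical helper; the library's own implementation may be organized differently.

```python
import asyncio
from typing import Dict, List


async def fake_send(body: Dict) -> Dict:
    """Stand-in for an HTTP POST to Vespa (hypothetical, no network access)."""
    await asyncio.sleep(0.01)
    return {"request_body": body}


async def send_batch(
    bodies: List[Dict], connections: int = 100, total_timeout: float = 100
) -> List[Dict]:
    """Send all bodies concurrently, at most `connections` at a time."""
    semaphore = asyncio.Semaphore(connections)

    async def send_one(body: Dict) -> Dict:
        async with semaphore:
            # Each request gets its own timeout, in seconds.
            return await asyncio.wait_for(fake_send(body), timeout=total_timeout)

    # gather() returns the results in the same order as the inputs.
    return await asyncio.gather(*(send_one(body) for body in bodies))


bodies = [
    {"yql": "select * from sources * where test"},
    {"yql": "select * from sources * where test2"},
    {"yql": "select * from sources * where test3"},
]
results = asyncio.run(send_batch(bodies, connections=2, total_timeout=5))
print(len(results))
```

Because results keep the input order, the i-th response can be matched back to the i-th query or recall tuple.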

Collect Vespa features

collect_vespa_features

 collect_vespa_features (app:vespa.application.Vespa, labeled_data,
                         id_field:str, query_model:__main__.QueryModel,
                         number_additional_docs:int, fields:List[str],
                         keep_features:Optional[List[str]]=None,
                         relevant_score:int=1, default_score:int=0,
                         **kwargs)

Collect Vespa features based on a set of labelled data.

	Type	Default	Details
app	Vespa		Connection to a Vespa application.
labeled_data			Labelled data containing query, query_id and relevant ids. See the examples below for the data format.
id_field	str		The Vespa field representing the document id.
query_model	QueryModel		Query model.
number_additional_docs	int		Number of additional documents to retrieve for each relevant document. Duplicate documents will be dropped.
fields	typing.List[str]		Vespa fields to collect, e.g. [“rankfeatures”, “summaryfeatures”]
keep_features	typing.Optional[typing.List[str]]	None	List containing the names of the features that should be returned. Defaults to None, which returns all the features contained in the ‘fields’ argument.
relevant_score	int	1	Score to assign to relevant documents.
default_score	int	0	Score to assign to the additional documents that are not relevant.
kwargs
Returns	DataFrame		DataFrame containing document id (document_id), query id (query_id), scores (relevant) and the Vespa rank features returned by the rank profile of the query model.

Usage:

Define labeled_data as a list of dicts containing relevant documents:

labeled_data = [
    {
        "query_id": 0,
        "query": "give me title 1",
        "relevant_docs": [{"id": "1", "score": 1}],
    },
    {
        "query_id": 1,
        "query": "give me title 3",
        "relevant_docs": [{"id": "3", "score": 1}],
    },
]

Collect Vespa features:

rank_features = collect_vespa_features(
    app=app,
    labeled_data=labeled_data,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), 
        ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures"],
)
rank_features

	document_id	query_id	label	...	term(3).significance	term(3).weight	textSimilarity(text).fieldCoverage	textSimilarity(text).order	textSimilarity(text).proximity	textSimilarity(text).queryCoverage	textSimilarity(text).score
0	1	0	1	...	0.583333	100.0	0.50	1.0	1.000000	0.50	0.750000
3	7	0	0	...	0.583333	100.0	0.25	0.0	0.859375	0.25	0.425781
1	3	1	1	...	0.583333	100.0	0.50	1.0	1.000000	0.50	0.750000
5	7	1	0	...	0.583333	100.0	0.25	0.0	0.859375	0.25	0.425781

4 rows × 94 columns

Use a DataFrame for labeled_data instead of a list of dicts:

from pandas import DataFrame

labeled_data = [
    {
        "qid": 0,
        "query": "give me title 1",
        "doc_id": 1, 
        "relevance": 1
    },
    {
        "qid": 1,
        "query": "give me title 3",
        "doc_id": 3, 
        "relevance": 1
    },
]
labeled_data_df = DataFrame.from_records(labeled_data)
labeled_data_df

	qid	query	doc_id	relevance
0	0	give me title 1	1	1
1	1	give me title 3	3	1

rank_features = collect_vespa_features(
    app=app,
    labeled_data=labeled_data_df,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures"],
)
rank_features

	document_id	query_id	...	term(3).significance	term(3).weight	textSimilarity(text).fieldCoverage	textSimilarity(text).order	textSimilarity(text).proximity	textSimilarity(text).queryCoverage	textSimilarity(text).score
0	1	0	...	0.583333	100.0	0.50	1.0	1.000000	0.50	0.750000
3	7	0	...	0.583333	100.0	0.25	0.0	0.859375	0.25	0.425781
1	3	1	...	0.583333	100.0	0.50	1.0	1.000000	0.50	0.750000
5	7	1	...	0.583333	100.0	0.25	0.0	0.859375	0.25	0.425781

4 rows × 94 columns
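The two `labeled_data` layouts shown above carry the same information: the nested layout groups the relevant documents per query, while the flat layout has one row per (query, document) pair. The following sketch converts flat records into the nested layout. `records_to_labeled_data` is a hypothetical helper, and writing document ids as strings is an assumption made only to mirror the nested example.

```python
from typing import Dict, List


def records_to_labeled_data(records: List[Dict]) -> List[Dict]:
    """Group flat (qid, query, doc_id, relevance) records by query id."""
    grouped: Dict[int, Dict] = {}
    for record in records:
        entry = grouped.setdefault(
            record["qid"],
            {"query_id": record["qid"], "query": record["query"], "relevant_docs": []},
        )
        # Ids are stringified to match the nested example (an assumption).
        entry["relevant_docs"].append(
            {"id": str(record["doc_id"]), "score": record["relevance"]}
        )
    # Dicts preserve insertion order, so queries keep their original order.
    return list(grouped.values())


records = [
    {"qid": 0, "query": "give me title 1", "doc_id": 1, "relevance": 1},
    {"qid": 1, "query": "give me title 3", "doc_id": 3, "relevance": 1},
]
nested = records_to_labeled_data(records)
print(nested)
```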

Keep only selected features by specifying their names in the keep_features argument:

rank_features = collect_vespa_features(
    app=app,
    labeled_data=labeled_data_df,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures"],
    keep_features=["textSimilarity(text).score"],
)
rank_features

	document_id	query_id	textSimilarity(text).score
0	1	0	0.750000
3	7	0	0.425781
1	3	1	0.750000
5	7	1	0.425781
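The effect of `keep_features` can be pictured as a column selection applied to each collected row. A minimal sketch on plain dictionaries; `select_features` is a hypothetical helper, and keeping `document_id` and `query_id` alongside the requested features is an assumption based on the output above.

```python
from typing import Dict, List, Optional, Sequence


def select_features(
    row: Dict,
    keep_features: Optional[List[str]] = None,
    id_columns: Sequence[str] = ("document_id", "query_id"),
) -> Dict:
    """Keep the id columns plus the requested features (hypothetical helper)."""
    if keep_features is None:
        # None means "return every collected feature".
        return dict(row)
    wanted = list(id_columns) + list(keep_features)
    return {name: row[name] for name in wanted if name in row}


row = {
    "document_id": 1,
    "query_id": 0,
    "textSimilarity(text).order": 1.0,
    "textSimilarity(text).score": 0.75,
}
selected = select_features(row, keep_features=["textSimilarity(text).score"])
print(selected)
```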

store_vespa_features

 store_vespa_features (app:vespa.application.Vespa, output_file_path:str,
                       labeled_data, id_field:str,
                       query_model:__main__.QueryModel,
                       number_additional_docs:int, fields:List[str],
                       keep_features:Optional[List[str]]=None,
                       relevant_score:int=1, default_score:int=0,
                       batch_size=1000, **kwargs)

Retrieve Vespa rank features and store them in a .csv file.

	Type	Default	Details
app	Vespa		Connection to a Vespa application.
output_file_path	str		Path of the .csv output file. The file is created if it does not exist; otherwise the Vespa features are appended to the pre-existing file.
labeled_data			Labelled data containing query, query_id and relevant ids. See details about data format.
id_field	str		The Vespa field representing the document id.
query_model	QueryModel		Query model.
number_additional_docs	int		Number of additional documents to retrieve for each relevant document.
fields	typing.List[str]		List of Vespa fields to collect, e.g. [“rankfeatures”, “summaryfeatures”]
keep_features	typing.Optional[typing.List[str]]	None	List containing the names of the features that should be returned. Defaults to None, which returns all the features contained in the ‘fields’ argument.
relevant_score	int	1	Score to assign to relevant documents.
default_score	int	0	Score to assign to the additional documents that are not relevant.
batch_size	int	1000	The size of the batch of labeled data points to be processed.
kwargs
Returns	int		Returns 0 upon success.

Usage:

from pandas import read_csv

labeled_data = [
    {
        "query_id": 0,
        "query": "give me title 1",
        "relevant_docs": [{"id": "1", "score": 1}],
    },
    {
        "query_id": 1,
        "query": "give me title 3",
        "relevant_docs": [{"id": "3", "score": 1}],
    },
]

store_vespa_features(
    app=app,
    output_file_path="vespa_features.csv",
    labeled_data=labeled_data,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures", "summaryfeatures"],
)
rank_features = read_csv("vespa_features.csv")
rank_features

Rows collected: 4.
Batch progress: 1/1.

	document_id	query_id	label	...	term(3).weight	textSimilarity(text).fieldCoverage	textSimilarity(text).order	textSimilarity(text).proximity	textSimilarity(text).queryCoverage	textSimilarity(text).score
0	1	0	1	...	100.0	0.50	1.0	1.000000	0.50	0.750000
1	7	0	0	...	100.0	0.25	0.0	0.859375	0.25	0.425781
2	3	1	1	...	100.0	0.50	1.0	1.000000	0.50	0.750000
3	7	1	0	...	100.0	0.25	0.0	0.859375	0.25	0.425781

4 rows × 95 columns
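The combination of `batch_size` and an append-only output file can be sketched with the standard library alone. The code below is an illustration under simplifying assumptions, not the library's implementation: `append_rows` and `store_in_batches` are hypothetical helpers, rows are plain dictionaries instead of Vespa responses, and the header is written only when the file is new or empty.

```python
import csv
import os
import tempfile
from typing import Dict, List


def append_rows(path: str, rows: List[Dict], fieldnames: List[str]) -> None:
    """Append rows to a CSV file, creating it with a header when needed."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def store_in_batches(
    path: str, rows: List[Dict], fieldnames: List[str], batch_size: int = 1000
) -> int:
    """Write rows batch by batch and return 0 upon success."""
    for start in range(0, len(rows), batch_size):
        append_rows(path, rows[start : start + batch_size], fieldnames)
    return 0


rows = [
    {"document_id": 1, "query_id": 0, "label": 1},
    {"document_id": 7, "query_id": 0, "label": 0},
    {"document_id": 3, "query_id": 1, "label": 1},
    {"document_id": 7, "query_id": 1, "label": 0},
]
path = os.path.join(tempfile.mkdtemp(), "vespa_features.csv")
status = store_in_batches(path, rows, ["document_id", "query_id", "label"], batch_size=2)

with open(path, newline="") as handle:
    stored = list(csv.DictReader(handle))
print(status, len(stored))
```

Writing after every batch keeps memory usage bounded and leaves the rows collected so far on disk if a later batch fails.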