query

API reference for QueryModel and related code.

Match Filters



MatchFilter

 MatchFilter ()

Abstract class for match filters.



AND

 AND ()

Filter that matches documents containing all the query terms.

Usage: The AND filter is usually used when specifying query models.

and_filter = AND()


OR

 OR ()

Filter that matches any document containing at least one query term.

Usage: The OR filter is usually used when specifying query models.

or_filter = OR()
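
The difference between the two filters can be illustrated without a Vespa instance. The sketch below is a toy, in-memory stand-in: the helpers `tokenize`, `matches_all` and `matches_any` are hypothetical and not part of the library, and whitespace tokenization is a simplifying assumption. It only mirrors the matching semantics described above.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return set(text.lower().split())


def matches_all(query, document):
    """AND semantics: every query term must appear in the document."""
    return tokenize(query) <= tokenize(document)


def matches_any(query, document):
    """OR semantics: at least one query term must appear in the document."""
    return bool(tokenize(query) & tokenize(document))


docs = ["this is a test", "another test document", "unrelated text"]
query = "this test"

and_hits = [d for d in docs if matches_all(query, d)]
or_hits = [d for d in docs if matches_any(query, d)]
```

With this toy corpus, the AND-style filter keeps only the first document, while the OR-style filter also keeps the second one because it shares the term "test".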


WeakAnd

 WeakAnd (hits:int, field:str='default')

Match documents according to the weakAND algorithm.

Reference: https://docs.vespa.ai/en/using-wand-with-vespa.html

Type Default Details
hits int Lower bound on the number of hits to be retrieved.
field str default Which Vespa field to search.
Returns None

Usage: The WeakAnd filter is usually used when specifying query models.

weakand_filter = WeakAnd(hits=10, field="default")


Tokenize

 Tokenize (hits:int, field:str='default')

Match documents according to the weakAND algorithm without parsing special characters.

Reference: https://docs.vespa.ai/en/reference/simple-query-language-reference.html

Type Default Details
hits int Lower bound on the number of hits to be retrieved.
field str default Which Vespa field to search.
Returns None

Usage: The Tokenize filter is usually used when specifying query models.

tokenize_filter = Tokenize(hits=10, field="default")


ANN

 ANN (doc_vector:str, query_vector:str, hits:int, label:str,
      approximate:bool=True)

Match documents according to the nearest neighbor operator.

Reference: https://docs.vespa.ai/en/reference/query-language-reference.html

Type Default Details
doc_vector str Name of the document field to be used in the distance calculation.
query_vector str Name of the query field to be used in the distance calculation.
hits int Lower bound on the number of hits to return.
label str A label to identify this specific operator instance.
approximate bool True True to use approximate nearest neighbor, False to use brute force. Defaults to True.
Returns None

Usage: The ANN filter is usually used when specifying query models.

By default, the ANN operator uses approximate nearest neighbor:

match_filter = ANN(
    doc_vector="doc_vector",
    query_vector="query_vector",
    hits=10,
    label="label",
)

Brute-force can be used by specifying approximate=False:

ann_filter = ANN(
    doc_vector="doc_vector",
    query_vector="query_vector",
    hits=10,
    label="label",
    approximate=False,
)
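
To make the brute-force option concrete, here is a minimal sketch of exact nearest-neighbor search in plain Python. It is not how Vespa implements the operator: the function name `brute_force_nearest` is hypothetical, and Euclidean distance is an assumption (the actual distance metric depends on how the document field is configured). The point is only that brute force scores every document, whereas the approximate mode avoids a full scan.

```python
import math


def brute_force_nearest(query_vector, doc_vectors, hits):
    """Return the ids of the `hits` closest documents by Euclidean distance.

    Every document is scored, which is exact but linear in the corpus size.
    """
    scored = sorted(
        doc_vectors.items(),
        key=lambda item: math.dist(query_vector, item[1]),
    )
    return [doc_id for doc_id, _ in scored[:hits]]


# Hypothetical two-dimensional document vectors keyed by document id.
doc_vectors = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
nearest = brute_force_nearest([0.9, 1.2], doc_vectors, hits=2)
```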


Union

 Union (*args:__main__.MatchFilter)

Match documents that belong to the union of several match filters.

Type Details
args MatchFilter Match filters to take the union of.
Returns None

Usage: The Union filter is usually used when specifying query models.

union_filter = Union(
    WeakAnd(hits=10, field="field_name"),
    ANN(
        doc_vector="doc_vector",
        query_vector="query_vector",
        hits=10,
        label="label",
    ),
)
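
Conceptually, a union keeps every document matched by at least one of its filters. The sketch below shows that idea with plain sets; `union_of_matches` and the document ids are hypothetical and only illustrate the semantics, not the query the library generates.

```python
def union_of_matches(*match_sets):
    """Return the ids matched by at least one of the given filters."""
    combined = set()
    for matched in match_sets:
        combined |= set(matched)
    return combined


weakand_matches = {"doc1", "doc2"}  # hypothetical ids matched by a term-based filter
ann_matches = {"doc2", "doc7"}      # hypothetical ids matched by a nearest-neighbor filter
candidates = union_of_matches(weakand_matches, ann_matches)
```

A document matched by both filters, such as `doc2` here, appears only once in the candidate set.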

Ranking



Ranking

 Ranking (name:str='default', list_features:bool=False)

Define the rank profile to be used during ranking.

Type Default Details
name str default Name of the rank profile as defined in a Vespa search definition.
list_features bool False Whether the ranking features should be returned.
Returns None

Usage: Ranking is usually used when specifying query models.

ranking = Ranking(name="bm25", list_features=True)

Query properties



QueryProperty

 QueryProperty ()

Abstract class for query property.



QueryRankingFeature

 QueryRankingFeature (name:str, mapping:Callable[[str],List[float]])

Include ranking.feature.query into a Vespa query.

Type Details
name str Name of the feature.
mapping typing.Callable[[str], typing.List[float]] Function mapping a string to a list of floats.
Returns None

Usage: QueryRankingFeature is usually used when specifying query models.

query_property = QueryRankingFeature(
    name="query_vector", mapping=lambda x: [1, 2, 3]
)
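
The `mapping` argument is any function from the query text to a list of floats, typically an embedding model. The sketch below uses a toy mapping and a hypothetical helper, `ranking_feature_param`, to show how such a function could be turned into a request parameter. The parameter name `ranking.features.query(<name>)` is an assumption about the request format; the request produced with `debug_request=True` is the authoritative source.

```python
from typing import Callable, Dict, List


def ranking_feature_param(
    name: str, mapping: Callable[[str], List[float]], query: str
) -> Dict[str, str]:
    """Build one request parameter from the query text (hypothetical helper)."""
    # Assumed parameter naming; check it against an actual debug request.
    return {f"ranking.features.query({name})": str(mapping(query))}


def toy_embedding(text: str) -> List[float]:
    """Stand-in for a real embedding model: token count and character count."""
    return [float(len(text.split())), float(len(text))]


param = ranking_feature_param("query_vector", toy_embedding, "this is a test")
```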

Query model



QueryModel

 QueryModel (name:str='default_name',
             query_properties:Optional[List[__main__.QueryProperty]]=None,
             match_phase:__main__.MatchFilter=<__main__.AND object at
             0x7fe734343a30>, ranking:__main__.Ranking=<__main__.Ranking
             object at 0x7fe73305ba60>,
             body_function:Optional[Callable[[str],Dict]]=None)

Define a query model.

A QueryModel is an abstraction that encapsulates all the relevant information controlling how a Vespa app matches and ranks documents.

Type Default Details
name str default_name Name of the query model. Used to tag model-related quantities, like evaluation metrics.
query_properties typing.Optional[typing.List[__main__.QueryProperty]] None Query properties to be included in the queries.
match_phase MatchFilter <__main__.AND object at 0x7fe734343a30> Define the match criteria.
ranking Ranking <__main__.Ranking object at 0x7fe73305ba60> Define the rank criteria.
body_function typing.Optional[typing.Callable[[str], typing.Dict]] None Function that takes the query as a parameter and returns the body of a Vespa query.
Returns None

Usage:

Specify a query model with default configurations:

query_model = QueryModel()

Specify the match phase, the ranking, and the query properties they use:

query_model = QueryModel(
    query_properties=[
        QueryRankingFeature(name="query_embedding", mapping=lambda x: [1, 2, 3])
    ],
    match_phase=ANN(
        doc_vector="document_embedding",
        query_vector="query_embedding",
        hits=10,
        label="label",
    ),
    ranking=Ranking(name="bm25_plus_embeddings", list_features=True),
)

Specify a query model based on a function that outputs Vespa YQL:

def body_function(query):
    body = {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "type": "any",
        "ranking": {"profile": "bm25", "listFeatures": "true"},
    }
    return body

query_model = QueryModel(body_function=body_function)
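
Because a body function is plain Python, its output can be inspected before it is attached to a QueryModel. The sketch below repeats the function from the example and calls it directly; no Vespa connection is involved.

```python
def body_function(query):
    """Return the request body for a given query string."""
    body = {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "type": "any",
        "ranking": {"profile": "bm25", "listFeatures": "true"},
    }
    return body


# Call the function on a sample query to check the body it would produce.
body = body_function("this is a test")
```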

Send query with QueryModel



send_query

 send_query (app:vespa.application.Vespa, body:Optional[Dict]=None,
             query:Optional[str]=None,
             query_model:Optional[__main__.QueryModel]=None,
             debug_request:bool=False, recall:Optional[Tuple]=None,
             **kwargs)

Send a query request to a Vespa application.

Either send ‘body’ containing all the request parameters or specify ‘query’ and ‘query_model’.

Type Default Details
app Vespa Connection to a Vespa application
body typing.Optional[typing.Dict] None Contains all the request parameters. None when using query_model.
query typing.Optional[str] None Query string. None when using body.
query_model typing.Optional[__main__.QueryModel] None Query model. None when using body.
debug_request bool False Return request body for debugging instead of sending the request.
recall typing.Optional[typing.Tuple] None Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.
kwargs
Returns VespaQueryResponse Either the request body if debug_request is True or the result from the Vespa application.

Usage: Assume app is a Vespa connection.

Send a request body:

body = {"yql": "select * from sources * where test"}
result = send_query(app=app, body=body)

Use query and query_model:

result = send_query(
    app=app,
    query="this is a test",
    query_model=QueryModel(
        match_phase=OR(), 
        ranking=Ranking()
    ),
    hits=10,
)

Debug the output of the QueryModel by setting debug_request=True:

send_query(
    app=app,
    query="this is a test",
    query_model=QueryModel(match_phase=OR(), ranking=Ranking()),
    debug_request=True,
    hits=10,
).request_body
{'yql': 'select * from sources * where ({grammar: "any"}userInput("this is a test"));',
 'ranking': {'profile': 'default', 'listFeatures': 'false'},
 'hits': 10}
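
The debug output shows how the pieces of a query model end up in the request: the match phase determines the YQL, the ranking determines the `ranking` entry, and extra keyword arguments such as `hits` are added as request parameters. The sketch below rebuilds that same body by hand. The helper `or_query_body` is hypothetical and written only to match the output shown above; it is not the library's implementation.

```python
def or_query_body(query, profile="default", list_features=False, **kwargs):
    """Assemble a body shaped like the debug output (hypothetical helper)."""
    body = {
        "yql": 'select * from sources * where ({grammar: "any"}userInput("%s"));'
        % query,
        "ranking": {
            "profile": profile,
            "listFeatures": str(list_features).lower(),
        },
    }
    # Extra keyword arguments, such as hits, become request parameters.
    body.update(kwargs)
    return body


body = or_query_body("this is a test", hits=10)
```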

Recall documents using the id field:

result = send_query(
    app=app,
    query="this is a test",
    query_model=QueryModel(match_phase=OR(), ranking=Ranking()),
    hits=10,
    recall=("id", [1, 5]),
)
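
The recall tuple pairs a field name with the values to restrict the results to. As a rough illustration, the sketch below turns such a tuple into a recall-style string. The helper `recall_parameter` is hypothetical, and the exact string the library sends is an assumption; inspect the request with `debug_request=True` to see the real value.

```python
def recall_parameter(recall):
    """Turn (field, values) into a string such as '+(id:1 id:5)'."""
    field, values = recall
    terms = " ".join(f"{field}:{value}" for value in values)
    return f"+({terms})"


recall_string = recall_parameter(("id", [1, 5]))
```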

Use a body_function to specify a QueryModel:

def body_function(query):
    body = {
        "yql": "select * from sources * where userQuery();",
        "query": query,
        "type": "any",
        "ranking": {"profile": "bm25", "listFeatures": "true"},
    }
    return body

query_model = QueryModel(body_function=body_function)

result = send_query(
    app=app,
    query="this is a test",
    query_model=query_model,
    hits=10,
)


send_query_batch

 send_query_batch (app, body_batch:Optional[List[Dict]]=None,
                   query_batch:Optional[List[str]]=None,
                   query_model:Optional[__main__.QueryModel]=None,
                   recall_batch:Optional[List[Tuple]]=None,
                   asynchronous=True, connections:Optional[int]=100,
                   total_timeout:int=100, **kwargs)

Send queries in batch to a Vespa app.

Type Default Details
app Connection to a Vespa application
body_batch typing.Optional[typing.List[typing.Dict]] None Contains all the request parameters. Set to None if using ‘query_batch’.
query_batch typing.Optional[typing.List[str]] None Query strings. Set to None if using ‘body_batch’.
query_model typing.Optional[__main__.QueryModel] None Query model to use when sending query strings. Set to None if using ‘body_batch’.
recall_batch typing.Optional[typing.List[typing.Tuple]] None One tuple for each query. Tuple of size 2 where the first element is the name of the field to use to recall and the second element is a list of the values to be recalled.
asynchronous bool True Set to True to send data in async mode. Defaults to True.
connections typing.Optional[int] 100 Number of allowed concurrent connections, valid only if asynchronous=True.
total_timeout int 100 Total timeout in secs for each of the concurrent requests when using asynchronous=True.
kwargs
Returns typing.List[vespa.io.VespaQueryResponse] HTTP POST responses.

Use body_batch to send a batch of body requests.

body_batch = [
    {"yql": "select * from sources * where test"},
    {"yql": "select * from sources * where test2"}
]
result = send_query_batch(app=app, body_batch=body_batch)
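
The `connections` and `total_timeout` arguments describe bounded concurrency: a cap on simultaneous requests and a time limit per request. The sketch below shows one common way to express that pattern with `asyncio`. It is not the library's implementation; `fake_send` and `send_all` are hypothetical, and no network traffic is involved.

```python
import asyncio


async def fake_send(body):
    """Stand-in for an HTTP request; it only echoes the body back."""
    await asyncio.sleep(0)
    return {"request_body": body}


async def send_all(bodies, connections=100, total_timeout=100):
    """Send every body with at most `connections` requests in flight."""
    semaphore = asyncio.Semaphore(connections)

    async def send_one(body):
        async with semaphore:
            # Each request gets its own timeout, in seconds.
            return await asyncio.wait_for(fake_send(body), timeout=total_timeout)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(send_one(body) for body in bodies))


body_batch = [
    {"yql": "select * from sources * where test"},
    {"yql": "select * from sources * where test2"},
]
responses = asyncio.run(send_all(body_batch, connections=2))
```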

Use query_batch to send a batch of query strings to be ranked according to a QueryModel.

result = send_query_batch(
    app=app,
    query_batch=["this is a test", "this is a test 2"],
    query_model=QueryModel(
        match_phase=OR(), 
        ranking=Ranking()
    ),
    hits=10,
)

Use recall_batch to send one tuple for each query in query_batch.

result = send_query_batch(
    app=app,
    query_batch=["this is a test", "this is a test 2"],
    query_model=QueryModel(match_phase=OR(), ranking=Ranking()),
    hits=10,
    recall_batch=[("doc_id", [2, 7]), ("doc_id", [0, 5])],
)

Collect Vespa features



collect_vespa_features

 collect_vespa_features (app:vespa.application.Vespa, labeled_data,
                         id_field:str, query_model:__main__.QueryModel,
                         number_additional_docs:int, fields:List[str],
                         keep_features:Optional[List[str]]=None,
                         relevant_score:int=1, default_score:int=0,
                         **kwargs)

Collect Vespa features based on a set of labelled data.

Type Default Details
app Vespa Connection to a Vespa application.
labeled_data Labelled data containing query, query_id and relevant ids. See examples about data format.
id_field str The Vespa field representing the document id.
query_model QueryModel Query model.
number_additional_docs int Number of additional documents to retrieve for each relevant document. Duplicate documents will be dropped.
fields typing.List[str] Vespa fields to collect, e.g. [“rankfeatures”, “summaryfeatures”]
keep_features typing.Optional[typing.List[str]] None List containing the names of the features that should be returned. Defaults to None, which returns all the features contained in the ‘fields’ argument.
relevant_score int 1 Score to assign to relevant documents. Default to 1.
default_score int 0 Score to assign to the additional documents that are not relevant. Default to 0.
kwargs
Returns DataFrame DataFrame containing document id (document_id), query id (query_id), label (label) and the Vespa rank features returned by the rank profile of the query model.

Usage:

Define labeled_data as a list of dict containing relevant documents:

labeled_data = [
    {
        "query_id": 0,
        "query": "give me title 1",
        "relevant_docs": [{"id": "1", "score": 1}],
    },
    {
        "query_id": 1,
        "query": "give me title 3",
        "relevant_docs": [{"id": "3", "score": 1}],
    },
]

Collect vespa features:

rank_features = collect_vespa_features(
    app=app,
    labeled_data=labeled_data,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), 
        ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures"],
)
rank_features
document_id query_id label attributeMatch(doc_id) attributeMatch(doc_id).averageWeight attributeMatch(doc_id).completeness attributeMatch(doc_id).fieldCompleteness attributeMatch(doc_id).importance attributeMatch(doc_id).matches attributeMatch(doc_id).maxWeight ... term(3).significance term(3).weight term(4).connectedness term(4).significance term(4).weight textSimilarity(text).fieldCoverage textSimilarity(text).order textSimilarity(text).proximity textSimilarity(text).queryCoverage textSimilarity(text).score
0 1 0 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.50 1.0 1.000000 0.50 0.750000
3 7 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.25 0.0 0.859375 0.25 0.425781
1 3 1 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.50 1.0 1.000000 0.50 0.750000
5 7 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.25 0.0 0.859375 0.25 0.425781

4 rows × 94 columns
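
The `label` column follows from `relevant_score`, `default_score` and the duplicate-dropping rule described in the parameter table. The sketch below shows that labeling logic in isolation; `label_documents` is a hypothetical helper, not the library's code, and it assumes a relevant document keeps its relevant label if it is retrieved again as an additional document.

```python
def label_documents(relevant_ids, additional_ids, relevant_score=1, default_score=0):
    """Assign a label to each retrieved document id, dropping duplicates."""
    labels = {}
    for doc_id in relevant_ids:
        labels[doc_id] = relevant_score
    for doc_id in additional_ids:
        # A relevant document that is retrieved again keeps its relevant label.
        labels.setdefault(doc_id, default_score)
    return labels


# Query 0 from the example: document "1" is relevant, "7" is an additional hit.
labels = label_documents(relevant_ids=["1"], additional_ids=["1", "7"])
```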

Use a DataFrame for labeled_data instead of a list of dict:

labeled_data = [
    {
        "qid": 0,
        "query": "give me title 1",
        "doc_id": 1, 
        "relevance": 1
    },
    {
        "qid": 1,
        "query": "give me title 3",
        "doc_id": 3, 
        "relevance": 1
    },
]
from pandas import DataFrame

labeled_data_df = DataFrame.from_records(labeled_data)
labeled_data_df

   qid            query  doc_id  relevance
0    0  give me title 1       1          1
1    1  give me title 3       3          1

rank_features = collect_vespa_features(
    app=app,
    labeled_data=labeled_data_df,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures"],
)
rank_features
document_id query_id label attributeMatch(doc_id) attributeMatch(doc_id).averageWeight attributeMatch(doc_id).completeness attributeMatch(doc_id).fieldCompleteness attributeMatch(doc_id).importance attributeMatch(doc_id).matches attributeMatch(doc_id).maxWeight ... term(3).significance term(3).weight term(4).connectedness term(4).significance term(4).weight textSimilarity(text).fieldCoverage textSimilarity(text).order textSimilarity(text).proximity textSimilarity(text).queryCoverage textSimilarity(text).score
0 1 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.50 1.0 1.000000 0.50 0.750000
3 7 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.25 0.0 0.859375 0.25 0.425781
1 3 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.50 1.0 1.000000 0.50 0.750000
5 7 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.583333 100.0 0.0 0.0 0.0 0.25 0.0 0.859375 0.25 0.425781

4 rows × 94 columns
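
The two labeled-data formats carry the same information. As a sketch of how they relate, the hypothetical helper below flattens the list-of-dict format into one record per query and relevant document, using the column names from the DataFrame example. Converting the document id to an integer is an assumption made to match that example.

```python
def to_records(labeled_data):
    """Flatten the nested format into one row per (query, relevant document)."""
    records = []
    for item in labeled_data:
        for doc in item["relevant_docs"]:
            records.append(
                {
                    "qid": item["query_id"],
                    "query": item["query"],
                    "doc_id": int(doc["id"]),
                    "relevance": doc["score"],
                }
            )
    return records


nested = [
    {
        "query_id": 0,
        "query": "give me title 1",
        "relevant_docs": [{"id": "1", "score": 1}],
    },
    {
        "query_id": 1,
        "query": "give me title 3",
        "relevant_docs": [{"id": "3", "score": 1}],
    },
]
records = to_records(nested)
```

The resulting list can be passed to `DataFrame.from_records` in the same way as the hand-written records above.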

Keep only selected features by specifying their names in the keep_features argument:

rank_features = collect_vespa_features(
    app=app,
    labeled_data=labeled_data_df,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures"],
    keep_features=["textSimilarity(text).score"],
)
rank_features
document_id query_id label textSimilarity(text).score
0 1 0 0 0.750000
3 7 0 0 0.425781
1 3 1 0 0.750000
5 7 1 0 0.425781
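
The effect of `keep_features` amounts to a filter over feature names. The sketch below shows that behavior on a single hit's features; `select_features` is a hypothetical helper, and skipping names that are absent from the hit is an assumption.

```python
def select_features(hit_features, keep_features=None):
    """Return all features when keep_features is None, else only the named ones."""
    if keep_features is None:
        return dict(hit_features)
    return {name: hit_features[name] for name in keep_features if name in hit_features}


features = {"textSimilarity(text).score": 0.75, "textSimilarity(text).order": 1.0}
selected = select_features(features, keep_features=["textSimilarity(text).score"])
everything = select_features(features)
```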


store_vespa_features

 store_vespa_features (app:vespa.application.Vespa, output_file_path:str,
                       labeled_data, id_field:str,
                       query_model:__main__.QueryModel,
                       number_additional_docs:int, fields:List[str],
                       keep_features:Optional[List[str]]=None,
                       relevant_score:int=1, default_score:int=0,
                       batch_size=1000, **kwargs)

Retrieve Vespa rank features and store them in a .csv file.

Type Default Details
app Vespa Connection to a Vespa application.
output_file_path str Path of the .csv output file. The file is created if it does not exist; otherwise the Vespa features are appended to the existing file.
labeled_data Labelled data containing query, query_id and relevant ids. See details about data format.
id_field str The Vespa field representing the document id.
query_model QueryModel Query model.
number_additional_docs int Number of additional documents to retrieve for each relevant document.
fields typing.List[str] List of Vespa fields to collect, e.g. [“rankfeatures”, “summaryfeatures”]
keep_features typing.Optional[typing.List[str]] None List containing the names of the features that should be returned. Defaults to None, which returns all the features contained in the ‘fields’ argument.
relevant_score int 1 Score to assign to relevant documents.
default_score int 0 Score to assign to the additional documents that are not relevant.
batch_size int 1000 The size of the batch of labeled data points to be processed.
kwargs
Returns int Returns 0 upon success.

Usage:

labeled_data = [
    {
        "query_id": 0,
        "query": "give me title 1",
        "relevant_docs": [{"id": "1", "score": 1}],
    },
    {
        "query_id": 1,
        "query": "give me title 3",
        "relevant_docs": [{"id": "3", "score": 1}],
    },
]

store_vespa_features(
    app=app,
    output_file_path="vespa_features.csv",
    labeled_data=labeled_data,
    id_field="doc_id",
    query_model=QueryModel(
        match_phase=OR(), ranking=Ranking(name="bm25", list_features=True)
    ),
    number_additional_docs=2,
    fields=["rankfeatures", "summaryfeatures"],
)
from pandas import read_csv

rank_features = read_csv("vespa_features.csv")
rank_features
Rows collected: 4.
Batch progress: 1/1.
document_id query_id label attributeMatch(doc_id) attributeMatch(doc_id).averageWeight attributeMatch(doc_id).completeness attributeMatch(doc_id).fieldCompleteness attributeMatch(doc_id).importance attributeMatch(doc_id).matches attributeMatch(doc_id).maxWeight ... term(3).weight term(4).connectedness term(4).significance term(4).weight textSimilarity(text).fieldCoverage textSimilarity(text).order textSimilarity(text).proximity textSimilarity(text).queryCoverage textSimilarity(text).score vespa.summaryFeatures.cached
0 1 0 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 100.0 0.0 0.0 0.0 0.50 1.0 1.000000 0.50 0.750000 0.0
1 7 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 100.0 0.0 0.0 0.0 0.25 0.0 0.859375 0.25 0.425781 0.0
2 3 1 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 100.0 0.0 0.0 0.0 0.50 1.0 1.000000 0.50 0.750000 0.0
3 7 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 100.0 0.0 0.0 0.0 0.25 0.0 0.859375 0.25 0.425781 0.0

4 rows × 95 columns
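
The create-or-append behavior and the `batch_size` argument can be pictured with the standard `csv` module. The sketch below is not the library's implementation: `append_rows` and `store_in_batches` are hypothetical helpers, the rows are hand-written stand-ins for collected features, and writing the header only for a new file is an assumption.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


def store_in_batches(path, rows, fieldnames, batch_size=1000):
    """Write rows one batch at a time and return the number of batches."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    for batch in batches:
        append_rows(path, batch, fieldnames)
    return len(batches)


rows = [
    {"document_id": 1, "query_id": 0, "label": 1},
    {"document_id": 7, "query_id": 0, "label": 0},
    {"document_id": 3, "query_id": 1, "label": 1},
]
path = os.path.join(tempfile.mkdtemp(), "vespa_features.csv")
n_batches = store_in_batches(
    path, rows, ["document_id", "query_id", "label"], batch_size=2
)

# Read the file back to confirm that all batches landed under a single header.
with open(path, newline="") as handle:
    stored = list(csv.DictReader(handle))
```

Processing in batches keeps memory use bounded for large labeled datasets, since each batch is written to disk before the next one is collected.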