Query models

from learntorank.query import QueryModel, Ranking, OR

standard_query_model = QueryModel(
    name="or_bm25",
    match_phase = OR(),
    ranking = Ranking(name="bm25")
)

Starting in version 0.5.0, we can bypass the pyvespa high-level API and create a QueryModel with the full flexibility of the Vespa Query API. This is useful for use cases that the pyvespa API does not cover, and for users who are familiar with the Vespa Query API and prefer to work with it directly.

def body_function(query):
    body = {'yql': 'select * from sources * where userQuery();',
            'query': query,
            'type': 'any',
            'ranking': {'profile': 'bm25', 'listFeatures': 'false'}}
    return body

flexible_query_model = QueryModel(body_function = body_function)
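
Because a body function is an ordinary Python function that returns the request body as a dictionary, it can be inspected without contacting Vespa. The sketch below is a hypothetical variation of body_function that also sets the Vespa Query API `hits` parameter. The helper name `make_body_function` and its default values are assumptions made for illustration, not part of learntorank.

```python
def make_body_function(rank_profile="bm25", hits=10):
    """Return a body function bound to a rank profile and a hit count.

    `make_body_function` is a hypothetical helper, not part of learntorank.
    """
    def body_function(query):
        # Same request body as above, plus the Query API `hits` parameter.
        return {
            "yql": "select * from sources * where userQuery();",
            "query": query,
            "type": "any",
            "hits": hits,
            "ranking": {"profile": rank_profile, "listFeatures": "false"},
        }
    return body_function


# Build the body for one query and look at two of its entries.
body = make_body_function(hits=5)("this is a test")
print(body["hits"], body["ranking"]["profile"])
```

The returned function can be passed to QueryModel(body_function=...) in the same way as body_function above.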

The flexible_query_model defined above is equivalent to the standard_query_model, as we can see when querying the app. We will use the cord19 app in our demonstration.

from vespa.application import Vespa

app = Vespa(url = "https://api.cord19.vespa.ai")

from learntorank.query import send_query

standard_result = send_query(
    app=app, 
    query="this is a test", 
    query_model=standard_query_model
)
standard_result.get_hits().head(3)

flexible_result = send_query(
    app=app, 
    query="this is a test", 
    query_model=flexible_query_model
)
flexible_result.get_hits().head(3)
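
One way to check the equivalence claimed above is to compare the document ids of both results in rank order. The sketch below assumes each hit is a dictionary with its id stored under hit["fields"]["id"], the same layout used when reading results.hits later in this section. The helper names `ranked_ids` and `same_ranking` and the sample hits are hypothetical.

```python
def ranked_ids(hits, top_k=3):
    """Return the ids of the first `top_k` hits, in rank order."""
    return [hit["fields"]["id"] for hit in hits[:top_k]]


def same_ranking(hits_a, hits_b, top_k=3):
    """True when both hit lists have the same ids in the same order."""
    return ranked_ids(hits_a, top_k) == ranked_ids(hits_b, top_k)


# Made-up hits standing in for standard_result.hits and flexible_result.hits.
hits_a = [{"fields": {"id": 1}}, {"fields": {"id": 2}}, {"fields": {"id": 3}}]
hits_b = [{"fields": {"id": 1}}, {"fields": {"id": 2}}, {"fields": {"id": 3}}]
print(same_ranking(hits_a, hits_b))
```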

Specify a query model

Query + term-matching + rank profile

from learntorank.query import QueryModel, OR, Ranking, send_query

results = send_query(
    app=app,
    query="Is remdesivir an effective treatment for COVID-19?", 
    query_model = QueryModel(
        match_phase=OR(), 
        ranking=Ranking(name="bm25")
    )
)

results.number_documents_retrieved

Query + term-matching + ANN operator + rank profile

from learntorank.query import QueryModel, QueryRankingFeature, ANN, WeakAnd, Union, Ranking
from random import random

match_phase = Union(
    WeakAnd(hits = 10), 
    ANN(
        doc_vector="specter_embedding", 
        query_vector="specter_vector", 
        hits = 10,
        label="title"
    )
)
ranking = Ranking(name="related-specter", list_features=True)
query_model = QueryModel(
    query_properties=[QueryRankingFeature(
        name="specter_vector", 
        mapping=lambda x: [random() for _ in range(768)]  # x is the query text, unused in this random stand-in
    )],
    match_phase=match_phase, ranking=ranking
)

results = send_query(
    app=app,
    query="Is remdesivir an effective treatment for COVID-19?", 
    query_model=query_model
)

results.number_documents_retrieved
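
The query above fills specter_vector with random numbers, so the ANN part of the match phase can change from run to run. As a minimal sketch, assuming only that mapping receives the query string and returns a list of 768 floats, the hypothetical function below derives a repeatable placeholder vector by seeding a random generator with the query text. It is a stand-in for a real text encoder, not a meaningful embedding.

```python
import random


def placeholder_embedding(query, dim=768):
    """Map a query string to a repeatable list of `dim` floats.

    Hypothetical stand-in for a real text encoder.
    """
    # Seeding with the query text makes the output repeatable per query.
    rng = random.Random(query)
    return [rng.random() for _ in range(dim)]


vector = placeholder_embedding("Is remdesivir an effective treatment for COVID-19?")
print(len(vector))
```

It could replace the lambda above by passing mapping=placeholder_embedding.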

Recall specific documents

Let’s take a look at the top 3 ids from the last query.

top_ids = [hit["fields"]["id"] for hit in results.hits[0:3]]
top_ids

Assume that we now want to retrieve the second and third ids above. We can do so with the recall argument.

results_with_recall = send_query(
    app=app,
    query="Is remdesivir an effective treatment for COVID-19?", 
    query_model=query_model,
    recall = ("id", top_ids[1:3])
)

Only documents whose Vespa field id matches one of the values in the list inside the tuple are retrieved.

id_recalled = [hit["fields"]["id"] for hit in results_with_recall.hits]
id_recalled
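
The Vespa Query API has a recall parameter that restricts which documents can be returned. As a rough sketch of how a (field, values) tuple like the one above could be turned into such a recall string, assuming a "+(field:value ...)" form, the hypothetical helper below builds it in plain Python. The exact string sent by send_query may differ, and the ids used here are made up.

```python
def build_recall(recall):
    """Build a recall string such as '+(id:28 id:43)' from (field, values).

    Hypothetical helper; the format is an assumption, not the library's code.
    """
    field, values = recall
    terms = " ".join("{}:{}".format(field, value) for value in values)
    return "+(" + terms + ")"


print(build_recall(("id", [28, 43])))
```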