# Pyvespa examples[¶](#pyvespa-examples) This is a notebook with short examples one can build applications from. Refer to [troubleshooting](https://vespa-engine.github.io/pyvespa/troubleshooting.md) for any problem when running this guide. Refer to [troubleshooting](https://vespa-engine.github.io/pyvespa/troubleshooting.md), which also has utilies for debugging. In \[ \]: Copied! ``` !pip3 install pyvespa ``` !pip3 install pyvespa ## Neighbors[¶](#neighbors) Explore distance between points in 3D vector space. These are simple examples, feeding documents with a tensor representing a point in space, and a rank profile calculating the distance between a point in the query and the point in the documents. The examples start with using simple ranking expressions like [euclidean-distance](https://docs.vespa.ai/en/reference/ranking-expressions.html#euclidean-distance-t), then rank features like [closeness()]() and setting different [distance-metrics](https://docs.vespa.ai/en/nearest-neighbor-search.html#distance-metrics-for-nearest-neighbor-search). ### Distant neighbor[¶](#distant-neighbor) First, find the point that is **most** distant from a point in query - deploy the Application Package: In \[14\]: Copied! ``` from vespa.package import ApplicationPackage, Field, RankProfile from vespa.deployment import VespaDocker from vespa.io import VespaResponse app_package = ApplicationPackage(name="neighbors") app_package.schema.add_fields( Field(name="point", type="tensor(d[3])", indexing=["attribute", "summary"]) ) app_package.schema.add_rank_profile( RankProfile( name="max_distance", inputs=[("query(qpoint)", "tensor(d[3])")], first_phase="euclidean_distance(attribute(point), query(qpoint), d)", ) ) vespa_docker = VespaDocker() app = vespa_docker.deploy(application_package=app_package) ``` from vespa.package import ApplicationPackage, Field, RankProfile from vespa.deployment import VespaDocker from vespa.io import VespaResponse app_package = ApplicationPackage(name="neighbors") app_package.schema.add_fields( Field(name="point", type="tensor(d[3])", indexing=["attribute", "summary"]) ) app_package.schema.add_rank_profile( RankProfile( name="max_distance", inputs=\[("query(qpoint)", "tensor(d[3])")\], first_phase="euclidean_distance(attribute(point), query(qpoint), d)", ) ) vespa_docker = VespaDocker() app = vespa_docker.deploy(application_package=app_package) ``` Waiting for configuration server, 0/300 seconds... Waiting for configuration server, 5/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 0/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 5/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 10/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 15/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 20/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 25/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Application is up! Finished deployment. ``` Feed points in 3d space using a 3-dimensional [indexed tensor](https://docs.vespa.ai/en/tensor-user-guide.html). Pyvespa feeds using the [/document/v1/ API](https://docs.vespa.ai/en/reference/document-v1-api-reference.html), refer to [document format](https://docs.vespa.ai/en/reference/document-json-format.html): In \[15\]: Copied! ``` def get_feed(field_name): return [ {"id": 0, "fields": {field_name: [0.0, 1.0, 2.0]}}, {"id": 1, "fields": {field_name: [1.0, 2.0, 3.0]}}, {"id": 2, "fields": {field_name: [2.0, 3.0, 4.0]}}, ] with app.syncio(connections=1) as session: for u in get_feed("point"): response: VespaResponse = session.update_data( data_id=u["id"], schema="neighbors", fields=u["fields"], create=True ) if not response.is_successful(): print( "Update failed for document {}".format(u["id"]) + " with status code {}".format(response.status_code) + " with response {}".format(response.get_json()) ) ``` def get_feed(field_name): return \[ {"id": 0, "fields": {field_name: [0.0, 1.0, 2.0]}}, {"id": 1, "fields": {field_name: [1.0, 2.0, 3.0]}}, {"id": 2, "fields": {field_name: [2.0, 3.0, 4.0]}}, \] with app.syncio(connections=1) as session: for u in get_feed("point"): response: VespaResponse = session.update_data( data_id=u["id"], schema="neighbors", fields=u["fields"], create=True ) if not response.is_successful(): print( "Update failed for document {}".format(u["id"]) - " with status code {}".format(response.status_code) - " with response {}".format(response.get_json()) ) **Note:** The feed above uses [create-if-nonexistent](https://docs.vespa.ai/en/document-v1-api-guide.html#create-if-nonexistent), i.e. update a document, create it if it does not exists. Later in this notebook we will add a field and update it, so using an update to feed data makes it easier. Query from origo using [YQL](https://docs.vespa.ai/en/query-language.html). The rank profile will rank the most distant points highest, here `sqrt(2*2 + 3*3 + 4*4) = 5.385`: In \[16\]: Copied! ``` import json from vespa.io import VespaQueryResponse result: VespaQueryResponse = app.query( body={ "yql": "select point from neighbors where true", "input.query(qpoint)": "[0.0, 0.0, 0.0]", "ranking.profile": "max_distance", "presentation.format.tensors": "short-value", } ) if not response.is_successful(): print( "Query failed with status code {}".format(response.status_code) + " with response {}".format(response.get_json()) ) raise Exception("Query failed") if len(result.hits) != 3: raise Exception("Expected 3 hits, got {}".format(len(result.hits))) print(json.dumps(result.hits, indent=4)) ``` import json from vespa.io import VespaQueryResponse result: VespaQueryResponse = app.query( body={ "yql": "select point from neighbors where true", "input.query(qpoint)": "[0.0, 0.0, 0.0]", "ranking.profile": "max_distance", "presentation.format.tensors": "short-value", } ) if not response.is_successful(): print( "Query failed with status code {}".format(response.status_code) - " with response {}".format(response.get_json()) ) raise Exception("Query failed") if len(result.hits) != 3: raise Exception("Expected 3 hits, got {}".format(len(result.hits))) print(json.dumps(result.hits, indent=4)) ``` [ { "id": "index:neighbors_content/0/c81e728dfde15fa4e8dfb3d3", "relevance": 5.385164807134504, "source": "neighbors_content", "fields": { "point": [ 2.0, 3.0, 4.0 ] } }, { "id": "index:neighbors_content/0/c4ca4238db266f395150e961", "relevance": 3.7416573867739413, "source": "neighbors_content", "fields": { "point": [ 1.0, 2.0, 3.0 ] } }, { "id": "index:neighbors_content/0/cfcd20845b10b1420c6cdeca", "relevance": 2.23606797749979, "source": "neighbors_content", "fields": { "point": [ 0.0, 1.0, 2.0 ] } } ] ``` Query from `[1.0, 2.0, 2.9]` - find that `[2.0, 3.0, 4.0]` is most distant: In \[17\]: Copied! ``` result = app.query( body={ "yql": "select point from neighbors where true", "input.query(qpoint)": "[1.0, 2.0, 2.9]", "ranking.profile": "max_distance", "presentation.format.tensors": "short-value", } ) print(json.dumps(result.hits, indent=4)) ``` result = app.query( body={ "yql": "select point from neighbors where true", "input.query(qpoint)": "[1.0, 2.0, 2.9]", "ranking.profile": "max_distance", "presentation.format.tensors": "short-value", } ) print(json.dumps(result.hits, indent=4)) ``` [ { "id": "index:neighbors_content/0/c81e728dfde15fa4e8dfb3d3", "relevance": 1.7916472308265357, "source": "neighbors_content", "fields": { "point": [ 2.0, 3.0, 4.0 ] } }, { "id": "index:neighbors_content/0/cfcd20845b10b1420c6cdeca", "relevance": 1.6763055154708881, "source": "neighbors_content", "fields": { "point": [ 0.0, 1.0, 2.0 ] } }, { "id": "index:neighbors_content/0/c4ca4238db266f395150e961", "relevance": 0.09999990575011103, "source": "neighbors_content", "fields": { "point": [ 1.0, 2.0, 3.0 ] } } ] ``` ### Nearest neighbor[¶](#nearest-neighbor) The [nearestNeighbor](https://docs.vespa.ai/en/reference/query-language-reference.html#nearestneighbor) query operator calculates distances between points in vector space. Here, we are using the default distance metric (euclidean), as it is not specified. The [closeness()]() rank feature can be used to rank results - add a new rank profile: In \[18\]: Copied! ``` app_package.schema.add_rank_profile( RankProfile( name="nearest_neighbor", inputs=[("query(qpoint)", "tensor(d[3])")], first_phase="closeness(field, point)", ) ) app = vespa_docker.deploy(application_package=app_package) ``` app_package.schema.add_rank_profile( RankProfile( name="nearest_neighbor", inputs=\[("query(qpoint)", "tensor(d[3])")\], first_phase="closeness(field, point)", ) ) app = vespa_docker.deploy(application_package=app_package) ``` Waiting for configuration server, 0/300 seconds... Waiting for configuration server, 5/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 0/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 5/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Application is up! Finished deployment. ``` Read more in [nearest neighbor search](https://docs.vespa.ai/en/nearest-neighbor-search.html). Query using nearestNeighbor query operator: In \[19\]: Copied! ``` result = app.query( body={ "yql": "select point from neighbors where {targetHits: 3}nearestNeighbor(point, qpoint)", "input.query(qpoint)": "[1.0, 2.0, 2.9]", "ranking.profile": "nearest_neighbor", "presentation.format.tensors": "short-value", } ) print(json.dumps(result.hits, indent=4)) ``` result = app.query( body={ "yql": "select point from neighbors where {targetHits: 3}nearestNeighbor(point, qpoint)", "input.query(qpoint)": "[1.0, 2.0, 2.9]", "ranking.profile": "nearest_neighbor", "presentation.format.tensors": "short-value", } ) print(json.dumps(result.hits, indent=4)) ``` [ { "id": "index:neighbors_content/0/c4ca4238db266f395150e961", "relevance": 0.9090909879069752, "source": "neighbors_content", "fields": { "point": [ 1.0, 2.0, 3.0 ] } }, { "id": "index:neighbors_content/0/cfcd20845b10b1420c6cdeca", "relevance": 0.37364941905256455, "source": "neighbors_content", "fields": { "point": [ 0.0, 1.0, 2.0 ] } }, { "id": "index:neighbors_content/0/c81e728dfde15fa4e8dfb3d3", "relevance": 0.35821144946644456, "source": "neighbors_content", "fields": { "point": [ 2.0, 3.0, 4.0 ] } } ] ``` ### Nearest neighbor - angular[¶](#nearest-neighbor-angular) So far, we have used the default [distance-metric](https://docs.vespa.ai/en/nearest-neighbor-search.html#distance-metrics-for-nearest-neighbor-search) which is euclidean - now try with another. Add new few field with "angular" distance metric: In \[20\]: Copied! ``` app_package.schema.add_fields( Field( name="point_angular", type="tensor(d[3])", indexing=["attribute", "summary"], attribute=["distance-metric: angular"], ) ) app_package.schema.add_rank_profile( RankProfile( name="nearest_neighbor_angular", inputs=[("query(qpoint)", "tensor(d[3])")], first_phase="closeness(field, point_angular)", ) ) app = vespa_docker.deploy(application_package=app_package) ``` app_package.schema.add_fields( Field( name="point_angular", type="tensor(d[3])", indexing=["attribute", "summary"], attribute=["distance-metric: angular"], ) ) app_package.schema.add_rank_profile( RankProfile( name="nearest_neighbor_angular", inputs=\[("query(qpoint)", "tensor(d[3])")\], first_phase="closeness(field, point_angular)", ) ) app = vespa_docker.deploy(application_package=app_package) ``` Waiting for configuration server, 0/300 seconds... Waiting for configuration server, 5/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 0/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 5/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Waiting for application status, 10/300 seconds... Using plain http against endpoint http://localhost:8080/ApplicationStatus Application is up! Finished deployment. ``` Feed the same data to the `point_angular` field: In \[21\]: Copied! ``` for u in get_feed("point_angular"): response: VespaResponse = session.update_data( data_id=u["id"], schema="neighbors", fields=u["fields"] ) if not response.is_successful(): print( "Update failed for document {}".format(u["id"]) + " with status code {}".format(response.status_code) + " with response {}".format(response.get_json()) ) ``` for u in get_feed("point_angular"): response: VespaResponse = session.update_data( data_id=u["id"], schema="neighbors", fields=u["fields"] ) if not response.is_successful(): print( "Update failed for document {}".format(u["id"]) - " with status code {}".format(response.status_code) - " with response {}".format(response.get_json()) ) Observe the documents now have *two* vectors Notice that we pass [native Vespa document v1 api parameters](https://docs.vespa.ai/en/reference/document-v1-api-reference.html) to reduce the tensor verbosity. In \[24\]: Copied! ``` from vespa.io import VespaResponse response: VespaResponse = app.get_data( schema="neighbors", data_id=0, **{"format.tensors": "short-value"} ) print(json.dumps(response.get_json(), indent=4)) ``` from vespa.io import VespaResponse response: VespaResponse = app.get_data( schema="neighbors", data_id=0, \*\*{"format.tensors": "short-value"} ) print(json.dumps(response.get_json(), indent=4)) ``` { "pathId": "/document/v1/neighbors/neighbors/docid/0", "id": "id:neighbors:neighbors::0", "fields": { "point": [ 0.0, 1.0, 2.0 ], "point_angular": [ 0.0, 1.0, 2.0 ] } } ``` In \[25\]: Copied! ``` result = app.query( body={ "yql": "select point_angular from neighbors where {targetHits: 3}nearestNeighbor(point_angular, qpoint)", "input.query(qpoint)": "[1.0, 2.0, 2.9]", "ranking.profile": "nearest_neighbor_angular", "presentation.format.tensors": "short-value", } ) print(json.dumps(result.hits, indent=4)) ``` result = app.query( body={ "yql": "select point_angular from neighbors where {targetHits: 3}nearestNeighbor(point_angular, qpoint)", "input.query(qpoint)": "[1.0, 2.0, 2.9]", "ranking.profile": "nearest_neighbor_angular", "presentation.format.tensors": "short-value", } ) print(json.dumps(result.hits, indent=4)) ``` [ { "id": "index:neighbors_content/0/c4ca4238db266f395150e961", "relevance": 0.983943389010042, "source": "neighbors_content", "fields": { "point_angular": [ 1.0, 2.0, 3.0 ] } }, { "id": "index:neighbors_content/0/c81e728dfde15fa4e8dfb3d3", "relevance": 0.9004871017951954, "source": "neighbors_content", "fields": { "point_angular": [ 2.0, 3.0, 4.0 ] } }, { "id": "index:neighbors_content/0/cfcd20845b10b1420c6cdeca", "relevance": 0.7638041096953281, "source": "neighbors_content", "fields": { "point_angular": [ 0.0, 1.0, 2.0 ] } } ] ``` In the output above, observe the different in "relevance", compared to the query using `'ranking.profile': 'nearest_neighbor'` above - this is the difference in `closeness()` using different distance metrics. ## Next steps[¶](#next-steps) - Try the [multi-vector-indexing](https://vespa-engine.github.io/pyvespa/examples/multi-vector-indexing.md) notebook to explore using an HNSW-index for *approximate* nearest neighbor search. - Explore using the [distance()]() rank feature - this should give the same results as the ranking expressions using `euclidean-distance` above. - `label` is useful when having more vector fields - read more about the [nearestNeighbor](https://docs.vespa.ai/en/reference/query-language-reference.html#nearestneighbor) query operator. ## Cleanup[¶](#cleanup) In \[ \]: Copied! ``` vespa_docker.container.stop() vespa_docker.container.remove() ``` vespa_docker.container.stop() vespa_docker.container.remove()