# Visual PDF RAG with Vespa - ColPali demo application[¶](#visual-pdf-rag-with-vespa-colpali-demo-application) We created an end-to-end demo application for visual retrieval of PDF pages using Vespa, including a frontend web application. To see the live demo, visit . The main goal of the demo is to make it easy for *you* to create your own PDF Enterprise Search application using Vespa. To deploy a full demo, you need two main components: 1. A Vespa application that lets you index and search PDF pages using ColPali embeddings. 1. A live web application that lets you interact with the Vespa application. After running this notebook, you will have set up a Vespa application, and indexed some PDF pages. You can then test that you are able to query the Vespa application, and you will be ready to deploy the web application including the frontend. Some of the features we want to highlight in this demo are: - Visual retrieval of PDF pages using ColPali embeddings - Explainability by displaying similarity maps over the patches in the PDF pages for each query token. - Extracting queries and questions from the PDF pages using `gemini-1.5-8b` model. - Type-ahead search suggestions based on the extracted queries and questions. - Comparison of different retrieval and ranking strategies (BM25, ColPali MaxSim, and a combination of both). - AI-generated responses to the query based on the top ranked PDF pages. Also using the `gemini-1.5-8b` model. We also wanted to give a notion of which latency one can expect using Vespa for this use case. Event though your users might not state this explicitly, we consider it important to provide a snappy user experience. In this notebook, we will prepare the Vespa backend application for our visual retrieval demo. We will use ColPali as the model to extract patch vectors from images of pdf pages. At query time, we use MaxSim to retrieve and/or (based on the configuration) rank the page results. The steps we will take in this notebook are: 1. Setup and configuration 1. Download PDFs 1. Convert PDFs to images 1. Generate queries and questions 1. Generate ColPali embeddings 1. Prepare the Vespa application package 1. Deploy the Vespa application to Vespa Cloud 1. Feed the data to the Vespa application 1. Test a query to the Vespa application All the steps that are needed to provision the Vespa application, including feeding the data, can be done by running this notebook. We have tried to make it easy for others to run this notebook, to create your own PDF Enterprise Search application using Vespa. If you want to run this notebook in Colab, you can do so by clicking the button below: ## 1. Setup and Configuration[¶](#1-setup-and-configuration) In \[ \]: Copied! ``` !python --version ``` !python --version Install dependencies: Note that the python pdf2image package requires poppler-utils, see other installation options [here](https://pdf2image.readthedocs.io/en/latest/installation.html#installing-poppler). In \[ \]: Copied! ``` !sudo apt-get update && sudo apt-get install poppler-utils -y ``` !sudo apt-get update && sudo apt-get install poppler-utils -y Now install the required python packages: In \[ \]: Copied! ``` !pip3 install colpali-engine==0.3.10 pdf2image pypdf==5.0.1 pyvespa>=0.50.0 vespacli numpy==1.26.4 pillow==10.4.0 google-generativeai==0.8.3 transformers python-dotenv ``` !pip3 install colpali-engine==0.3.10 pdf2image pypdf==5.0.1 pyvespa>=0.50.0 vespacli numpy==1.26.4 pillow==10.4.0 google-generativeai==0.8.3 transformers python-dotenv In \[ \]: Copied! ``` import os import json from typing import Tuple import hashlib import numpy as np # Vespa from vespa.package import ( ApplicationPackage, Field, Schema, Document, HNSW, RankProfile, Function, FieldSet, SecondPhaseRanking, Summary, DocumentSummary, ) from vespa.deployment import VespaCloud from vespa.application import Vespa from vespa.io import VespaResponse # Google Generative AI for Google Gemini interaction import google.generativeai as genai # Torch and other ML libraries import torch from torch.utils.data import DataLoader from tqdm import tqdm from pdf2image import convert_from_path from pypdf import PdfReader # ColPali model and processor from colpali_engine.models import ColPali, ColPaliProcessor from colpali_engine.utils.torch_utils import get_torch_device # Load environment variables from dotenv import load_dotenv load_dotenv() # Avoid warning from huggingface tokenizers os.environ["TOKENIZERS_PARALLELISM"] = "false" ``` import os import json from typing import Tuple import hashlib import numpy as np # Vespa from vespa.package import ( ApplicationPackage, Field, Schema, Document, HNSW, RankProfile, Function, FieldSet, SecondPhaseRanking, Summary, DocumentSummary, ) from vespa.deployment import VespaCloud from vespa.application import Vespa from vespa.io import VespaResponse # Google Generative AI for Google Gemini interaction import google.generativeai as genai # Torch and other ML libraries import torch from torch.utils.data import DataLoader from tqdm import tqdm from pdf2image import convert_from_path from pypdf import PdfReader # ColPali model and processor from colpali_engine.models import ColPali, ColPaliProcessor from colpali_engine.utils.torch_utils import get_torch_device # Load environment variables from dotenv import load_dotenv load_dotenv() # Avoid warning from huggingface tokenizers os.environ["TOKENIZERS_PARALLELISM"] = "false" ### Create a free trial in Vespa Cloud[¶](#create-a-free-trial-in-vespa-cloud) Create a tenant from [here](https://vespa.ai/free-trial/). The trial includes $300 credit. Take note of your tenant name, and input it below. In \[ \]: Copied! ``` VESPA_TENANT_NAME = "vespa-team" # Replace with your tenant name ``` VESPA_TENANT_NAME = "vespa-team" # Replace with your tenant name Here, set your desired application name. (Will be created in later steps) Note that you can not have hyphen `-` or underscore `_` in the application name. In \[ \]: Copied! ``` VESPA_APPLICATION_NAME = "colpalidemodev" VESPA_SCHEMA_NAME = "pdf_page" ``` VESPA_APPLICATION_NAME = "colpalidemodev" VESPA_SCHEMA_NAME = "pdf_page" Next, you can to create a token. This is an optional authentication method (the default is mTLS), and will be used for feeding data, and querying the application. For details, see [Authenticating to Vespa Cloud](https://vespa-engine.github.io/pyvespa/authenticating-to-vespa-cloud.md). For now, we will use a single token with both read and write permissions. For production, we recommend separate tokens for feeding and querying, (the former with write permission, and the latter with read permission). The tokens can be created from the [Vespa Cloud console](https://console.vespa-cloud.com/) in the 'Account' -> 'Tokens' section. Please make sure to save the both the token id and it's value somwhere safe - you'll need it when you're going to connect to your app. In \[ \]: Copied! ``` # Replace this with the id of your token VESPA_TOKEN_ID = "pyvespa_integration" # This needs to match the token_id that you created in the Vespa Cloud Console ``` # Replace this with the id of your token VESPA_TOKEN_ID = "pyvespa_integration" # This needs to match the token_id that you created in the Vespa Cloud Console We also need to set the value of the write token to be able to feed data to the Vespa application (value of VESPA_TOKEN_ID_WRITE). Please run the cell below to set the variable. In \[ \]: Copied! ``` VESPA_CLOUD_SECRET_TOKEN = os.getenv("VESPA_CLOUD_SECRET_TOKEN") or input( "Enter Vespa cloud secret token: " ) ``` VESPA_CLOUD_SECRET_TOKEN = os.getenv("VESPA_CLOUD_SECRET_TOKEN") or input( "Enter Vespa cloud secret token: " ) We will use Google's Gemini API to create sample queries for our images. Create a Gemini API key from [here](https://aistudio.google.com/app/apikey). Once you have the key, please run the cell below. You can also use other VLM's to create these queries. In \[ \]: Copied! ``` GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY") or input( "Enter Google Generative AI API key: " ) # Configure Google Generative AI genai.configure(api_key=GOOGLE_API_KEY) ``` GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY") or input( "Enter Google Generative AI API key: " ) # Configure Google Generative AI genai.configure(api_key=GOOGLE_API_KEY) ### Loading the ColPali model from huggingface 🤗[¶](#loading-the-colpali-model-from-huggingface) In \[ \]: Copied! ``` MODEL_NAME = "vidore/colpali-v1.2" # Set device for Torch device = get_torch_device("auto") print(f"Using device: {device}") # Load the ColPali model and processor model = ColPali.from_pretrained( MODEL_NAME, torch_dtype=torch.float32, device_map=device, ).eval() processor = ColPaliProcessor.from_pretrained(MODEL_NAME) ``` MODEL_NAME = "vidore/colpali-v1.2" # Set device for Torch device = get_torch_device("auto") print(f"Using device: {device}") # Load the ColPali model and processor model = ColPali.from_pretrained( MODEL_NAME, torch_dtype=torch.float32, device_map=device, ).eval() processor = ColPaliProcessor.from_pretrained(MODEL_NAME) ## 2. Download PDFs[¶](#2-download-pdfs) We are going to use public reports from the Norwegian Government Pension Fund Global (also known as the Oil Fund). The fund puts transparency at the forefront and publishes reports on its investments, holdings, and returns, as well as its strategy and governance. These reports are the ones we are going to use for this showcase. Here are some sample images: As we can see, a lot of the information is in the form of tables, charts and numbers. These are not easily extractable using pdf-readers or OCR tools. In \[ \]: Copied! ``` import requests pdfs = [ { "url": "https://drive.google.com/uc?export=download&id=1nDO0KN_BjyFu42xFAfhJagOeeaJ8fhki", "path": "pdfs/gpfg-half-year-report-2024.pdf", "year": "2024", }, { "url": "https://drive.google.com/uc?export=download&id=1Saw_wM8RI6Zej5qkWDDpeM-3tyOQQTwR", "path": "pdfs/gpfg-annual-report_2023.pdf", "year": "2023", }, ] ``` import requests pdfs = [ { "url": "https://drive.google.com/uc?export=download&id=1nDO0KN_BjyFu42xFAfhJagOeeaJ8fhki", "path": "pdfs/gpfg-half-year-report-2024.pdf", "year": "2024", }, { "url": "https://drive.google.com/uc?export=download&id=1Saw_wM8RI6Zej5qkWDDpeM-3tyOQQTwR", "path": "pdfs/gpfg-annual-report_2023.pdf", "year": "2023", }, ] ### Downloading the PDFs[¶](#downloading-the-pdfs) We create a function to download the PDFs from the web to the provided directory. In \[ \]: Copied! ``` PDFS_DIR = "pdfs" os.makedirs(PDFS_DIR, exist_ok=True) def download_pdf(url: str, path: str): r = requests.get(url, stream=True) with open(path, "wb") as f: for chunk in r.iter_content(chunk_size=1024): if chunk: f.write(chunk) return path # Download the pdfs for pdf in pdfs: download_pdf(pdf["url"], pdf["path"]) ``` PDFS_DIR = "pdfs" os.makedirs(PDFS_DIR, exist_ok=True) def download_pdf(url: str, path: str): r = requests.get(url, stream=True) with open(path, "wb") as f: for chunk in r.iter_content(chunk_size=1024): if chunk: f.write(chunk) return path # Download the pdfs for pdf in pdfs: download_pdf(pdf["url"], pdf["path"]) ## 3. Convert PDFs to Images[¶](#3-convert-pdfs-to-images) In \[ \]: Copied! ``` def get_pdf_images(pdf_path): reader = PdfReader(pdf_path) page_texts = [] for page_number in range(len(reader.pages)): page = reader.pages[page_number] text = page.extract_text() page_texts.append(text) # Convert to PIL images images = convert_from_path(pdf_path) assert len(images) == len(page_texts) return images, page_texts pdf_folder = "pdfs" pdf_pages = [] for pdf in tqdm(pdfs): pdf_file = pdf["path"] title = os.path.splitext(os.path.basename(pdf_file))[0] images, texts = get_pdf_images(pdf_file) for page_no, (image, text) in enumerate(zip(images, texts)): pdf_pages.append( { "title": title, "year": pdf["year"], "url": pdf["url"], "path": pdf_file, "image": image, "text": text, "page_no": page_no, } ) ``` def get_pdf_images(pdf_path): reader = PdfReader(pdf_path) page_texts = [] for page_number in range(len(reader.pages)): page = reader.pages[page_number] text = page.extract_text() page_texts.append(text) # Convert to PIL images images = convert_from_path(pdf_path) assert len(images) == len(page_texts) return images, page_texts pdf_folder = "pdfs" pdf_pages = [] for pdf in tqdm(pdfs): pdf_file = pdf["path"] title = os.path.splitext(os.path.basename(pdf_file))[0] images, texts = get_pdf_images(pdf_file) for page_no, (image, text) in enumerate(zip(images, texts)): pdf_pages.append( { "title": title, "year": pdf["year"], "url": pdf["url"], "path": pdf_file, "image": image, "text": text, "page_no": page_no, } ) In \[ \]: Copied! ``` len(pdf_pages) ``` len(pdf_pages) In \[ \]: Copied! ``` MAX_PAGES = 10 # Set to None to use all pages pdf_pages = pdf_pages[:MAX_PAGES] if MAX_PAGES else pdf_pages ``` MAX_PAGES = 10 # Set to None to use all pages pdf_pages = pdf_pages[:MAX_PAGES] if MAX_PAGES else pdf_pages We now have 176 pages, which will be the entity we define as one document in Vespa. Let us look at the extracted text from the pages displayed above. In \[ \]: Copied! ``` pdf_pages[8]["image"] ``` pdf_pages[8]["image"] In \[ \]: Copied! ``` print(pdf_pages[8]["text"]) ``` print(pdf_pages[8]["text"]) In \[ \]: Copied! ``` # print(pdf_pages[95]["text"]) ``` # print(pdf_pages[95]["text"]) As we can see, the extracted text fails to capture the visual information we see in the image, and it would be difficult for an LLM to correctly answer questions such as *'Price development in Technology sector from April 2023?'* based on the text alone. ## 4. Generate Queries[¶](#4-generate-queries) In this step, we want to generate queries for each page image. These will be useful for 2 reasons: 1. We can use these queries as typeahead suggestions in the search bar. 1. We could potentially use the queries to generate an evaluation dataset. See [Improving Retrieval with LLM-as-a-judge](https://blog.vespa.ai/improving-retrieval-with-llm-as-a-judge/) for a deeper dive into this topic. This will not be within the scope of this notebook though. The prompt for generating queries is adapted from [this](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html#an-update-retrieval-focused-prompt) wonderful blog post by Daniel van Strien. We have modified the prompt to also generate keword based queries, in addition to the question based queries. We will use the Gemini API to generate these queries, with `gemini-flash-lite-latest` as the model. In \[ \]: Copied! ``` from pydantic import BaseModel class GeneratedQueries(BaseModel): broad_topical_question: str broad_topical_query: str specific_detail_question: str specific_detail_query: str visual_element_question: str visual_element_query: str def get_retrieval_prompt() -> Tuple[str, GeneratedQueries]: prompt = ( prompt ) = """You are an investor, stock analyst and financial expert. You will be presented an image of a document page from a report published by the Norwegian Government Pension Fund Global (GPFG). The report may be annual or quarterly reports, or policy reports, on topics such as responsible investment, risk etc. Your task is to generate retrieval queries and questions that you would use to retrieve this document (or ask based on this document) in a large corpus. Please generate 3 different types of retrieval queries and questions. A retrieval query is a keyword based query, made up of 2-5 words, that you would type into a search engine to find this document. A question is a natural language question that you would ask, for which the document contains the answer. The queries should be of the following types: 1. A broad topical query: This should cover the main subject of the document. 2. A specific detail query: This should cover a specific detail or aspect of the document. 3. A visual element query: This should cover a visual element of the document, such as a chart, graph, or image. Important guidelines: - Ensure the queries are relevant for retrieval tasks, not just describing the page content. - Use a fact-based natural language style for the questions. - Frame the queries as if someone is searching for this document in a large corpus. - Make the queries diverse and representative of different search strategies. Format your response as a JSON object with the structure of the following example: { "broad_topical_question": "What was the Responsible Investment Policy in 2019?", "broad_topical_query": "responsible investment policy 2019", "specific_detail_question": "What is the percentage of investments in renewable energy?", "specific_detail_query": "renewable energy investments percentage", "visual_element_question": "What is the trend of total holding value over time?", "visual_element_query": "total holding value trend" } If there are no relevant visual elements, provide an empty string for the visual element question and query. Here is the document image to analyze: Generate the queries based on this image and provide the response in the specified JSON format. Only return JSON. Don't return any extra explanation text. """ return prompt, GeneratedQueries prompt_text, pydantic_model = get_retrieval_prompt() ``` from pydantic import BaseModel class GeneratedQueries(BaseModel): broad_topical_question: str broad_topical_query: str specific_detail_question: str specific_detail_query: str visual_element_question: str visual_element_query: str def get_retrieval_prompt() -> Tuple\[str, GeneratedQueries\]: prompt = ( prompt ) = """You are an investor, stock analyst and financial expert. You will be presented an image of a document page from a report published by the Norwegian Government Pension Fund Global (GPFG). The report may be annual or quarterly reports, or policy reports, on topics such as responsible investment, risk etc. Your task is to generate retrieval queries and questions that you would use to retrieve this document (or ask based on this document) in a large corpus. Please generate 3 different types of retrieval queries and questions. A retrieval query is a keyword based query, made up of 2-5 words, that you would type into a search engine to find this document. A question is a natural language question that you would ask, for which the document contains the answer. The queries should be of the following types: 1. A broad topical query: This should cover the main subject of the document. 1. A specific detail query: This should cover a specific detail or aspect of the document. 1. A visual element query: This should cover a visual element of the document, such as a chart, graph, or image. Important guidelines: - Ensure the queries are relevant for retrieval tasks, not just describing the page content. - Use a fact-based natural language style for the questions. - Frame the queries as if someone is searching for this document in a large corpus. - Make the queries diverse and representative of different search strategies. Format your response as a JSON object with the structure of the following example: { "broad_topical_question": "What was the Responsible Investment Policy in 2019?", "broad_topical_query": "responsible investment policy 2019", "specific_detail_question": "What is the percentage of investments in renewable energy?", "specific_detail_query": "renewable energy investments percentage", "visual_element_question": "What is the trend of total holding value over time?", "visual_element_query": "total holding value trend" } If there are no relevant visual elements, provide an empty string for the visual element question and query. Here is the document image to analyze: Generate the queries based on this image and provide the response in the specified JSON format. Only return JSON. Don't return any extra explanation text. """ return prompt, GeneratedQueries prompt_text, pydantic_model = get_retrieval_prompt() In \[ \]: Copied! ``` gemini_model = genai.GenerativeModel("gemini-flash-lite-latest") def generate_queries(image, prompt_text, pydantic_model): try: response = gemini_model.generate_content( [image, "\n\n", prompt_text], generation_config=genai.GenerationConfig( response_mime_type="application/json", response_schema=pydantic_model, ), ) queries = json.loads(response.text) except Exception as _e: print(_e) queries = { "broad_topical_question": "", "broad_topical_query": "", "specific_detail_question": "", "specific_detail_query": "", "visual_element_question": "", "visual_element_query": "", } return queries ``` gemini_model = genai.GenerativeModel("gemini-flash-lite-latest") def generate_queries(image, prompt_text, pydantic_model): try: response = gemini_model.generate_content( [image, "\\n\\n", prompt_text], generation_config=genai.GenerationConfig( response_mime_type="application/json", response_schema=pydantic_model, ), ) queries = json.loads(response.text) except Exception as \_e: print(\_e) queries = { "broad_topical_question": "", "broad_topical_query": "", "specific_detail_question": "", "specific_detail_query": "", "visual_element_question": "", "visual_element_query": "", } return queries In \[ \]: Copied! ``` for pdf in tqdm(pdf_pages): image = pdf.get("image") pdf["queries"] = generate_queries(image, prompt_text, pydantic_model) ``` for pdf in tqdm(pdf_pages): image = pdf.get("image") pdf["queries"] = generate_queries(image, prompt_text, pydantic_model) Let's take a look at the queries and questions generated for the page displayed above. In \[ \]: Copied! ``` pdf_pages[8]["queries"] ``` pdf_pages[8]["queries"] ## 5. Generate embeddings[¶](#5-generate-embeddings) Now that we have the queries, we can use the ColPali model to generate embeddings for each page image. In \[ \]: Copied! ``` def generate_embeddings(images, model, processor, batch_size=1) -> np.ndarray: """ Generate embeddings for a list of images. Move to CPU only once per batch. Args: images (List[PIL.Image]): List of PIL images. model (nn.Module): The model to generate embeddings. processor: The processor to preprocess images. batch_size (int, optional): Batch size for processing. Defaults to 64. Returns: np.ndarray: Embeddings for the images, shape (len(images), processor.max_patch_length (1030 for ColPali), model.config.hidden_size (Patch embedding dimension - 128 for ColPali)). """ def collate_fn(batch): # Batch is a list of images return processor.process_images(batch) # Should return a dict of tensors dataloader = DataLoader( images, shuffle=False, collate_fn=collate_fn, ) embeddings_list = [] for batch in tqdm(dataloader): with torch.no_grad(): batch = {k: v.to(model.device) for k, v in batch.items()} embeddings_batch = model(**batch) # Convert tensor to numpy array and append to list embeddings_list.extend( [t.cpu().numpy() for t in torch.unbind(embeddings_batch)] ) # Stack all embeddings into a single numpy array all_embeddings = np.stack(embeddings_list, axis=0) return all_embeddings ``` def generate_embeddings(images, model, processor, batch_size=1) -> np.ndarray: """ Generate embeddings for a list of images. Move to CPU only once per batch. Args: images (List[PIL.Image]): List of PIL images. model (nn.Module): The model to generate embeddings. processor: The processor to preprocess images. batch_size (int, optional): Batch size for processing. Defaults to 64. Returns: np.ndarray: Embeddings for the images, shape (len(images), processor.max_patch_length (1030 for ColPali), model.config.hidden_size (Patch embedding dimension - 128 for ColPali)). """ def collate_fn(batch): # Batch is a list of images return processor.process_images(batch) # Should return a dict of tensors dataloader = DataLoader( images, shuffle=False, collate_fn=collate_fn, ) embeddings_list = [] for batch in tqdm(dataloader): with torch.no_grad(): batch = {k: v.to(model.device) for k, v in batch.items()} embeddings_batch = model(\*\*batch) # Convert tensor to numpy array and append to list embeddings_list.extend( [t.cpu().numpy() for t in torch.unbind(embeddings_batch)] ) # Stack all embeddings into a single numpy array all_embeddings = np.stack(embeddings_list, axis=0) return all_embeddings In \[ \]: Copied! ``` # Generate embeddings for all images images = [pdf["image"] for pdf in pdf_pages] embeddings = generate_embeddings(images, model, processor) ``` # Generate embeddings for all images images = \[pdf["image"] for pdf in pdf_pages\] embeddings = generate_embeddings(images, model, processor) Now, we have one embedding vector of dimension 128 for each patch of each image (1024 patches + some special tokens). In \[ \]: Copied! ``` embeddings.shape ``` embeddings.shape In \[ \]: Copied! ``` assert len(pdf_pages) == embeddings.shape[0] assert embeddings.shape[1] > 1028 # Number of patches (including special tokens) assert embeddings.shape[2] == 128 # Embedding dimension per patch ``` assert len(pdf_pages) == embeddings.shape[0] assert embeddings.shape[1] > 1028 # Number of patches (including special tokens) assert embeddings.shape[2] == 128 # Embedding dimension per patch ## 6. Prepare Data on Vespa Format[¶](#6-prepare-data-on-vespa-format) Now, that we have all the data we need, all that remains is to make sure it is in the right format for Vespa. We now convert the embeddings to Vespa JSON format so we can store (and index) them in Vespa. Details in [Vespa JSON feed format doc](https://docs.vespa.ai/en/reference/document-json-format.html). We use binary quantization (BQ) of the page level ColPali vector embeddings to reduce their size by 32x. Read more about binarization of multi-vector representations in the [colbert blog post](https://blog.vespa.ai/announcing-colbert-embedder-in-vespa/). The binarization step maps 128 dimensional floats to 128 bits, or 16 bytes per vector. Reducing the size by 32x. On the [DocVQA benchmark](https://huggingface.co/datasets/vidore/docvqa_test_subsampled), binarization results in only a small drop in ranking accuracy. In \[ \]: Copied! ``` def float_to_binary_embedding(float_query_embedding: dict) -> dict: """Utility function to convert float query embeddings to binary query embeddings.""" binary_query_embeddings = {} for k, v in float_query_embedding.items(): binary_vector = ( np.packbits(np.where(np.array(v) > 0, 1, 0)).astype(np.int8).tolist() ) binary_query_embeddings[k] = binary_vector return binary_query_embeddings ``` def float_to_binary_embedding(float_query_embedding: dict) -> dict: """Utility function to convert float query embeddings to binary query embeddings.""" binary_query_embeddings = {} for k, v in float_query_embedding.items(): binary_vector = ( np.packbits(np.where(np.array(v) > 0, 1, 0)).astype(np.int8).tolist() ) binary_query_embeddings[k] = binary_vector return binary_query_embeddings We also need a couple of image processing helper functions. These are borrowed from [vidore-benchmark](https://github.com/illuin-tech/vidore-benchmark/blob/v4.0.0/src/vidore_benchmark/utils/image_utils.py) repo. In \[ \]: Copied! ``` import base64 import io from pathlib import Path from typing import Union from PIL import Image def scale_image(image: Image.Image, new_height: int = 1024) -> Image.Image: """ Scale an image to a new height while maintaining the aspect ratio. """ # Calculate the scaling factor width, height = image.size aspect_ratio = width / height new_width = int(new_height * aspect_ratio) # Resize the image scaled_image = image.resize((new_width, new_height)) return scaled_image def get_base64_image(img: Union[str, Image.Image], add_url_prefix: bool = True) -> str: """ Convert an image (from a filepath or a PIL.Image object) to a JPEG-base64 string. """ if isinstance(img, str): img = Image.open(img) elif isinstance(img, Image.Image): pass else: raise ValueError("`img` must be a path to an image or a PIL Image object.") buffered = io.BytesIO() img.save(buffered, format="jpeg") b64_data = base64.b64encode(buffered.getvalue()).decode("utf-8") return f"data:image/jpeg;base64,{b64_data}" if add_url_prefix else b64_data ``` import base64 import io from pathlib import Path from typing import Union from PIL import Image def scale_image(image: Image.Image, new_height: int = 1024) -> Image.Image: """ Scale an image to a new height while maintaining the aspect ratio. """ # Calculate the scaling factor width, height = image.size aspect_ratio = width / height new_width = int(new_height * aspect_ratio) # Resize the image scaled_image = image.resize((new_width, new_height)) return scaled_image def get_base64_image(img: Union[str, Image.Image], add_url_prefix: bool = True) -> str: """ Convert an image (from a filepath or a PIL.Image object) to a JPEG-base64 string. """ if isinstance(img, str): img = Image.open(img) elif isinstance(img, Image.Image): pass else: raise ValueError("`img` must be a path to an image or a PIL Image object.") buffered = io.BytesIO() img.save(buffered, format="jpeg") b64_data = base64.b64encode(buffered.getvalue()).decode("utf-8") return f"data:image/jpeg;base64,{b64_data}" if add_url_prefix else b64_data Note that we also store a scaled down (blurred) version of the image in Vespa. The purpose of this is to return this fast on first results to the frontend, to provide a snappy user experience, and then load the full resolution image async in the background. In \[ \]: Copied! ``` vespa_feed = [] for pdf, embedding in zip(pdf_pages, embeddings): url = pdf["url"] year = pdf["year"] title = pdf["title"] image = pdf["image"] text = pdf.get("text", "") page_no = pdf["page_no"] query_dict = pdf["queries"] questions = [v for k, v in query_dict.items() if "question" in k and v] queries = [v for k, v in query_dict.items() if "query" in k and v] base_64_image = get_base64_image( scale_image(image, 32), add_url_prefix=False ) # Scaled down image to return fast on search (~1kb) base_64_full_image = get_base64_image(image, add_url_prefix=False) embedding_dict = {k: v for k, v in enumerate(embedding)} binary_embedding = float_to_binary_embedding(embedding_dict) # id_hash should be md5 hash of url and page_number id_hash = hashlib.md5(f"{url}_{page_no}".encode()).hexdigest() page = { "id": id_hash, "fields": { "id": id_hash, "url": url, "title": title, "year": year, "page_number": page_no, "blur_image": base_64_image, "full_image": base_64_full_image, "text": text, "embedding": binary_embedding, "queries": queries, "questions": questions, }, } vespa_feed.append(page) ``` vespa_feed = [] for pdf, embedding in zip(pdf_pages, embeddings): url = pdf["url"] year = pdf["year"] title = pdf["title"] image = pdf["image"] text = pdf.get("text", "") page_no = pdf["page_no"] query_dict = pdf["queries"] questions = [v for k, v in query_dict.items() if "question" in k and v] queries = [v for k, v in query_dict.items() if "query" in k and v] base_64_image = get_base64_image( scale_image(image, 32), add_url_prefix=False ) # Scaled down image to return fast on search (~1kb) base_64_full_image = get_base64_image(image, add_url_prefix=False) embedding_dict = {k: v for k, v in enumerate(embedding)} binary_embedding = float_to_binary_embedding(embedding_dict) # id_hash should be md5 hash of url and page_number id_hash = hashlib.md5(f"{url}\_{page_no}".encode()).hexdigest() page = { "id": id_hash, "fields": { "id": id_hash, "url": url, "title": title, "year": year, "page_number": page_no, "blur_image": base_64_image, "full_image": base_64_full_image, "text": text, "embedding": binary_embedding, "queries": queries, "questions": questions, }, } vespa_feed.append(page) ### [Optional] Saving the feed file[¶](#optional-saving-the-feed-file) If you have a large dataset, you can optionally save the file, and feed it using the Vespa CLI, which is more performant than the pyvespa client. See [Feeding to Vespa Cloud](https://vespa-engine.github.io/pyvespa/examples/feed_performance_cloud.md) for more details. Uncomment the cell below if you want to save the feed file. In \[ \]: Copied! ``` # os.makedirs("output", exist_ok=True) # with open("output/vespa_feed.jsonl", "w") as f: # vespa_feed_to_save = [] # for page in vespa_feed: # document_id = page["id"] # put_id = f"id:{VESPA_APPLICATION_NAME}:{VESPA_SCHEMA_NAME}::{document_id}" # vespa_feed_to_save.append({"put": put_id, "fields": page["fields"]}) # json.dump(vespa_feed_to_save, f) ``` # os.makedirs("output", exist_ok=True) # with open("output/vespa_feed.jsonl", "w") as f: # vespa_feed_to_save = [] # for page in vespa_feed: # document_id = page["id"] # put_id = f"id:{VESPA_APPLICATION_NAME}:{VESPA_SCHEMA_NAME}::{document_id}" # vespa_feed_to_save.append({"put": put_id, "fields": page["fields"]}) # json.dump(vespa_feed_to_save, f) ## 7. Prepare Vespa Application[¶](#7-prepare-vespa-application) ### Configuring the application package[¶](#configuring-the-application-package) [PyVespa](https://vespa-engine.github.io/pyvespa/) helps us build the [Vespa application package](https://docs.vespa.ai/en/application-packages.html). A Vespa application package consists of configuration files, schemas, models, and code (plugins). Here are some of the key components of this application package: 1. We store images (and a scaled down version of the image) as a `raw` field. 1. We store the binarized ColPali embeddings as a `tensor` field. 1. We store the queries and questions as a `array` field. 1. We define 3 different ranking profiles: - `default` Uses BM25 for first phase ranking and MaxSim for second phase ranking. - `bm25` Uses `bm25(title) + bm25(text)` (first phase only) for ranking. - `retrieval-and-rerank` Uses `nearestneighbor` of the query embedding over the document embeddings for retrieval, `binary_max_sim` for first phase ranking, and `max_sim` of the query-embeddings as float for second phase ranking. Vespa's [phased ranking](https://docs.vespa.ai/en/phased-ranking.html) allows us to use different ranking strategies for retrieval and reranking, to choose attractive trade-offs between latency, cost, and accuracy. 1. We also calculate dot product between the query and each document, so that it can be returned with the results, to generate the similarity maps, which show which patches of the image is most similar to the query token embeddings. First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their type. In \[ \]: Copied! ``` colpali_schema = Schema( name=VESPA_SCHEMA_NAME, document=Document( fields=[ Field( name="id", type="string", indexing=["summary", "index"], match=["word"], ), Field(name="url", type="string", indexing=["summary", "index"]), Field(name="year", type="int", indexing=["summary", "attribute"]), Field( name="title", type="string", indexing=["summary", "index"], match=["text"], index="enable-bm25", ), Field(name="page_number", type="int", indexing=["summary", "attribute"]), Field(name="blur_image", type="raw", indexing=["summary"]), Field(name="full_image", type="raw", indexing=["summary"]), Field( name="text", type="string", indexing=["summary", "index"], match=["text"], index="enable-bm25", ), Field( name="embedding", type="tensor(patch{}, v[16])", indexing=[ "attribute", "index", ], ann=HNSW( distance_metric="hamming", max_links_per_node=32, neighbors_to_explore_at_insert=400, ), ), Field( name="questions", type="array", indexing=["summary", "attribute"], summary=Summary(fields=["matched-elements-only"]), ), Field( name="queries", type="array", indexing=["summary", "attribute"], summary=Summary(fields=["matched-elements-only"]), ), ] ), fieldsets=[ FieldSet( name="default", fields=["title", "text"], ), ], document_summaries=[ DocumentSummary( name="default", summary_fields=[ Summary( name="text", fields=[("bolding", "on")], ), Summary( name="snippet", fields=[("source", "text"), "dynamic"], ), ], from_disk=True, ), DocumentSummary( name="suggestions", summary_fields=[ Summary(name="questions"), ], from_disk=True, ), ], ) # Define similarity functions used in all rank profiles mapfunctions = [ Function( name="similarities", # computes similarity scores between each query token and image patch expression=""" sum( query(qt) * unpack_bits(attribute(embedding)), v ) """, ), Function( name="normalized", # normalizes the similarity scores to [-1, 1] expression=""" (similarities - reduce(similarities, min)) / (reduce((similarities - reduce(similarities, min)), max)) * 2 - 1 """, ), Function( name="quantized", # quantizes the normalized similarity scores to signed 8-bit integers [-128, 127] expression=""" cell_cast(normalized * 127.999, int8) """, ), ] # Define the 'bm25' rank profile bm25 = RankProfile( name="bm25", inputs=[("query(qt)", "tensor(querytoken{}, v[128])")], first_phase="bm25(title) + bm25(text)", functions=mapfunctions, ) # A function to create an inherited rank profile which also returns quantized similarity scores def with_quantized_similarity(rank_profile: RankProfile) -> RankProfile: return RankProfile( name=f"{rank_profile.name}_sim", first_phase=rank_profile.first_phase, inherits=rank_profile.name, summary_features=["quantized"], ) colpali_schema.add_rank_profile(bm25) colpali_schema.add_rank_profile(with_quantized_similarity(bm25)) # Update the 'colpali' rank profile input_query_tensors = [] MAX_QUERY_TERMS = 64 for i in range(MAX_QUERY_TERMS): input_query_tensors.append((f"query(rq{i})", "tensor(v[16])")) input_query_tensors.extend( [ ("query(qt)", "tensor(querytoken{}, v[128])"), ("query(qtb)", "tensor(querytoken{}, v[16])"), ] ) colpali = RankProfile( name="colpali", inputs=input_query_tensors, first_phase="max_sim_binary", second_phase=SecondPhaseRanking(expression="max_sim", rerank_count=10), functions=mapfunctions + [ Function( name="max_sim", expression=""" sum( reduce( sum( query(qt) * unpack_bits(attribute(embedding)), v ), max, patch ), querytoken ) """, ), Function( name="max_sim_binary", expression=""" sum( reduce( 1 / (1 + sum( hamming(query(qtb), attribute(embedding)), v) ), max, patch ), querytoken ) """, ), ], ) colpali_schema.add_rank_profile(colpali) colpali_schema.add_rank_profile(with_quantized_similarity(colpali)) # Update the 'hybrid' rank profile hybrid = RankProfile( name="hybrid", inputs=input_query_tensors, first_phase="max_sim_binary", second_phase=SecondPhaseRanking( expression="max_sim + 2 * (bm25(text) + bm25(title))", rerank_count=10 ), functions=mapfunctions + [ Function( name="max_sim", expression=""" sum( reduce( sum( query(qt) * unpack_bits(attribute(embedding)), v ), max, patch ), querytoken ) """, ), Function( name="max_sim_binary", expression=""" sum( reduce( 1 / (1 + sum( hamming(query(qtb), attribute(embedding)), v) ), max, patch ), querytoken ) """, ), ], ) colpali_schema.add_rank_profile(hybrid) colpali_schema.add_rank_profile(with_quantized_similarity(hybrid)) ``` colpali_schema = Schema( name=VESPA_SCHEMA_NAME, document=Document( fields=\[ Field( name="id", type="string", indexing=["summary", "index"], match=["word"], ), Field(name="url", type="string", indexing=["summary", "index"]), Field(name="year", type="int", indexing=["summary", "attribute"]), Field( name="title", type="string", indexing=["summary", "index"], match=["text"], index="enable-bm25", ), Field(name="page_number", type="int", indexing=["summary", "attribute"]), Field(name="blur_image", type="raw", indexing=["summary"]), Field(name="full_image", type="raw", indexing=["summary"]), Field( name="text", type="string", indexing=["summary", "index"], match=["text"], index="enable-bm25", ), Field( name="embedding", type="tensor(patch{}, v[16])", indexing=[ "attribute", "index", ], ann=HNSW( distance_metric="hamming", max_links_per_node=32, neighbors_to_explore_at_insert=400, ), ), Field( name="questions", type="array", indexing=["summary", "attribute"], summary=Summary(fields=["matched-elements-only"]), ), Field( name="queries", type="array", indexing=["summary", "attribute"], summary=Summary(fields=["matched-elements-only"]), ), \] ), fieldsets=\[ FieldSet( name="default", fields=["title", "text"], ), \], document_summaries=\[ DocumentSummary( name="default", summary_fields=\[ Summary( name="text", fields=[("bolding", "on")], ), Summary( name="snippet", fields=[("source", "text"), "dynamic"], ), \], from_disk=True, ), DocumentSummary( name="suggestions", summary_fields=[ Summary(name="questions"), ], from_disk=True, ), \], ) # Define similarity functions used in all rank profiles mapfunctions = \[ Function( name="similarities", # computes similarity scores between each query token and image patch expression=""" sum( query(qt) * unpack_bits(attribute(embedding)), v ) """, ), Function( name="normalized", # normalizes the similarity scores to [-1, 1] expression=""" (similarities - reduce(similarities, min)) / (reduce((similarities - reduce(similarities, min)), max)) * 2 - 1 """, ), Function( name="quantized", # quantizes the normalized similarity scores to signed 8-bit integers [-128, 127] expression=""" cell_cast(normalized * 127.999, int8) """, ), \] # Define the 'bm25' rank profile bm25 = RankProfile( name="bm25", inputs=\[("query(qt)", "tensor(querytoken{}, v[128])")\], first_phase="bm25(title) + bm25(text)", functions=mapfunctions, ) # A function to create an inherited rank profile which also returns quantized similarity scores def with_quantized_similarity(rank_profile: RankProfile) -> RankProfile: return RankProfile( name=f"{rank_profile.name}\_sim", first_phase=rank_profile.first_phase, inherits=rank_profile.name, summary_features=["quantized"], ) colpali_schema.add_rank_profile(bm25) colpali_schema.add_rank_profile(with_quantized_similarity(bm25)) # Update the 'colpali' rank profile input_query_tensors = [] MAX_QUERY_TERMS = 64 for i in range(MAX_QUERY_TERMS): input_query_tensors.append((f"query(rq{i})", "tensor(v[16])")) input_query_tensors.extend( \[ ("query(qt)", "tensor(querytoken{}, v[128])"), ("query(qtb)", "tensor(querytoken{}, v[16])"), \] ) colpali = RankProfile( name="colpali", inputs=input_query_tensors, first_phase="max_sim_binary", second_phase=SecondPhaseRanking(expression="max_sim", rerank_count=10), functions=mapfunctions - [ Function( name="max_sim", expression=""" sum( reduce( sum( query(qt) * unpack_bits(attribute(embedding)), v ), max, patch ), querytoken ) """, ), Function( name="max_sim_binary", expression=""" sum( reduce( 1 / (1 + sum( hamming(query(qtb), attribute(embedding)), v) ), max, patch ), querytoken ) """, ), ], ) colpali_schema.add_rank_profile(colpali) colpali_schema.add_rank_profile(with_quantized_similarity(colpali)) # Update the 'hybrid' rank profile hybrid = RankProfile( name="hybrid", inputs=input_query_tensors, first_phase="max_sim_binary", second_phase=SecondPhaseRanking( expression="max_sim + 2 * (bm25(text) + bm25(title))", rerank_count=10 ), functions=mapfunctions - [ Function( name="max_sim", expression=""" sum( reduce( sum( query(qt) * unpack_bits(attribute(embedding)), v ), max, patch ), querytoken ) """, ), Function( name="max_sim_binary", expression=""" sum( reduce( 1 / (1 + sum( hamming(query(qtb), attribute(embedding)), v) ), max, patch ), querytoken ) """, ), ], ) colpali_schema.add_rank_profile(hybrid) colpali_schema.add_rank_profile(with_quantized_similarity(hybrid)) ### Configuring the `services.xml`[¶](#configuring-the-servicesxml) [services.xml](https://docs.vespa.ai/en/reference/services.html) is the primary configuration file for a Vespa application, with a plethora of options to configure the application. Since `pyvespa` version `0.50.0`, these configuration options are also available in `pyvespa`. See [Pyvespa - Advanced configuration](https://vespa-engine.github.io/pyvespa/advanced-configuration.md) for more details. (Note that configurating this is optional, and pyvespa will use basic defaults for you if you opt out). We will use the advanced configuration to configure up [dynamic snippets](https://docs.vespa.ai/en/document-summaries.html#dynamic-snippets). This allows us to highlight matched terms in the search results and generate a `snippet` to display, rather than the full text of the document. In \[ \]: Copied! ``` from vespa.configuration.services import ( services, container, search, document_api, document_processing, clients, client, config, content, redundancy, documents, node, certificate, token, document, nodes, ) from vespa.configuration.vt import vt from vespa.package import ServicesConfiguration service_config = ServicesConfiguration( application_name=VESPA_APPLICATION_NAME, services_config=services( container( search(), document_api(), document_processing(), clients( client( certificate(file="security/clients.pem"), id="mtls", permissions="read,write", ), client( token(id=f"{VESPA_TOKEN_ID}"), id="token_write", permissions="read,write", ), ), config( vt("tag")( vt("bold")( vt("open", ""), vt("close", ""), ), vt("separator", "..."), ), name="container.qr-searchers", ), id=f"{VESPA_APPLICATION_NAME}_container", version="1.0", ), content( redundancy("1"), documents(document(type="pdf_page", mode="index")), nodes(node(distribution_key="0", hostalias="node1")), config( vt("max_matches", "2", replace_underscores=False), vt("length", "1000"), vt("surround_max", "500", replace_underscores=False), vt("min_length", "300", replace_underscores=False), name="vespa.config.search.summary.juniperrc", ), id=f"{VESPA_APPLICATION_NAME}_content", version="1.0", ), version="1.0", ), ) ``` from vespa.configuration.services import ( services, container, search, document_api, document_processing, clients, client, config, content, redundancy, documents, node, certificate, token, document, nodes, ) from vespa.configuration.vt import vt from vespa.package import ServicesConfiguration service_config = ServicesConfiguration( application_name=VESPA_APPLICATION_NAME, services_config=services( container( search(), document_api(), document_processing(), clients( client( certificate(file="security/clients.pem"), id="mtls", permissions="read,write", ), client( token(id=f"{VESPA_TOKEN_ID}"), id="token_write", permissions="read,write", ), ), config( vt("tag")( vt("bold")( vt("open", ""), vt("close", ""), ), vt("separator", "..."), ), name="container.qr-searchers", ), id=f"{VESPA_APPLICATION_NAME}\_container", version="1.0", ), content( redundancy("1"), documents(document(type="pdf_page", mode="index")), nodes(node(distribution_key="0", hostalias="node1")), config( vt("max_matches", "2", replace_underscores=False), vt("length", "1000"), vt("surround_max", "500", replace_underscores=False), vt("min_length", "300", replace_underscores=False), name="vespa.config.search.summary.juniperrc", ), id=f"{VESPA_APPLICATION_NAME}\_content", version="1.0", ), version="1.0", ), ) In \[ \]: Copied! ``` # Create the Vespa application package vespa_application_package = ApplicationPackage( name=VESPA_APPLICATION_NAME, schema=[colpali_schema], services_config=service_config, ) ``` # Create the Vespa application package vespa_application_package = ApplicationPackage( name=VESPA_APPLICATION_NAME, schema=[colpali_schema], services_config=service_config, ) ## 8. Deploy Vespa Application[¶](#8-deploy-vespa-application) In \[ \]: Copied! ``` # This is only needed for CI. VESPA_TEAM_API_KEY = os.getenv("VESPA_TEAM_API_KEY", None) ``` # This is only needed for CI. VESPA_TEAM_API_KEY = os.getenv("VESPA_TEAM_API_KEY", None) In \[ \]: Copied! ``` vespa_cloud = VespaCloud( tenant=VESPA_TENANT_NAME, application=VESPA_APPLICATION_NAME, key_content=VESPA_TEAM_API_KEY, application_package=vespa_application_package, ) # Deploy the application vespa_cloud.deploy() # Output the endpoint URL endpoint_url = vespa_cloud.get_token_endpoint() print(f"Application deployed. Token endpoint URL: {endpoint_url}") ``` vespa_cloud = VespaCloud( tenant=VESPA_TENANT_NAME, application=VESPA_APPLICATION_NAME, key_content=VESPA_TEAM_API_KEY, application_package=vespa_application_package, ) # Deploy the application vespa_cloud.deploy() # Output the endpoint URL endpoint_url = vespa_cloud.get_token_endpoint() print(f"Application deployed. Token endpoint URL: {endpoint_url}") Make sure to take note of the token endpoint_url. You need to put this in your `.env` file for your web application - `VESPA_APP_TOKEN_URL=https://abcd.vespa-app.cloud` - to access the Vespa application from your web application. ## 9. Feed Data to Vespa[¶](#9-feed-data-to-vespa) We will need the `enpdoint_url` and `colpalidemo_write` token to feed the data to the Vespa application. In \[ \]: Copied! ``` # Instantiate Vespa connection using token app = Vespa(url=endpoint_url, vespa_cloud_secret_token=VESPA_CLOUD_SECRET_TOKEN) app.get_application_status() ``` # Instantiate Vespa connection using token app = Vespa(url=endpoint_url, vespa_cloud_secret_token=VESPA_CLOUD_SECRET_TOKEN) app.get_application_status() Now, let us feed the data to Vespa. If you have a large dataset, you could also do this async, with `feed_async_iterable()`, see [Feeding Vespa cloud](https://vespa-engine.github.io/pyvespa/examples/feed_performance_cloud.md) for a detailed comparison. In \[ \]: Copied! ``` def callback(response: VespaResponse, id: str): if not response.is_successful(): print( f"Failed to feed document {id} with status code {response.status_code}: Reason {response.get_json()}" ) # Feed data into Vespa synchronously app.feed_iterable(vespa_feed, schema=VESPA_SCHEMA_NAME, callback=callback) ``` def callback(response: VespaResponse, id: str): if not response.is_successful(): print( f"Failed to feed document {id} with status code {response.status_code}: Reason {response.get_json()}" ) # Feed data into Vespa synchronously app.feed_iterable(vespa_feed, schema=VESPA_SCHEMA_NAME, callback=callback) ## 10. Test a query to the Vespa application[¶](#10-test-a-query-to-the-vespa-application) For now, we will just run a query with the default rank profile. We will need a utility function to generate embeddings for the query, and pass this to Vespa to use for calculating MaxSim. In the web application, we also provide function to generate binary embeddings, allowing the user to choose different rank profiles at query time. In \[ \]: Copied! ``` query = "Price development in Technology sector from April 2023?" ``` query = "Price development in Technology sector from April 2023?" In \[ \]: Copied! ``` def get_q_embs_vespa_format(query: str): inputs = processor.process_queries([query]).to(model.device) with torch.no_grad(): embeddings_query = model(**inputs) q_embs = embeddings_query.to("cpu")[0] # Extract the single embedding return {idx: emb.tolist() for idx, emb in enumerate(q_embs)} ``` def get_q_embs_vespa_format(query: str): inputs = processor.process_queries([query]).to(model.device) with torch.no_grad(): embeddings_query = model(\*\*inputs) q_embs = embeddings_query.to("cpu")[0] # Extract the single embedding return {idx: emb.tolist() for idx, emb in enumerate(q_embs)} In \[ \]: Copied! ``` q_emb = get_q_embs_vespa_format(query) ``` q_emb = get_q_embs_vespa_format(query) In \[ \]: Copied! ``` with app.syncio() as sess: response = sess.query( body={ "yql": ( f"select id, url, title, year, full_image, quantized from {VESPA_SCHEMA_NAME} where userQuery();" ), "ranking": "default", "query": query, "timeout": "10s", "hits": 3, "input.query(qt)": q_emb, "presentation.timing": True, } ) ``` with app.syncio() as sess: response = sess.query( body={ "yql": ( f"select id, url, title, year, full_image, quantized from {VESPA_SCHEMA_NAME} where userQuery();" ), "ranking": "default", "query": query, "timeout": "10s", "hits": 3, "input.query(qt)": q_emb, "presentation.timing": True, } ) In \[ \]: Copied! ``` assert len(response.json["root"]["children"]) == 3 ``` assert len(response.json["root"]["children"]) == 3 Great. You have now deployed the Vespa application and fed the data to it, and made sure you are able to query it using the vespa endpoint and a token. ### Saving the generated key/cert files[¶](#saving-the-generated-keycert-files) A key and cert file is generated for you as an alternative to using tokens for authentication. We advise you to save these files in a secure location, in case you want to use them for authentication in the future. In \[ \]: Copied! ``` key_path = Path( f"~/.vespa/{VESPA_TENANT_NAME}.{VESPA_APPLICATION_NAME}.default/data-plane-private-key.pem" ).expanduser() cert_path = Path( f"~/.vespa/{VESPA_TENANT_NAME}.{VESPA_APPLICATION_NAME}.default/data-plane-public-cert.pem" ).expanduser() assert key_path.exists(), cert_path.exists() ``` key_path = Path( f"~/.vespa/{VESPA_TENANT_NAME}.{VESPA_APPLICATION_NAME}.default/data-plane-private-key.pem" ).expanduser() cert_path = Path( f"~/.vespa/{VESPA_TENANT_NAME}.{VESPA_APPLICATION_NAME}.default/data-plane-public-cert.pem" ).expanduser() assert key_path.exists(), cert_path.exists() ## 11. Deploying your web app[¶](#11-deploying-your-web-app) To deploy a frontend to let users interact with the Vespa application. you can clone the sample app from [sample-apps repo](https://github.com/vespa-engine/sample-apps/blob/master/visual-retrieval-colpali/README.md). It includes instructions for running and connecting your web application to your vespa app. In \[ \]: Copied! ``` !git clone --depth 1 --filter=blob:none --sparse https://github.com/vespa-engine/sample-apps.git src && cd src && git sparse-checkout set visual-retrieval-colpali ``` !git clone --depth 1 --filter=blob:none --sparse https://github.com/vespa-engine/sample-apps.git src && cd src && git sparse-checkout set visual-retrieval-colpali Now, you have the code for the webapp in your `src/visual-retrieval-colpali`-directory In \[ \]: Copied! ``` os.listdir("src/visual-retrieval-colpali") ``` os.listdir("src/visual-retrieval-colpali") ### Setting environment variables for your web app[¶](#setting-environment-variables-for-your-web-app) Now, you need to set the following variables in the `src/.env.example`-file: ``` VESPA_APP_TOKEN_URL=https://abcde.z.vespa-app.cloud # Your token endpoint url you got after deploying your Vespa app. VESPA_CLOUD_SECRET_TOKEN=vespa_cloud_xxxxxxxx # The value of the token that your created in this notebook. GEMINI_API_KEY=your_api_key # The same as GOOGLE_API_KEY in this notebook HF_TOKEN=hf_xxxx # If you want to deploy your web app to huggingface spaces - https://huggingface.co/settings/tokens ``` After, that, rename your file to .env. In \[ \]: Copied! ``` # rename src/visual-retrieval-colpali/.env.example os.rename( "src/visual-retrieval-colpali/.env.example", dst="src/visual-retrieval-colpali/.env" ) ``` # rename src/visual-retrieval-colpali/.env.example os.rename( "src/visual-retrieval-colpali/.env.example", dst="src/visual-retrieval-colpali/.env" ) And you're ready to spin up your web app locally, and deploy to huggingface spaces if you want. Navigate to `src/visual-retrieval-colpali/` directory and follow the instructions in the `README.md` to continue. 🚀 ## Cleanup[¶](#cleanup) As this notebook runs in CI, we will delete the Vespa application after running the notebook. DO NOT run the cell below unless you are sure you want to delete the Vespa application. In \[ \]: Copied! ``` if os.getenv("CI", "false") == "true": vespa_cloud.delete() ``` if os.getenv("CI", "false") == "true": vespa_cloud.delete()