
On-Device LLMs for Web Scraping and Advanced Web Queries using Jina Reader API + MediaPipe + TensorFlow Lite
As we continue to push the boundaries of AI accessibility with on-device LLMs, integrating tools like the Jina Reader API takes this evolution a step further. Jina, an open-source neural search framework, excels at facilitating retrieval-augmented generation (RAG), a pivotal advancement in the AI landscape. RAG combines data retrieval with generative AI, creating a system that not only understands information but also contextualizes it to produce highly relevant, accurate responses. This is particularly beneficial in scenarios where real-time data and contextual relevance are paramount. By leveraging the Jina Reader API, developers can make AI models more precise and efficient, broadening the scope and impact of AI applications. This article explores how the Jina Reader API can be seamlessly integrated with on-device LLMs to redefine AI accessibility and functionality, ensuring that advanced AI interactions are a reality for a wider audience.
Don't forget to read how we implemented on-device LLMs in the article below:
Cloud to Pocket — Redefining AI Accessibility: On-Device LLMs
What is Jina Reader API
In the world of artificial intelligence, the ability to process and understand natural language is a key goal. This is where Large Language Models (LLMs) come into play. However, these models often face a challenge: they need to be grounded with the latest information from the web. This is where the Jina Reader API steps in.

The Jina Reader API is a tool designed to convert any URL into a format that is friendly for LLMs. It extracts the core content from a URL and converts it into clean, LLM-friendly text. This ensures high-quality input for your agent and RAG systems.
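As a minimal sketch, calling the Reader API amounts to prefixing the target URL with `https://r.jina.ai/` (the same base URL the implementation below uses); the target URL here is just a placeholder:

```javascript
// Minimal sketch: building a Reader API request URL.
// The target URL is a placeholder; any public page works the same way.
const READER_BASE = 'https://r.jina.ai/';

function readerUrl(target) {
  // The Reader API expects the raw page URL appended directly after the base.
  return READER_BASE + target;
}

console.log(readerUrl('https://example.com/article'));
// -> https://r.jina.ai/https://example.com/article
```

Fetching that URL returns the page's core content as clean, LLM-friendly text instead of raw HTML.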
Key Features of the Jina Reader API
- Reading from a URL: The primary feature of the Reader API is reading from a URL. Prepend https://r.jina.ai/ to any URL, and the API extracts the page's core content and converts it into clean, LLM-friendly text, ensuring high-quality input for your agent and RAG systems.
- Search Grounding: LLMs have a knowledge cut-off, meaning they can't access the latest world knowledge. The Reader API allows you to ground your LLM with the latest information from the web. Simply prepend https://s.jina.ai/ to your query, and Reader will search the web and return the top five results with their URLs and contents, each in clean, LLM-friendly text.
- Image Reading: Images on the webpage are automatically captioned using a vision language model in the reader and formatted as image alt tags in the output. This gives your downstream LLM just enough hints to incorporate those images into its reasoning and summarizing processes.
- Free and Scalable: The Reader API is available for free and offers flexible rate limits and pricing. It’s built on a scalable infrastructure, offering high accessibility, concurrency, and reliability.
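Search grounding works the same way as reading, just with the `https://s.jina.ai/` endpoint; a small sketch follows, where the query string is URL-encoded so spaces and punctuation survive:

```javascript
// Minimal sketch: building a search-grounding request URL.
const SEARCH_BASE = 'https://s.jina.ai/';

function searchUrl(query) {
  // Encode the query so it is safe to embed in the URL path.
  return SEARCH_BASE + encodeURIComponent(query);
}

console.log(searchUrl('latest WebGPU browser support'));
// -> https://s.jina.ai/latest%20WebGPU%20browser%20support
```

Fetching this URL returns the top search results, each already converted to LLM-friendly text.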
Implementation
The MediaPipe GenAI tasks library offers powerful capabilities for developers seeking to harness Large Language Models (LLMs) in the browser. The JavaScript code below exemplifies how to integrate MediaPipe’s LLM inference functionality into web applications, unlocking a realm of possibilities for text processing and understanding.
At the core of this script lies the ‘LlmInference’ class, which facilitates the execution of LLM models. By importing this class, along with ‘FilesetResolver’, from the ‘@mediapipe/tasks-genai’ package, developers gain access to a suite of tools for advanced text processing tasks. The script also demonstrates how to interact with the DOM, retrieving input and output elements to create a seamless user experience.
One notable feature of the script is its ability to fetch data from a specified URL and populate the input box. This functionality expands the script’s utility beyond static text inputs, enabling dynamic content retrieval for analysis and processing. Additionally, the ‘displayPartialResults’ function enhances user feedback by displaying partial results during the inference process, culminating in a complete response.
The ‘runDemo’ function serves as the central component, orchestrating the initialization of the LLM model and managing user interactions. Through careful configuration of options such as the model asset path (‘modelAssetPath’) and maximum tokens (‘maxTokens’), developers can tailor the LLM’s behavior to suit their application’s needs. In the event of initialization failure, the script provides informative alerts, ensuring a smooth user experience.
Find the complete code for the On-Device LLMs Gemma Jina Reader project at
GitHub - toniramchandani1/On-Device_LLMs_Gemma_Jina_Reader
Example
An app built on this stack can guide users, accept a URL as input, scrape its content, and query the on-device LLM, thereby offering a Retrieval-Augmented Generation (RAG) solution for web queries.

How to Set Up
To set up and run the MediaPipe LLM Inference task for web applications, follow these steps:
1. Ensure your browser supports WebGPU (for example, Chrome on macOS or Windows).
2. Create a folder named llm_task.
3. Copy index.html and index.js files into your llm_task folder.
4. Download the Gemma 2B model from Gemma, or convert an external LLM model (Phi-2, Falcon, or StableLM), and place it in the llm_task folder, ensuring it’s compatible with a GPU backend.
5. In the index.js file, update the modelFileName variable to match your model file’s name.
6. Run a local server within the llm_task folder using the command python -m http.server 8080 or python -m SimpleHTTPServer 8080 for older Python versions.
7. Open http://localhost:8080 in your Chrome browser. The web interface will activate, ready for use in about 10 seconds.
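The steps above can be sketched as a few shell commands. This assumes index.html, index.js, and the model file (gemma-2b-it-gpu-int4.bin, the name referenced in index.js) sit in the current directory; files that are missing are simply skipped:

```shell
# Create the project folder and copy the app files into it.
mkdir -p llm_task
for f in index.html index.js gemma-2b-it-gpu-int4.bin; do
  if [ -f "$f" ]; then cp "$f" llm_task/; fi
done
# Serve the folder locally, then open http://localhost:8080
# in a WebGPU-capable Chrome:
# cd llm_task && python -m http.server 8080
# (or: python -m SimpleHTTPServer 8080 on Python 2)
```

The server commands are left commented out since they block the terminal until stopped with Ctrl+C.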
Please find below the content for ‘index.html’ and ‘index.js’ respectively. The ‘index.html’ shown is a minimal version of the page; the element IDs (‘urlInput’, ‘input’, ‘output’, ‘submit’, ‘fetch’) are exactly the ones ‘index.js’ looks up.

<!DOCTYPE html>
<html>
<head>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH" crossorigin="anonymous">
</head>
<body class="container py-4">
<h4>Toni Ramchandani</h4>
<p>Driven by Sports, Adventure, Technology &amp; Innovations</p>
<a href="https://www.linkedin.com/in/toni-ramchandani/" class="profile-link">LinkedIn Profile</a>
<h5>Running Large Language Models On-Device with MediaPipe, JINA Reader API &amp; TensorFlow Lite for Web Scraping and Advanced Web Queries</h5>
<label for="urlInput">URL to Fetch:</label> <input type="text" id="urlInput" class="form-control"> <button id="fetch" class="btn btn-primary">Fetch URL</button>
<label for="input">Input:</label> <textarea id="input" class="form-control" rows="6"></textarea>
<label for="output">Result:</label> <div id="output" class="form-control"></div>
<input type="button" id="submit" class="btn btn-primary" value="Get Response" disabled>
<script type="module" src="index.js"></script>
</body>
</html>
import {FilesetResolver, LlmInference} from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';

const input = document.getElementById('input');
const output = document.getElementById('output');
const submit = document.getElementById('submit');
const fetchButton = document.getElementById('fetch'); // Added fetch button
const modelFileName = 'gemma-2b-it-gpu-int4.bin';

/**
 * Display newly generated partial results to the output text box.
 */
function displayPartialResults(partialResults, complete) {
  output.textContent += partialResults;
  if (complete) {
    if (!output.textContent) {
      output.textContent = 'Result is empty';
    }
    submit.disabled = false;
  }
}

/**
 * Fetches data from the input URL via the Jina Reader API and populates the input box.
 */
async function fetchData() {
  const urlInput = document.getElementById('urlInput').value;
  const base_url = 'https://r.jina.ai/';
  const full_url = base_url + urlInput;
  const headers = new Headers({
    'Accept': 'text/event-stream'
  });
  try {
    const response = await fetch(full_url, {headers: headers});
    if (!response.ok) throw new Error('Network response was not ok.');
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let result = '';
    while (true) {
      const {done, value} = await reader.read();
      if (done) break;
      result += decoder.decode(value, {stream: true});
    }
    input.value = result;
    submit.disabled = false; // Enable the button if data is fetched successfully
  } catch (error) {
    console.error('Failed to fetch:', error);
    alert('Failed to fetch data: ' + error.message);
  }
}

fetchButton.onclick = () => {
  fetchData();
};

/**
 * Main function to run LLM inference.
 */
async function runDemo() {
  const genaiFileset = await FilesetResolver.forGenAiTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
  let llmInference;

  submit.onclick = () => {
    output.textContent = '';
    submit.disabled = true;
    llmInference.generateResponse(input.value, displayPartialResults);
  };

  submit.value = 'Loading the model...';
  LlmInference
      .createFromOptions(genaiFileset, {
        baseOptions: {modelAssetPath: modelFileName},
        maxTokens: 2000, // Added maxTokens parameter
      })
      .then(llm => {
        llmInference = llm;
        submit.disabled = false;
        submit.value = 'Get Response';
      })
      .catch(() => {
        alert('Failed to initialize the task.');
      });
}

runDemo();
About Me🚀
Hello! I’m Toni Ramchandani 👋. I’m deeply passionate about all things technology! My journey is about exploring the vast and dynamic world of tech, from cutting-edge innovations to practical business solutions. I believe in the power of technology to transform our lives and work. 🌐
Let’s connect at https://www.linkedin.com/in/toni-ramchandani/ and exchange ideas about the latest tech trends and advancements! 🌟
Engage & Stay Connected 📢
If you find value in my posts, please Clap 👏 | Like 👍 and share 📤 them. Your support inspires me to continue sharing insights and knowledge. Follow me for more updates and let’s explore the fascinating world of technology together! 🛰️
On-Device LLMs for Web Scraping and Advanced Web Queries using Jina Reader API + MediaPipe +… was originally published in Generative AI on Medium, where people are continuing the conversation by highlighting and responding to this story.