
On-Device LLMs for Web Scraping and Advanced Web Queries using Jina Reader API + MediaPipe + TensorFlow Lite
As we continue to push the boundaries of AI accessibility with on-device LLMs, integrating tools like the Jina Reader API takes this evolution a step further. Jina, an open-source neural search framework, excels at facilitating retrieval-augmented generation (RAG), a pivotal advancement in the AI landscape. RAG combines data retrieval with generative AI, creating a system that not only understands information but also contextualizes it to produce highly relevant, accurate responses. This is particularly beneficial in scenarios where real-time data and contextual relevance are paramount. By leveraging the Jina Reader API, developers can make AI models more precise and efficient, broadening the scope and impact of AI applications. This article explores how the Jina Reader API can be seamlessly integrated with on-device LLMs to redefine AI accessibility and functionality, ensuring that advanced AI interactions are a reality for a wider audience.
Don't forget to read how we implemented on-device LLMs in the article below:
Cloud to Pocket — Redefining AI Accessibility: On-Device LLMs
What is Jina Reader API
In the world of artificial intelligence, the ability to process and understand natural language is a key goal. This is where Large Language Models (LLMs) come into play. However, these models often face a challenge: they need to be grounded with the latest information from the web. This is where the Jina Reader API steps in.

The Jina Reader API is a tool designed to convert any URL into a format that is friendly for LLMs. It extracts the core content from a URL and converts it into clean, LLM-friendly text. This ensures high-quality input for your agent and RAG systems.
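As a minimal sketch, calling the Reader API amounts to prefixing the target URL with `https://r.jina.ai/` (the same base URL the implementation below uses); the target URL here is just a placeholder:

```javascript
// Minimal sketch: building a Reader API request URL.
// The target URL is a placeholder; any public page works the same way.
const READER_BASE = 'https://r.jina.ai/';

function readerUrl(target) {
  // The Reader API expects the raw page URL appended directly after the base.
  return READER_BASE + target;
}

console.log(readerUrl('https://example.com/article'));
// -> https://r.jina.ai/https://example.com/article
```

Fetching that URL returns the page's core content as clean, LLM-friendly text instead of raw HTML.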
Key Features of the Jina Reader API
- Reading from a URL: The primary feature of the Reader API is reading from a URL. Prepend https://r.jina.ai/ to any URL, and the API extracts the page's core content and converts it into clean, LLM-friendly text, ensuring high-quality input for your agent and RAG systems.
- Search Grounding: LLMs have a knowledge cut-off, meaning they can't access the latest world knowledge. The Reader API allows you to ground your LLM with the latest information from the web. Simply prepend https://s.jina.ai/ to your query, and Reader will search the web and return the top five results with their URLs and contents, each in clean, LLM-friendly text.
- Image Reading: Images on the webpage are automatically captioned using a vision language model in the reader and formatted as image alt tags in the output. This gives your downstream LLM just enough hints to incorporate those images into its reasoning and summarizing processes.
- Free and Scalable: The Reader API is available for free and offers flexible rate limits and pricing. It’s built on a scalable infrastructure, offering high accessibility, concurrency, and reliability.
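Search grounding works the same way as reading, just with the `https://s.jina.ai/` endpoint; a small sketch follows, where the query string is URL-encoded so spaces and punctuation survive:

```javascript
// Minimal sketch: building a search-grounding request URL.
const SEARCH_BASE = 'https://s.jina.ai/';

function searchUrl(query) {
  // Encode the query so it is safe to embed in the URL path.
  return SEARCH_BASE + encodeURIComponent(query);
}

console.log(searchUrl('latest WebGPU browser support'));
// -> https://s.jina.ai/latest%20WebGPU%20browser%20support
```

Fetching this URL returns the top search results, each already converted to LLM-friendly text.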
Implementation
The MediaPipe GenAI tasks library offers powerful capabilities for developers seeking to harness Large Language Models (LLMs) in the browser. The JavaScript code below exemplifies how to integrate MediaPipe’s LLM inference functionality into web applications, unlocking a realm of possibilities for text processing and understanding.
At the core of this script lies the ‘LlmInference’ class, which facilitates the execution of LLM models. By importing this class, along with ‘FilesetResolver’, from the ‘@mediapipe/tasks-genai’ package, developers gain access to a suite of tools for advanced text processing tasks. The script also demonstrates how to interact with the DOM, retrieving input and output elements to create a seamless user experience.
One notable feature of the script is its ability to fetch data from a specified URL and populate the input box. This functionality expands the script’s utility beyond static text inputs, enabling dynamic content retrieval for analysis and processing. Additionally, the ‘displayPartialResults’ function enhances user feedback by displaying partial results during the inference process, culminating in a complete response.
The ‘runDemo’ function serves as the central component, orchestrating the initialization of the LLM model and managing user interactions. Through careful configuration of options such as the model asset path (‘modelAssetPath’) and maximum tokens (‘maxTokens’), developers can tailor the LLM’s behavior to suit their application’s needs. In the event of initialization failure, the script provides informative alerts, ensuring a smooth user experience.
Find the complete code for the On-Device LLMs Gemma Jina Reader project at
GitHub - toniramchandani1/On-Device_LLMs_Gemma_Jina_Reader
Example
An app built on this stack can guide users, accept a URL as input, scrape its content, and query the on-device LLM, thereby offering a Retrieval-Augmented Generation (RAG) solution for web queries.

How to Set Up
To set up and run the MediaPipe LLM Inference task for web applications, follow these steps:
1. Ensure your browser supports WebGPU (for example, Chrome on macOS or Windows).
2. Create a folder named llm_task.
3. Copy index.html and index.js files into your llm_task folder.
4. Download the Gemma 2B model from Gemma, or convert an external LLM model (Phi-2, Falcon, or StableLM), and place it in the llm_task folder, ensuring it’s compatible with a GPU backend.
5. In the index.js file, update the modelFileName variable to match your model file’s name.
6. Run a local server within the llm_task folder using the command python -m http.server 8080 or python -m SimpleHTTPServer 8080 for older Python versions.
7. Open http://localhost:8080 in your Chrome browser. The web interface will activate, ready for use in about 10 seconds.
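The steps above can be sketched as a few shell commands. This assumes index.html, index.js, and the model file (gemma-2b-it-gpu-int4.bin, the name referenced in index.js) sit in the current directory; files that are missing are simply skipped:

```shell
# Create the project folder and copy the app files into it.
mkdir -p llm_task
for f in index.html index.js gemma-2b-it-gpu-int4.bin; do
  if [ -f "$f" ]; then cp "$f" llm_task/; fi
done
# Serve the folder locally, then open http://localhost:8080
# in a WebGPU-capable Chrome:
# cd llm_task && python -m http.server 8080
# (or: python -m SimpleHTTPServer 8080 on Python 2)
```

The server commands are left commented out since they block the terminal until stopped with Ctrl+C.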
Please find below the content for ‘index.html’ and ‘index.js’ respectively. The ‘index.html’ shown is a minimal version of the page; the element IDs (‘urlInput’, ‘input’, ‘output’, ‘submit’, ‘fetch’) are exactly the ones ‘index.js’ looks up.

<!DOCTYPE html>
<html>
<head>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH" crossorigin="anonymous">
</head>
<body class="container py-4">
<h4>Toni Ramchandani</h4>
<p>Driven by Sports, Adventure, Technology &amp; Innovations</p>
<a href="https://www.linkedin.com/in/toni-ramchandani/" class="profile-link">LinkedIn Profile</a>
<h5>Running Large Language Models On-Device with MediaPipe, JINA Reader API &amp; TensorFlow Lite for Web Scraping and Advanced Web Queries</h5>
<label for="urlInput">URL to Fetch:</label> <input type="text" id="urlInput" class="form-control"> <button id="fetch" class="btn btn-primary">Fetch URL</button>
<label for="input">Input:</label> <textarea id="input" class="form-control" rows="6"></textarea>
<label for="output">Result:</label> <div id="output" class="form-control"></div>
<input type="button" id="submit" class="btn btn-primary" value="Get Response" disabled>
<script type="module" src="index.js"></script>
</body>
</html>
import {FilesetResolver, LlmInference} from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';

const input = document.getElementById('input');
const output = document.getElementById('output');
const submit = document.getElementById('submit');
const fetchButton = document.getElementById('fetch'); // Added fetch button
const modelFileName = 'gemma-2b-it-gpu-int4.bin';

/**
 * Display newly generated partial results to the output text box.
 */
function displayPartialResults(partialResults, complete) {
  output.textContent += partialResults;
  if (complete) {
    if (!output.textContent) {
      output.textContent = 'Result is empty';
    }
    submit.disabled = false;
  }
}

/**
 * Fetches data from the input URL via the Jina Reader API and populates the input box.
 */
async function fetchData() {
  const urlInput = document.getElementById('urlInput').value;
  const base_url = 'https://r.jina.ai/';
  const full_url = base_url + urlInput;
  const headers = new Headers({
    'Accept': 'text/event-stream'
  });
  try {
    const response = await fetch(full_url, {headers: headers});
    if (!response.ok) throw new Error('Network response was not ok.');
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let result = '';
    while (true) {
      const {done, value} = await reader.read();
      if (done) break;
      result += decoder.decode(value, {stream: true});
    }
    input.value = result;
    submit.disabled = false; // Enable the button if data is fetched successfully
  } catch (error) {
    console.error('Failed to fetch:', error);
    alert('Failed to fetch data: ' + error.message);
  }
}

fetchButton.onclick = () => {
  fetchData();
};

/**
 * Main function to run LLM inference.
 */
async function runDemo() {
  const genaiFileset = await FilesetResolver.forGenAiTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
  let llmInference;

  submit.onclick = () => {
    output.textContent = '';
    submit.disabled = true;
    llmInference.generateResponse(input.value, displayPartialResults);
  };

  submit.value = 'Loading the model...';
  LlmInference
      .createFromOptions(genaiFileset, {
        baseOptions: {modelAssetPath: modelFileName},
        maxTokens: 2000, // Added maxTokens parameter
      })
      .then(llm => {
        llmInference = llm;
        submit.disabled = false;
        submit.value = 'Get Response';
      })
      .catch(() => {
        alert('Failed to initialize the task.');
      });
}

runDemo();
About Me🚀
Hello! I’m Toni Ramchandani 👋. I’m deeply passionate about all things technology! My journey is about exploring the vast and dynamic world of tech, from cutting-edge innovations to practical business solutions. I believe in the power of technology to transform our lives and work. 🌐
Let’s connect at https://www.linkedin.com/in/toni-ramchandani/ and exchange ideas about the latest tech trends and advancements! 🌟
Engage & Stay Connected 📢
If you find value in my posts, please Clap 👏 | Like 👍 and share 📤 them. Your support inspires me to continue sharing insights and knowledge. Follow me for more updates and let’s explore the fascinating world of technology together! 🛰️
On-Device LLMs for Web Scraping and Advanced Web Queries using Jina Reader API + MediaPipe +… was originally published in Generative AI on Medium, where people are continuing the conversation by highlighting and responding to this story.