Real-Time Camera Input for Image Recognition

- April 13, 2025

Imagine pointing your webcam at an object and getting a prediction of what it is within moments, just like Google Lens! In this guide, we’ll connect your browser's camera to a backend AI model built with TensorFlow or PyTorch, all in real time.

🔧 Tech Stack

HTML5 + JavaScript – to access webcam and capture frames
Flask (Python) – to serve the model and process images
TensorFlow or PyTorch – for the image classification model

🎬 Step 1: HTML + JS for Webcam Input

Use the getUserMedia() API to stream webcam video into a <video> element, then draw the current frame onto a hidden <canvas> and upload it to the backend as a JPEG.

<video id="video" width="480" height="360" autoplay></video>
<canvas id="canvas" width="480" height="360" style="display:none;"></canvas>
<br>
<button onclick="captureImage()">📷 Capture & Analyze</button>
<p id="result"></p>

<script>
  const video = document.getElementById('video');
  const canvas = document.getElementById('canvas');
  const context = canvas.getContext('2d');

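  navigator.mediaDevices.getUserMedia({ video: true })
    .then(stream => { video.srcObject = stream; })
    // Surface permission or device errors instead of failing silently.
    .catch(err => { document.getElementById('result').innerText = "Camera error: " + err.message; });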

  function captureImage() {
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    canvas.toBlob(blob => {
      const formData = new FormData();
      formData.append('file', blob, 'frame.jpg');

      fetch('http://localhost:5000/predict', {
        method: 'POST',
        body: formData
      })
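      .then(response => {
        if (!response.ok) throw new Error('Server returned ' + response.status);
        return response.json();
      })
      .then(data => {
        // Each entry is [class_id, label, score]; show the label of the top match.
        document.getElementById('result').innerText = "🔍 Prediction: " + data[0][1];
      })
      .catch(err => {
        document.getElementById('result').innerText = "Request failed: " + err.message;
      });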
    }, 'image/jpeg');
  }
</script>

🧠 Step 2: Flask Backend with AI Model

This Python backend uses Keras's ResNet50, pretrained on ImageNet, to classify the uploaded frame and return the top three predictions as JSON.

from flask import Flask, request, jsonify
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import os

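app = Flask(__name__)
model = ResNet50(weights='imagenet')

@app.after_request
def add_cors_headers(response):
    # Let a page served from another origin call this API (fine for local development).
    response.headers['Access-Control-Allow-Origin'] = '*'
    return response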

@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['file']
    file_path = 'temp.jpg'
    file.save(file_path)

    img = image.load_img(file_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = model.predict(x)

    os.remove(file_path)
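    # decode_predictions returns NumPy float32 scores, which jsonify cannot serialize,
    # so convert each score to a plain Python float first.
    top3 = decode_predictions(preds, top=3)[0]
    return jsonify([[class_id, label, float(score)] for class_id, label, score in top3])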

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
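
To make the preprocess_input(x) step less of a black box: for ResNet50, Keras applies Caffe-style preprocessing, which reorders the channels from RGB to BGR and subtracts the ImageNet per-channel mean, with no scaling. The pure-Python sketch below mimics that on nested lists, for illustration only. The names preprocess_pixel and preprocess_image are hypothetical helpers, not part of Keras, and the real function works on NumPy arrays.

```python
# ImageNet per-channel means in BGR order, as used by Caffe-style preprocessing.
IMAGENET_BGR_MEANS = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Mimic the ResNet50 preprocessing of one RGB pixel (values 0-255)."""
    r, g, b = rgb
    bgr = (b, g, r)  # swap channel order: RGB -> BGR
    # Zero-center each channel around the ImageNet mean; no rescaling is applied.
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEANS))


def preprocess_image(pixels):
    """Apply the same transform to a nested list shaped [height][width][3]."""
    return [[preprocess_pixel(px) for px in row] for row in pixels]


if __name__ == "__main__":
    # A 1x2 "image": one pure-red pixel and one pure-white pixel.
    print(preprocess_image([[(255, 0, 0), (255, 255, 255)]]))
```

This is why the raw 0-255 array must pass through preprocess_input before model.predict: the pretrained weights expect zero-centered BGR input.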

🧪 Testing It Out

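Start your Python server: python app.py
Open the HTML page in your browser, served from localhost or over HTTPS (browsers expose the camera only in secure contexts)
Allow camera access and click Capture & Analyze
The top prediction appears below the button (the first request can take longer while the model warms up)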
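
You can also exercise the endpoint without a browser. The sketch below uses only the Python standard library and sends a saved image as a multipart upload with the field name file, the same name the page uses. It assumes the server from Step 2 is running on localhost:5000 and returns a JSON list of [class_id, label, score] entries. The names build_multipart and post_frame are hypothetical helpers introduced here.

```python
import json
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_frame(image_path, url="http://localhost:5000/predict"):
    """POST an image file to the /predict endpoint and return the parsed JSON."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart("file", "frame.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the Flask server to be running):
#   for class_id, label, score in post_frame("some_photo.jpg"):
#       print(label, score)
```

If this script works but the page does not, the problem is most likely on the browser side (camera permission or cross-origin rules) rather than in the model.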

🎯 Use Cases

Retail – Identify product SKUs visually
Education – Real-time object identification
Healthcare – Medical object detection
Manufacturing – Real-time defect recognition

💡 Bonus Ideas

Use YOLO or MobileNet-SSD for object detection (not just classification)
Stream predictions continuously while the video plays
Build a full-stack app with Spring Boot as the orchestrator
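
If you do stream predictions continuously, the top label may flicker from frame to frame. One possible remedy, sketched here as an assumption rather than part of the setup above, is to report the most common label over the last few frames. PredictionSmoother is a hypothetical helper name, and the window size of 5 is an arbitrary choice.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Report the most frequent label among the most recent predictions."""

    def __init__(self, window=5):
        # deque with maxlen silently drops the oldest label once full.
        self.recent = deque(maxlen=window)

    def update(self, label):
        """Record the newest label and return the current majority label."""
        self.recent.append(label)
        return Counter(self.recent).most_common(1)[0][0]


if __name__ == "__main__":
    smoother = PredictionSmoother(window=3)
    for label in ["mug", "mug", "cup", "cup"]:
        print(label, "->", smoother.update(label))
```

The backend could keep one smoother per client, or the same logic could be ported to the page's JavaScript; either way, a larger window trades responsiveness for stability.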

Now your browser is a real-time AI-powered lens! 📸
