Real-Time Camera Input for Image Recognition

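Imagine pointing your webcam at an object and getting an immediate guess of what it is, much like Google Lens. In this guide, we'll connect your browser's camera to a backend AI model. The example uses TensorFlow (Keras), but a PyTorch model can be served the same way. Each click sends the current frame for classification, so results arrive in near real time.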

🔧 Tech Stack

  • HTML5 + JavaScript – to access the webcam and capture frames
  • Flask (Python) – to serve the model and process uploaded images
  • TensorFlow or PyTorch – for the image classification model (this guide uses a pretrained Keras ResNet50)

🎬 Step 1: HTML + JS for Webcam Input

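Use the navigator.mediaDevices.getUserMedia() API to stream webcam video into a <video> element. To capture a frame, draw the current video image onto a hidden <canvas>, export it as a JPEG blob, and POST it to the backend.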

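<video id="video" width="480" height="360" autoplay playsinline></video>
<canvas id="canvas" width="480" height="360" style="display:none;"></canvas>
<br>
<button onclick="captureImage()">📷 Capture & Analyze</button>
<p id="result"></p>

<script>
  const video = document.getElementById('video');
  const canvas = document.getElementById('canvas');
  const context = canvas.getContext('2d');
  const result = document.getElementById('result');

  // Ask for camera access (browsers only allow this on https:// or http://localhost)
  navigator.mediaDevices.getUserMedia({ video: true })
    .then(stream => { video.srcObject = stream; })
    .catch(err => { result.innerText = "Camera error: " + err.message; });

  function captureImage() {
    // Copy the current video frame onto the hidden canvas
    context.drawImage(video, 0, 0, canvas.width, canvas.height);

    // Encode the canvas as JPEG and upload it as the 'file' form field
    canvas.toBlob(blob => {
      const formData = new FormData();
      formData.append('file', blob, 'frame.jpg');

      fetch('http://localhost:5000/predict', {
        method: 'POST',
        body: formData
      })
        .then(response => response.json())
        .then(data => {
          // data is a list of [class_id, label, score] entries, best match first
          const [, label, score] = data[0];
          result.innerText = "🔍 Prediction: " + label + " (" + (score * 100).toFixed(1) + "%)";
        })
        .catch(err => { result.innerText = "Prediction failed: " + err.message; });
    }, 'image/jpeg');
  }
</script>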
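The page reads data[0][1] because the backend returns the output of Keras's decode_predictions: a list of (class_id, label, score) triples sorted from most to least likely. The stdlib-only Python sketch below mimics that top-k decoding against a made-up four-class label table, so you can see exactly what the JavaScript indexes into. LABELS and top_k are hypothetical names and are not part of Keras; ResNet50's real table has 1,000 ImageNet entries.

```python
import json

# Hypothetical label table: (class id, human-readable label).
# ResNet50's real ImageNet table has 1,000 entries.
LABELS = [
    ("n01", "tabby"),
    ("n02", "coffee_mug"),
    ("n03", "laptop"),
    ("n04", "banana"),
]

def top_k(probs, k=3):
    """Return the k most likely classes as [class_id, label, score] lists,
    highest score first, mirroring the layout of decode_predictions()."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # float() makes the score JSON-serializable (real scores are NumPy floats)
    return [[LABELS[i][0], LABELS[i][1], float(probs[i])] for i in ranked[:k]]

probs = [0.05, 0.70, 0.20, 0.05]
response = json.dumps(top_k(probs))   # what the server sends
data = json.loads(response)           # what response.json() yields in the browser
print(data[0][1])  # top label
print(data[0][2])  # its score
```

The float() cast matters in the real backend too: decode_predictions returns NumPy float32 scores, which Flask's jsonify cannot serialize directly.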

🧠 Step 2: Flask Backend with AI Model

This Python backend loads ResNet50 with ImageNet weights, classifies each uploaded frame, and returns the top three predictions as JSON.

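# pip install flask flask-cors tensorflow pillow
import io

import numpy as np
from flask import Flask, request, jsonify
from flask_cors import CORS  # lets a page served from another origin call /predict
from PIL import Image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

app = Flask(__name__)
CORS(app)
model = ResNet50(weights='imagenet')  # downloads the ImageNet weights on first run

@app.route('/predict', methods=['POST'])
def predict():
    if 'file' not in request.files:
        return jsonify({'error': 'no file uploaded'}), 400

    # Decode the upload in memory instead of sharing one temp.jpg between requests
    img = Image.open(io.BytesIO(request.files['file'].read())).convert('RGB')
    img = img.resize((224, 224))  # ResNet50's expected input size

    x = np.asarray(img, dtype=np.float32)
    x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 224, 224, 3)
    x = preprocess_input(x)
    preds = model.predict(x, verbose=0)

    # decode_predictions yields (class_id, label, score) tuples with NumPy float32
    # scores; cast them to float so jsonify can serialize the response
    top3 = decode_predictions(preds, top=3)[0]
    return jsonify([[class_id, label, float(score)] for class_id, label, score in top3])

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)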
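preprocess_input for ResNet50 follows Keras's "caffe" convention: it reorders channels from RGB to BGR and subtracts the per-channel ImageNet means, without scaling pixels to [0, 1]. The per-pixel arithmetic is sketched below in plain Python; caffe_preprocess_pixel is a hypothetical helper for illustration, while in the backend preprocess_input applies the same transform to the whole array at once.

```python
# Per-channel ImageNet means in BGR order, as used by Keras's "caffe" mode.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)

def caffe_preprocess_pixel(r, g, b):
    """Transform one RGB pixel (0-255) the way ResNet50's preprocess_input does:
    reorder to BGR, then zero-center each channel (no division by 255)."""
    bgr = (b, g, r)
    # round() only tidies floating-point noise for display
    return tuple(round(c - m, 3) for c, m in zip(bgr, IMAGENET_MEAN_BGR))

print(caffe_preprocess_pixel(255, 255, 255))            # white pixel
print(caffe_preprocess_pixel(123.68, 116.779, 103.939))  # the mean color maps to zero
```

Because the pretrained weights expect inputs prepared this way, skipping this step or rescaling pixels to 0–1 beforehand typically degrades accuracy without raising any error.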

🧪 Testing It Out

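  1. Save the backend as app.py and start it: python app.py
  2. Serve the HTML page from http://localhost (for example, python -m http.server 8000) or over HTTPS; browsers only allow camera access in secure contexts, so opening the file directly may not work
  3. Allow camera access and click Capture & Analyze
  4. The top prediction appears below the button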
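To check the backend before wiring up the camera, you can post a saved JPEG to /predict the same way the browser's FormData does. The stdlib-only sketch below assumes the server from Step 2 is running on http://localhost:5000; the helper build_predict_request and the file name test.jpg in the usage comment are illustrative, not part of the original guide.

```python
import urllib.request
import uuid

def build_predict_request(jpeg_bytes, url="http://localhost:5000/predict"):
    """Wrap JPEG bytes in a multipart/form-data body with a 'file' field,
    matching formData.append('file', blob, 'frame.jpg') in the page."""
    boundary = uuid.uuid4().hex  # random, so it won't appear inside the image data
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="frame.jpg"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    ).encode() + jpeg_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# Usage (requires the Flask server to be running):
#   import json
#   with open("test.jpg", "rb") as f:
#       req = build_predict_request(f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)[0][1])   # top label
```

A fresh random boundary per request makes it extremely unlikely that the delimiter collides with bytes inside the JPEG.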

🎯 Use Cases

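  • Retail – Identify product SKUs visually
  • Education – Real-time object identification
  • Healthcare – Medical object detection
  • Manufacturing – Real-time defect recognition

Most of these need a model fine-tuned on your own labeled images, since the stock ResNet50 only recognizes the 1,000 ImageNet categories.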

💡 Bonus Ideas

  • Use a detector such as YOLO or SSD-MobileNet to locate multiple objects with bounding boxes (not just classify the whole frame)
  • Stream predictions continuously while the video plays
  • Build a full-stack app with Spring Boot as the orchestrator
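For the "stream predictions continuously" idea, the main design question is how often to send frames: posting every video frame (webcams commonly deliver around 30 per second) can queue requests faster than ResNet50 answers them on many machines. One common pattern is to send a frame only when no request is in flight and a minimum interval has passed. It is sketched here in Python for clarity; the FrameThrottle class and its 0.5 s default are assumptions, and in the browser the same logic would gate calls to captureImage() inside a setInterval or requestAnimationFrame loop.

```python
import time

class FrameThrottle:
    """Decide which video frames to send for prediction (hypothetical helper).

    A frame is sent only if no earlier request is still in flight and at least
    `min_interval` seconds have passed since the last send, so this client
    never has more than one frame waiting on the server.
    """

    def __init__(self, min_interval=0.5, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the logic can be tested
        self.last_sent = None
        self.in_flight = False

    def should_send(self):
        now = self.clock()
        if self.in_flight:
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        self.last_sent = now
        self.in_flight = True
        return True

    def on_response(self):
        """Call when the /predict response (or an error) arrives."""
        self.in_flight = False
```

Raising min_interval lowers server load at the cost of slower updates; the in-flight check alone already prevents a backlog when the model is slower than the interval.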

Now your browser is a real-time AI-powered lens! 📸
