Real-Time Camera Input for Image Recognition
Imagine pointing your webcam at an object and getting a prediction of what it is within a moment, just like Google Lens! In this guide, we’ll connect your browser's camera to a Python backend that classifies each captured frame with a pretrained AI model. The example uses TensorFlow (Keras' ResNet50), but a PyTorch model could be served the same way.
Tech Stack
- HTML5 + JavaScript – to access the webcam and capture frames
- Flask (Python) – to serve the model and process images
- TensorFlow or PyTorch – for the image classification model
Step 1: HTML + JS for Webcam Input
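Use the getUserMedia() API to stream webcam video, draw the current frame onto a hidden canvas, and upload it to the backend as a JPEG. Browsers only allow camera access in a secure context (HTTPS or localhost).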
<video id="video" width="480" height="360" autoplay></video>
<canvas id="canvas" width="480" height="360" style="display:none;"></canvas>
<br>
<button onclick="captureImage()">Capture & Analyze</button>
<p id="result"></p>
<script>
const video = document.getElementById('video');
const canvas = document.getElementById('canvas');
const context = canvas.getContext('2d');
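// Ask for camera access and show the live stream in the <video> element.
navigator.mediaDevices.getUserMedia({ video: true })
  .then(stream => { video.srcObject = stream; })
  .catch(err => {
    document.getElementById('result').innerText = "Camera error: " + err.message;
  });

function captureImage() {
  // Skip the click if the stream has not delivered a frame yet.
  if (video.readyState < HTMLMediaElement.HAVE_CURRENT_DATA) return;
  // Copy the current video frame onto the hidden canvas.
  context.drawImage(video, 0, 0, canvas.width, canvas.height);
  // Encode the canvas as JPEG and send it as the multipart field "file".
  canvas.toBlob(blob => {
    const formData = new FormData();
    formData.append('file', blob, 'frame.jpg');
    fetch('http://localhost:5000/predict', {
      method: 'POST',
      body: formData
    })
      .then(response => {
        if (!response.ok) throw new Error('HTTP ' + response.status);
        return response.json();
      })
      .then(data => {
        // data is a list of [class_id, label, score]; show the top label.
        document.getElementById('result').innerText = "Prediction: " + data[0][1];
      })
      .catch(err => {
        document.getElementById('result').innerText = "Request failed: " + err.message;
      });
  }, 'image/jpeg');
}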
</script>
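If you want to check the endpoint without a browser, it helps to know what the page actually sends. `formData.append('file', blob, 'frame.jpg')` produces a multipart/form-data request with one part named `file`, and that is what Flask reads through `request.files['file']`. The following is a minimal sketch that builds the same request using only the Python standard library. `build_multipart` and `post_frame` are hypothetical helper names, and the URL assumes the Step 2 server is running on port 5000.

```python
import json
import urllib.request
import uuid
from typing import Optional, Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "image/jpeg",
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Build a multipart/form-data body containing a single file part.

    Mirrors what the browser sends for formData.append(field, blob, filename).
    Returns (body, value for the Content-Type header).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_frame(path: str, url: str = "http://localhost:5000/predict"):
    """Upload a JPEG file to the /predict endpoint and return the parsed JSON."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("file", "frame.jpg", f.read())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, calling `post_frame("photo.jpg")` (the file name is a placeholder) should return the same top-3 list that the page receives.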
Step 2: Flask Backend with AI Model
This Python backend loads ResNet50 pretrained on ImageNet (through TensorFlow's Keras API) and returns the top three predictions for each uploaded frame.
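from flask import Flask, request, jsonify
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import os
import tempfile

app = Flask(__name__)
# Load the ImageNet-pretrained model once at startup, not on every request.
model = ResNet50(weights='imagenet')

@app.after_request
def add_cors_header(response):
    # The HTML page is loaded from a different origin than this API,
    # so the browser needs this header before the page can read the response.
    response.headers['Access-Control-Allow-Origin'] = '*'
    return response

@app.route('/predict', methods=['POST'])
def predict():
    if 'file' not in request.files:
        return jsonify({'error': "missing 'file' field"}), 400
    file = request.files['file']
    # A unique temp file keeps concurrent requests from overwriting each other.
    fd, file_path = tempfile.mkstemp(suffix='.jpg')
    os.close(fd)
    try:
        file.save(file_path)
        # ResNet50 expects a 224x224 RGB input.
        img = image.load_img(file_path, target_size=(224, 224))
        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0)  # add batch dimension: (1, 224, 224, 3)
        x = preprocess_input(x)
        preds = model.predict(x, verbose=0)
    finally:
        os.remove(file_path)
    # decode_predictions yields (class_id, label, score) tuples with NumPy float32
    # scores, which jsonify cannot serialize, so convert them to plain floats.
    top3 = decode_predictions(preds, top=3)[0]
    return jsonify([[class_id, label, float(score)] for class_id, label, score in top3])

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)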
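`preprocess_input` can feel like a black box. For ResNet50, Keras uses "caffe"-style preprocessing: it flips each pixel from RGB to BGR and subtracts the ImageNet per-channel mean, with no scaling to 0..1. The following is a NumPy sketch that approximates that step for illustration. `caffe_preprocess` and `IMAGENET_BGR_MEAN` are hypothetical names rather than Keras APIs, and the server should keep calling the real `preprocess_input`.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras' "caffe"-mode preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess(batch: np.ndarray) -> np.ndarray:
    """Approximate ResNet50's preprocess_input for a channels-last RGB batch.

    batch: array of shape (N, 224, 224, 3) with pixel values in 0..255.
    """
    bgr = batch[..., ::-1].astype(np.float32)  # reverse channel order: RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN             # zero-center each channel, no scaling


# One synthetic pure-red 224x224 frame, batched like np.expand_dims(x, axis=0).
frame = np.zeros((224, 224, 3), dtype=np.float32)
frame[..., 0] = 255.0
x = np.expand_dims(frame, axis=0)
out = caffe_preprocess(x)
print(out.shape)  # (1, 224, 224, 3)
```

A pure-red pixel (255, 0, 0) becomes BGR (0, 0, 255) and then roughly (-103.9, -116.8, 131.3) after mean subtraction. Pixel values are therefore centered around zero rather than squeezed into 0..1.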
Testing It Out
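- Install the dependencies (pip install flask tensorflow pillow), then start your Python server with python app.py (the first start downloads the ResNet50 ImageNet weights)
- Serve the HTML file from localhost (for example, run python -m http.server in its folder and open it via http://localhost:8000) or deploy it over HTTPS, since browsers only expose the camera in a secure context
- Allow camera access and click Capture & Analyze
- The top prediction appears below the video a moment later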
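The label shown on the page is `data[0][1]`. ResNet50 outputs a score for each of the 1000 ImageNet classes, and `decode_predictions` keeps the highest-scoring entries as (class_id, label, score). The page then displays the label of the first entry. The following is a small sketch of that top-k selection over a made-up 4-class list. The class IDs and scores are invented for illustration, and `top_k` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np


def top_k(probs, classes, k=3):
    """Return the k highest-scoring classes as [class_id, label, score] rows."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending score
    return [[classes[i][0], classes[i][1], float(probs[i])] for i in order]


# Hypothetical toy model output: 4 classes instead of ImageNet's 1000.
classes = [("n0001", "cat"), ("n0002", "dog"), ("n0003", "car"), ("n0004", "cup")]
probs = [0.10, 0.60, 0.05, 0.25]
rows = top_k(probs, classes)
print(rows[0][1])  # dog  (the value the page reads as data[0][1])
```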
Use Cases
- Retail – Identify product SKUs visually
- Education – Real-time object identification
- Healthcare – Medical object detection
- Manufacturing – Real-time defect recognition
Bonus Ideas
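- Use YOLO (or an SSD detector with a MobileNet backbone) for object detection, which locates and labels several objects per frame instead of classifying the whole image; MobileNet on its own is also a lighter, faster classifier than ResNet50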
- Stream predictions continuously while the video plays
- Build a full-stack app with Spring Boot as the orchestrator
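If you stream predictions continuously, the main risk is sending frames faster than the model can answer. One common approach is to allow at most one request in flight and cap the send rate. The following Python sketch shows that logic under those assumptions. `FrameThrottler` is a hypothetical helper, and the same check would wrap `captureImage()` inside a `setInterval` or `requestAnimationFrame` loop in the browser.

```python
class FrameThrottler:
    """Decide whether the next video frame may be sent for prediction.

    Allows at most one request in flight and at most `max_per_second` sends,
    so a continuous capture loop does not flood the server.
    """

    def __init__(self, max_per_second: float = 2.0):
        self.min_interval = 1.0 / max_per_second
        self.last_sent = float("-inf")
        self.in_flight = False

    def try_start(self, now: float) -> bool:
        """Return True and mark a request as in flight if a frame may be sent at `now` (seconds)."""
        if self.in_flight or now - self.last_sent < self.min_interval:
            return False
        self.in_flight = True
        self.last_sent = now
        return True

    def finish(self) -> None:
        """Call when the prediction response (or an error) arrives."""
        self.in_flight = False


# Simulate frames every 0.25 s for 2 s; in a real loop, pass time.monotonic().
throttle = FrameThrottler(max_per_second=2.0)
sent = []
for tick in range(8):
    now = tick * 0.25
    if throttle.try_start(now):
        sent.append(now)
        throttle.finish()  # pretend the response came back immediately
print(sent)  # [0.0, 0.5, 1.0, 1.5]
```

Tracking the in-flight request as well as the timestamp means a slow response automatically lowers the effective frame rate.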
Now your browser is a real-time AI-powered lens!