Kosher-ize It: Technical Case Study

Introduction

Kosher-ize It uses computer vision to help users identify kosher ingredients from product labels. This case study covers the technical implementation, from image preprocessing and OCR to matching ingredients against a certification database.

Tech Stack Deep Dive

Computer Vision Pipeline

We built a computer vision pipeline that processes each product image in five stages:

Image Preprocessing: Normalize lighting, enhance contrast, and correct orientation
Text Detection: Identify text regions using EAST (Efficient and Accurate Scene Text) detector
OCR Processing: Extract text using Tesseract OCR with custom training
Ingredient Parsing: Parse and normalize ingredient lists
Database Matching: Match against kosher certification database

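import cv2
import pytesseract

def process_product_image(image_path):
    # Load the image; cv2.imread returns None if the file is missing or unreadable
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Enhance local contrast with CLAHE
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)

    # Run OCR with the English and Hebrew language models
    text = pytesseract.image_to_string(enhanced, lang='eng+heb')

    # extract_ingredients parses the raw OCR text into a list of ingredient names
    return extract_ingredients(text)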
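
The ingredient parsing stage is not shown in the source, so the following is only a minimal sketch of what `extract_ingredients` and `normalize_ingredient` might look like. The "Ingredients:" header, the rule that the list ends at the first sentence-ending period, and the normalization steps are all assumptions made for illustration.

```python
import re

def normalize_ingredient(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = name.lower()
    # \w is Unicode-aware in Python 3, so Hebrew letters are preserved
    name = re.sub(r"[^\w\s-]", " ", name)
    return re.sub(r"\s+", " ", name).strip()

def extract_ingredients(text):
    """Return normalized top-level ingredient names from raw OCR text."""
    # OCR output wraps lines arbitrarily, so flatten all whitespace first
    flat = re.sub(r"\s+", " ", text)
    # Start after an "Ingredients:" header if one is present (assumed wording)
    header = re.search(r"ingredients\s*:", flat, flags=re.IGNORECASE)
    body = flat[header.end():] if header else flat
    # Assume the list ends at the first sentence-ending period
    body = re.split(r"\.(?:\s|$)", body, maxsplit=1)[0]
    # Split on commas/semicolons that are not inside (non-nested) parentheses
    parts = re.split(r"[,;](?![^()]*\))", body)
    names = [normalize_ingredient(part) for part in parts]
    return [name for name in names if name]
```

Keeping parenthesized sub-ingredients attached to their parent entry is a design choice in this sketch; a real parser might instead emit them as separate entries so that each one can be matched individually.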

Database Architecture

We designed a PostgreSQL database with optimized indexes for fast ingredient matching:

CREATE TABLE kosher_ingredients (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    normalized_name VARCHAR(255) NOT NULL,
    certification_status VARCHAR(50),
    certification_body VARCHAR(100)
);

-- PostgreSQL defines indexes with a separate statement
CREATE INDEX idx_normalized_name ON kosher_ingredients (normalized_name);

Fuzzy Matching Algorithm

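Since ingredient names on product labels vary in spelling and wording, we implemented fuzzy matching in the database using the similarity() function (trigram similarity, provided in PostgreSQL by the pg_trgm extension) with a default threshold of 0.85: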

def find_kosher_match(ingredient_name, threshold=0.85):
    # normalize_ingredient and db are provided elsewhere in the application
    normalized = normalize_ingredient(ingredient_name)

    # Return the single closest entry whose similarity exceeds the threshold
    matches = db.query("""
        SELECT name, certification_status,
               similarity(normalized_name, %s) as sim
        FROM kosher_ingredients
        WHERE similarity(normalized_name, %s) > %s
        ORDER BY sim DESC
        LIMIT 1
    """, (normalized, normalized, threshold))

    return matches[0] if matches else None
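
The thresholding idea can also be tried without a database. The sketch below uses Python's standard `difflib.SequenceMatcher`, whose ratio is a different similarity measure from the SQL `similarity()` call above, so scores will not be identical. The function name `find_best_match` and the in-memory candidate list are hypothetical and exist only for illustration.

```python
from difflib import SequenceMatcher

def find_best_match(normalized_name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate above threshold, else None."""
    best_name, best_score = None, 0.0
    for candidate in candidates:
        # ratio() is in [0, 1]; 1.0 means the two strings are identical
        score = SequenceMatcher(None, normalized_name, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is not None and best_score > threshold:
        return best_name, best_score
    return None
```

For example, `find_best_match("citric acd", ["gelatin", "citric acid"])` selects "citric acid", because a single dropped letter keeps the ratio above 0.85, while an unrelated string returns `None`.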

Challenges & Solutions

Challenge 1: OCR Accuracy with Varied Label Formats

Problem: Product labels come in many formats, fonts, and languages (English, Hebrew, etc.), making OCR challenging.

Solution: We implemented a multi-model approach:

Custom-trained Tesseract models for common label formats
Language detection to switch between English and Hebrew OCR
Post-processing to correct common OCR errors
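
The last two items can be sketched as follows. This is an illustrative heuristic, not the app's actual logic: the 80% dominance cutoff, the digit-to-letter table, and the assumption that a quick first OCR pass supplies `sample_text` are all hypothetical.

```python
import re

HEBREW_CHARS = re.compile(r"[\u0590-\u05FF]")  # Unicode Hebrew block
LATIN_CHARS = re.compile(r"[A-Za-z]")

def choose_ocr_language(sample_text, dominant_share=0.8):
    """Pick a Tesseract lang string from the script mix of a text sample."""
    hebrew = len(HEBREW_CHARS.findall(sample_text))
    latin = len(LATIN_CHARS.findall(sample_text))
    total = hebrew + latin
    if total == 0:
        return "eng+heb"  # nothing to go on, keep both models
    share = hebrew / total
    if share >= dominant_share:
        return "heb"
    if share <= 1 - dominant_share:
        return "eng"
    return "eng+heb"

# Digits that OCR commonly confuses with letters (assumed table)
DIGIT_TO_LETTER = str.maketrans({"0": "o", "1": "l", "5": "s"})

def fix_ocr_digits(text):
    """Replace look-alike digits inside tokens that are mostly letters."""
    def fix(match):
        token = match.group(0)
        letters = sum(ch.isalpha() for ch in token)
        digits = sum(ch.isdigit() for ch in token)
        # Leave codes such as E330 or B12 alone: they are mostly digits
        if digits and letters > digits:
            return token.translate(DIGIT_TO_LETTER)
        return token
    return re.sub(r"[A-Za-z0-9]+", fix, text)
```

The "mostly letters" test is what keeps additive codes such as E330 intact while still repairing words like "s0dium".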

Challenge 2: Real-time Processing on Mobile Devices

Problem: Processing images on mobile devices requires balancing accuracy with speed.

Solution: We use a hybrid approach:

Client-side preprocessing and basic OCR
Server-side deep processing for complex cases
Caching of common ingredient matches
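
A rough sketch of how the hybrid routing and the cache could fit together is shown below. The confidence scale (0 to 100, with negative values for non-text regions), the cutoff of 70, and the helper names `needs_server_pass` and `make_cached_lookup` are assumptions for illustration rather than details taken from the app.

```python
from functools import lru_cache

def needs_server_pass(word_confidences, min_mean=70.0):
    """Decide whether on-device OCR looks too weak to trust on its own."""
    # Drop negative values, which are assumed to mark non-text regions
    usable = [c for c in word_confidences if c >= 0]
    if not usable:
        return True  # no readable words at all, so escalate to the server
    return sum(usable) / len(usable) < min_mean

def make_cached_lookup(lookup, maxsize=2048):
    """Wrap an ingredient lookup so repeated names skip the underlying call."""
    @lru_cache(maxsize=maxsize)
    def cached(normalized_name):
        return lookup(normalized_name)
    return cached
```

Wrapping the lookup rather than decorating it directly keeps the cache size configurable and lets the same helper wrap either a database query or a remote call. Note that `lru_cache` also stores `None` results, so ingredients with no match are cached as well.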

Use Cases & Impact

Kosher-ize It has helped thousands of users:

Quickly identify kosher ingredients while shopping
Understand certification status of products
Make informed dietary choices
Access ingredient information in multiple languages

The app processes over 50,000 ingredient scans monthly with 92% accuracy.
