Kosher-ize It

A computer vision application that uses advanced OCR and image recognition to identify kosher ingredients from product labels and packaging.

Computer Vision, OCR (Tesseract), Machine Learning, Database Systems, React Native, Image Processing

Health & Dietary Disclaimer

This content is for educational and informational purposes only. It is not intended as medical, health, or dietary advice. Dietary requirements, including kosher certification, may vary based on individual circumstances, religious interpretations, and regional standards. Always consult with qualified religious authorities, healthcare professionals, or certified kosher organizations for guidance on dietary compliance. The author and D613 Labs are not responsible for any health or dietary consequences resulting from the use of information or tools discussed in this content.

Kosher-ize It: Technical Case Study

Introduction

Kosher-ize It uses OCR and image processing to help users identify kosher ingredients from product labels. This case study walks through the technical implementation, from OCR processing to database matching.

Tech Stack Deep Dive

Computer Vision Pipeline

We built a computer vision pipeline that processes each product image in five stages:

  1. Image Preprocessing: Normalize lighting, enhance contrast, and correct orientation
  2. Text Detection: Identify text regions using EAST (Efficient and Accurate Scene Text) detector
  3. OCR Processing: Extract text using Tesseract OCR with custom training
  4. Ingredient Parsing: Parse and normalize ingredient lists
  5. Database Matching: Match against kosher certification database
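
A simplified version of the preprocessing and OCR stages:

import cv2
import pytesseract

def process_product_image(image_path):
    # Load the image; cv2.imread returns None if the file cannot be read
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Enhance local contrast with CLAHE (adaptive histogram equalization)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)

    # Run Tesseract with both English and Hebrew language data
    text = pytesseract.image_to_string(enhanced, lang='eng+heb')

    # extract_ingredients is defined elsewhere in the project
    return extract_ingredients(text)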
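
The function above hands its OCR output to extract_ingredients, which is not shown. A minimal sketch of what the ingredient parsing stage might look like follows. It assumes an English label where the list follows an "Ingredients:" marker, items are separated by commas or semicolons, and the list ends at the first period. The helper normalize_ingredient and all parsing rules here are illustrative assumptions, not the app's actual implementation.

```python
import re
import unicodedata
from typing import List


def normalize_ingredient(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep word characters, whitespace, and hyphens; turn the rest into spaces.
    cleaned = re.sub(r"[^\w\s-]", " ", stripped.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def extract_ingredients(text: str) -> List[str]:
    """Return normalized ingredient names found in raw OCR text."""
    # Rejoin lines that OCR split in the middle of the list.
    flat = " ".join(text.split())
    # Assumed convention: the list starts after an "Ingredients:" marker.
    match = re.search(r"ingredients\s*:\s*(.*)", flat, flags=re.IGNORECASE)
    body = match.group(1) if match else flat
    # Assumed convention: the list ends at the first period.
    body = body.split(".")[0]
    # Split on commas and semicolons that are not inside parentheses.
    parts = re.split(r"[,;](?![^()]*\))", body)
    names = [normalize_ingredient(part) for part in parts]
    return [name for name in names if name]
```

Calling extract_ingredients on a label such as "INGREDIENTS: Sugar, Cocoa Butter." would yield a list of lowercase names ready for database matching. Real labels break these assumptions often (nested sub-ingredients, periods inside names), so a production parser would need more rules.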

Database Architecture

We designed a PostgreSQL database with optimized indexes for fast ingredient matching:

CREATE TABLE kosher_ingredients (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    normalized_name VARCHAR(255) NOT NULL,
    certification_status VARCHAR(50),
    certification_body VARCHAR(100)
);

-- Index for lookups on the normalized name
CREATE INDEX idx_normalized_name ON kosher_ingredients (normalized_name);

Fuzzy Matching Algorithm

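Because product labels often spell the same ingredient in different ways, we implemented a fuzzy matching system that scores string similarity against a threshold. Note that the similarity() function used in the query below comes from PostgreSQL's pg_trgm extension and measures trigram overlap; a true Levenshtein distance is provided by the separate fuzzystrmatch extension.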

def find_kosher_match(ingredient_name, threshold=0.85):
    # db and normalize_ingredient are assumed to be defined elsewhere in the project
    normalized = normalize_ingredient(ingredient_name)

    # similarity() requires the pg_trgm extension (CREATE EXTENSION pg_trgm)
    matches = db.query("""
        SELECT name, certification_status,
               similarity(normalized_name, %s) as sim
        FROM kosher_ingredients
        WHERE similarity(normalized_name, %s) > %s
        ORDER BY sim DESC
        LIMIT 1
    """, (normalized, normalized, threshold))

    # Return the best match above the threshold, or None if nothing qualifies
    return matches[0] if matches else None
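
Levenshtein distance, mentioned above, can be sketched in pure Python. The version below normalizes the edit distance to a score between 0 and 1 so it can be compared against a threshold like the 0.85 default used in find_kosher_match. The function names and the in-memory candidate list are assumptions for illustration; this is not the app's production matching code.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity_score(a: str, b: str) -> float:
    """Map edit distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(name: str, candidates: Iterable[str],
               threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return the closest candidate and its score, or None below the threshold."""
    scored = [(candidate, similarity_score(name, candidate)) for candidate in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] > threshold else None
```

With this scoring, an OCR result of "soy lecitin" is one edit away from "soy lecithin" and clears the 0.85 threshold, while an unrelated string returns None. Scanning every candidate in Python does not scale to a large table, which is one reason to push the comparison into the database.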

Challenges & Solutions

Challenge 1: OCR Accuracy with Varied Label Formats

Problem: Product labels come in many formats, fonts, and languages (English, Hebrew, etc.), making OCR challenging.

Solution: We implemented a multi-model approach:

  • Custom-trained Tesseract models for common label formats
  • Language detection to switch between English and Hebrew OCR
  • Post-processing to correct common OCR errors
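
Two of the steps above can be sketched briefly. The first function picks a Tesseract language code from the share of Hebrew letters in a first-pass OCR result; the second repairs digit-for-letter confusions, but only when the repaired word appears in a known vocabulary. The 0.9 dominance threshold, the substitution table, and the vocabulary are all illustrative assumptions rather than values taken from the app.

```python
import re
from typing import Set

# Assumed digit-for-letter confusions; a real table would come from observed errors.
DIGIT_TO_LETTER = str.maketrans({"0": "o", "1": "l", "5": "s"})


def choose_ocr_language(first_pass_text: str, dominance: float = 0.9) -> str:
    """Pick a Tesseract language code from the script mix of a first OCR pass."""
    letters = [ch for ch in first_pass_text if ch.isalpha()]
    if not letters:
        return "eng+heb"  # nothing to go on, so keep both language models
    # Count letters that fall in the Unicode Hebrew block (U+0590 to U+05FF).
    hebrew = sum(1 for ch in letters if "\u0590" <= ch <= "\u05ff")
    share = hebrew / len(letters)
    if share >= dominance:
        return "heb"
    if share <= 1 - dominance:
        return "eng"
    return "eng+heb"


def fix_ocr_digits(text: str, vocabulary: Set[str]) -> str:
    """Swap look-alike digits for letters when the result is a known word."""
    def repair(match):
        word = match.group(0)
        candidate = word.lower().translate(DIGIT_TO_LETTER)
        # Only accept the repair if it produces a word we recognize.
        return candidate if candidate in vocabulary else word

    # Match whole words that contain at least one digit.
    return re.sub(r"\b\w*\d\w*\b", repair, text)
```

Checking repairs against a vocabulary keeps genuine quantities such as "5mg" or "100" intact, at the cost of missing errors in words the vocabulary does not cover.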

Challenge 2: Real-time Processing on Mobile Devices

Problem: Processing images on mobile devices requires balancing accuracy with speed.

Solution: We use a hybrid approach:

  • Client-side preprocessing and basic OCR
  • Server-side deep processing for complex cases
  • Caching of common ingredient matches
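
The caching step, together with a rule for deciding when to escalate to the server, might look like the sketch below. It uses functools.lru_cache from the standard library. The lookup_ingredient function and its sample rows are hypothetical stand-ins for the real database call, and the 0.8 confidence cutoff is an assumed value.

```python
from functools import lru_cache
from typing import Optional

# Made-up sample rows standing in for the server-side database.
_SAMPLE_ROWS = {"sugar": "certified", "cocoa butter": "unknown"}


def lookup_ingredient(normalized_name: str) -> Optional[str]:
    """Slow lookup (stand-in for a network or database round trip)."""
    return _SAMPLE_ROWS.get(normalized_name)


@lru_cache(maxsize=1024)
def cached_lookup(normalized_name: str) -> Optional[str]:
    """Memoize lookups so repeated ingredients skip the slow path."""
    return lookup_ingredient(normalized_name)


def needs_server_processing(ocr_confidence: float, cutoff: float = 0.8) -> bool:
    """Escalate to server-side processing when on-device OCR confidence is low."""
    return ocr_confidence < cutoff
```

An in-process cache like this suits a single server worker or a mobile client; sharing cached matches across workers would need an external store, which is outside the scope of this sketch.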

Use Cases & Impact

Kosher-ize It has helped thousands of users:

  • Quickly identify kosher ingredients while shopping
  • Understand certification status of products
  • Make informed dietary choices
  • Access ingredient information in multiple languages

The app processes over 50,000 ingredient scans monthly with 92% accuracy.

Code Examples

Image preprocessing and OCR extraction for ingredient identification (Python):

import cv2
import pytesseract

def process_product_image(image_path):
    # Minimal variant: grayscale conversion followed by OCR, without contrast enhancement
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(gray, lang='eng+heb')
    return extract_ingredients(text)