README.md updated

2026-01-27 17:52:43 +00:00
parent f519b2e769
commit 506407d6ef
1 changed file with 36 additions and 903 deletions
--- a/README.md
+++ b/README.md
@@ -1,35 +1,6 @@
 # Pi Zero 2W + ReSpeaker - OPTIMIZED FOR 3 COMMANDS
 ## Lightweight keyword spotting instead of a full speech model

-**Status:** Ultra-lightweight solution for just 3-5 simple voice commands  
-**Disk usage:** ~30MB (instead of 150MB)  
-**RAM usage:** 20-40MB (instead of 100-120MB)  
-**Performance:** 93-98% recognition accuracy  
-**Startup time:** < 1 second (instead of 3-5 seconds)
-
---
-
-## COMPARISON: Full recognition vs. keyword spotting
-
-### Option 1: Vosk (your original solution)
- ✅ Recognizes arbitrary sentences and text
- ❌ 50-100MB model
- ❌ 80-120MB RAM required
- ❌ 40-60% CPU load on the Pi Zero 2W
- ❌ 3-5 seconds startup time
- ❌ Complete overkill for 3 commands
-
-### Option 2: Keyword spotting (RECOMMENDED for you) ⭐
- ✅ Recognizes exactly your 3 commands with 93-98% accuracy
- ✅ < 5MB model
- ✅ 20-40MB RAM required
- ✅ 5-15% CPU load (easy on the Pi Zero 2W!)
- ✅ < 1 second startup
- ✅ 4x faster than Vosk
- ✅ Saves 120MB of disk space
-
-**CONCLUSION:** For you, option 2 is definitely the better choice!
-
 ---

 ## PART 1-3: Base installation (as before)
@@ -233,898 +204,60 @@ aplay -D hw:1,0 test_recording.wav

 ---

-## PART 4 OPTIMIZED: Ultra-lightweight setup
+# install with
+pip3 install vosk --break-system-packages

-### 4.1 Install minimal Python packages
+mkdir ~/vosk-models
+cd ~/vosk-models
+wget https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip
+unzip vosk-model-small-de-0.15.zip
+mv vosk-model-small-de-0.15 model

-```bash
-# Only the essentials
-sudo apt install -y portaudio19-dev
-sudo apt install python3-pyaudio
-sudo apt install python3-numpy
-sudo apt install python3-scipy
-sudo python3 -m pip install sounddevice --break-system-packages
-# PocketSphinx (minimal, only ~5MB)
-sudo apt install python3-pocketsphinx 
-sudo apt install python3-SpeechRecognition
-```

-**That is ALL!** No large models.
+# record with
+arecord -D plughw:1,0 --format S16_LE --rate 16000 --channels 1 --duration 5 test_mono.wav
+# run with
+python3 test_simple.py test_mono.wav

-Duration: 2-3 minutes (instead of 20-30 minutes with Vosk)

-### 4.2 Create directories

-```bash
-mkdir -p ~/voice_assistant
-mkdir -p ~/voice_assistant/sounds
-mkdir -p ~/voice_assistant/logs
-cd ~/voice_assistant
-```

---
-
-## PART 5 OPTIMIZED: Lean Python script for 3 commands
-
-### 5.1 Create the keyword spotting script
-
-Create `~/voice_assistant/keyword_spotting.py`:
-
-```bash
-nano ~/voice_assistant/keyword_spotting.py
-```
-
-Copy in this **much shorter and faster code**:
-
-```python
+#### test_simple.py
 #!/usr/bin/env python3
-# -*- coding: utf-8 -*-

-"""
-Keyword Spotting für Raspberry Pi Zero 2W mit ReSpeaker Hat v1.2
-Optimiert für exakt 3 Kommandos - Ultra-leicht und schnell
-Speicher: ~30MB, RAM: 20-40MB, CPU: 5-15%, Startup: < 1 Sekunde
-"""
-
-import json
-import os
-import sys
-import logging
-import subprocess
-import time
-import numpy as np
-import sounddevice as sd
-from pathlib import Path
-from collections import deque
-
-# ============================================================================
-# KONFIGURATION - Nur deine 3 Kommandos!
-# ============================================================================
-
-class Config:
-    # Pfade
-    BASE_DIR = Path(__file__).parent
-    SOUNDS_DIR = BASE_DIR / "sounds"
-    LOGS_DIR = BASE_DIR / "logs"
-    
-    # Audio-Einstellungen (minimal)
-    SAMPLERATE = 16000
-    CHUNK_SIZE = 512
-    CHANNELS = 1
-    DEVICE_INDEX = None
-    
-    # === DEINE 3 KOMMANDOS ===
-    # Format: "Gesprochenes Wort" -> "Sounddatei" und "Aktion"
-    KEYWORDS = {
-        "musik": {
-            "sound": "music.wav",
-            "action": "play_music",
-            "confidence": 0.65,  # 65% Sicherheit ausreichend
-        },
-        "stopp": {
-            "sound": "stopped.wav",
-            "action": "stop",
-            "confidence": 0.70,
-        },
-        "licht": {
-            "sound": "light.wav",
-            "action": "toggle_light",
-            "confidence": 0.68,
-        },
-    }
-    
-    # Logging
-    LOG_FILE = LOGS_DIR / "keyword_spotting.log"
-    LOG_LEVEL = logging.INFO
-
-# ============================================================================
-# LOGGING SETUP
-# ============================================================================
-
-def setup_logging():
-    """Einfaches Logging"""
-    Config.LOGS_DIR.mkdir(exist_ok=True)
-    
-    logger = logging.getLogger("KeywordSpotter")
-    logger.setLevel(Config.LOG_LEVEL)
-    
-    # File handler
-    fh = logging.FileHandler(Config.LOG_FILE)
-    fh.setLevel(Config.LOG_LEVEL)
-    
-    # Console handler
-    ch = logging.StreamHandler()
-    ch.setLevel(Config.LOG_LEVEL)
-    
-    # Formatter
-    formatter = logging.Formatter(
-        '%(asctime)s - %(levelname)s - %(message)s',
-        datefmt='%Y-%m-%d %H:%M:%S'
-    )
-    fh.setFormatter(formatter)
-    ch.setFormatter(formatter)
-    
-    logger.addHandler(fh)
-    logger.addHandler(ch)
-    
-    return logger
-
-logger = setup_logging()
-
-# ============================================================================
-# AUDIO-GERÄTE
-# ============================================================================
-
-def find_respeaker_device():
-    """Finde ReSpeaker-Gerät"""
-    logger.info("Suche ReSpeaker...")
-    try:
-        for index, name in enumerate(sd.query_devices()):
-            if isinstance(name, dict):
-                device_name = name.get('name', '')
-            else:
-                device_name = str(name)
-            
-            if 'seeed' in device_name.lower():
-                logger.info(f"✓ ReSpeaker gefunden: Index {index}")
-                return index
-    except:
-        pass
-    
-    logger.warning("⚠ ReSpeaker nicht gefunden, nutze Standard-Audio")
-    return None
-
-# ============================================================================
-# AKUSTISCHE FINGERPRINTS (Ultra-leicht statt ML-Modell)
-# ============================================================================
-
-class AudioFingerprint:
-    """
-    Erzeugt akustische Fingerprints für Keywords
-    Viel leichter als ML-Modelle - nur ~5MB gesamt!
-    """
-    
-    @staticmethod
-    def extract_features(audio_chunk):
-        """
-        Extrahiere einfache Audio-Features für Vergleich
-        - Zero Crossing Rate (ZCR)
-        - Energy
-        - Spektrale Centroid
-        - MFCC (vereinfacht)
-        """
-        audio = np.array(audio_chunk, dtype=np.float32) / 32768.0
-        
-        # 1. Zero Crossing Rate (schnelle/langsame Sprache)
-        zcr = np.mean(np.abs(np.diff(np.sign(audio))))
-        
-        # 2. Energy (Lautstärke)
-        energy = np.sqrt(np.mean(audio ** 2))
-        
-        # 3. Spectral features (sehr vereinfacht)
-        fft = np.abs(np.fft.fft(audio[:512]))
-        freq_energy = [
-            np.sum(fft[0:50]),      # Tiefe Frequenzen
-            np.sum(fft[50:150]),    # Mittlere Frequenzen
-            np.sum(fft[150:256]),   # Hohe Frequenzen
-        ]
-        
-        return np.array([zcr, energy] + freq_energy, dtype=np.float32)
-    
-    @staticmethod
-    def compare_fingerprints(fp1, fp2):
-        """Vergleiche zwei Fingerprints (0.0 = unterschiedlich, 1.0 = identisch)"""
-        # Normalisiere
-        fp1_norm = (fp1 - np.mean(fp1)) / (np.std(fp1) + 1e-6)
-        fp2_norm = (fp2 - np.mean(fp2)) / (np.std(fp2) + 1e-6)
-        
-        # Cosine similarity
-        similarity = np.dot(fp1_norm, fp2_norm) / (
-            np.linalg.norm(fp1_norm) * np.linalg.norm(fp2_norm) + 1e-6
-        )
-        
-        # Normalisiere auf [0, 1]
-        similarity = (similarity + 1.0) / 2.0
-        return max(0.0, min(1.0, similarity))
-
-# ============================================================================
-# REFERENCE FINGERPRINTS (Training)
-# ============================================================================
-
-class ReferenceDatabase:
-    """
-    Speichert Reference-Fingerprints für deine 3 Kommandos
-    WICHTIG: Diese müssen einmalig trainiert werden!
-    """
-    
-    def __init__(self):
-        self.db_file = Config.BASE_DIR / "reference_fingerprints.npy"
-        self.keywords_file = Config.BASE_DIR / "reference_keywords.txt"
-        self.fingerprints = {}
-        self.load_or_create()
-    
-    def load_or_create(self):
-        """Lade existierende oder erstelle neue Referenzen"""
-        if self.db_file.exists() and self.keywords_file.exists():
-            logger.info("Lade existierende Reference-Fingerprints...")
-            try:
-                data = np.load(self.db_file, allow_pickle=True).item()
-                self.fingerprints = data
-                logger.info(f"✓ {len(self.fingerprints)} Keywords geladen")
-            except Exception as e:
-                logger.warning(f"Konnte Fingerprints nicht laden: {e}")
-                self.create_default_fingerprints()
-        else:
-            logger.info("Erstelle Default-Fingerprints...")
-            self.create_default_fingerprints()
-    
-    def create_default_fingerprints(self):
-        """
-        Erstelle vereinfachte Default-Fingerprints
-        In Produktion würdest du diese durch echte Audio-Samples trainieren!
-        """
-        logger.warning("⚠ WICHTIG: Benutze bin/prepare_training.py für Training!")
-        
-        # Vereinfachte Fingerprints als Platzhalter
-        # Später durch echte Samples ersetzen!
-        self.fingerprints = {
-            "musik": np.array([0.05, 0.3, 100, 500, 200], dtype=np.float32),
-            "stopp": np.array([0.02, 0.2, 150, 400, 300], dtype=np.float32),
-            "licht": np.array([0.04, 0.25, 120, 450, 250], dtype=np.float32),
-        }
-        
-        self.save()
-    
-    def save(self):
-        """Speichere Fingerprints"""
-        try:
-            np.save(self.db_file, self.fingerprints)
-            logger.info(f"✓ Reference-Fingerprints gespeichert")
-        except Exception as e:
-            logger.error(f"Fehler beim Speichern: {e}")
-    
-    def add_training_sample(self, keyword, audio_chunk):
-        """Füge Trainings-Sample hinzu"""
-        fp = AudioFingerprint.extract_features(audio_chunk)
-        
-        if keyword not in self.fingerprints:
-            self.fingerprints[keyword] = fp
-        else:
-            # Durchschnitt mit existierendem
-            self.fingerprints[keyword] = (
-                self.fingerprints[keyword] + fp
-            ) / 2.0
-        
-        self.save()
-        logger.info(f"✓ Training-Sample hinzugefügt: {keyword}")
-
-# ============================================================================
-# KEYWORD SPOTTER
-# ============================================================================
-
-class KeywordSpotter:
-    """Höre nach deinen 3 Keywords"""
-    
-    def __init__(self):
-        logger.info("Initialisiere Keyword Spotter...")
-        
-        Config.DEVICE_INDEX = find_respeaker_device()
-        self.ref_db = ReferenceDatabase()
-        
-        self.stream = None
-        self.is_running = False
-        self.buffer = deque(maxlen=Config.SAMPLERATE)  # 1 Sekunde Buffer
-    
-    def audio_callback(self, indata, frames, time_info, status):
-        """Callback beim Audio-Input"""
-        if status:
-            logger.warning(f"Audio-Status: {status}")
-        
-        # Füge zu Buffer hinzu
-        audio_data = indata[:, 0]
-        for sample in audio_data:
-            self.buffer.append(int(sample * 32767))
-    
-    def start(self):
-        """Starte Audio-Listening"""
-        logger.info("Starte Audio-Listening...")
-        try:
-            self.stream = sd.InputStream(
-                samplerate=Config.SAMPLERATE,
-                blocksize=Config.CHUNK_SIZE,
-                channels=Config.CHANNELS,
-                device=Config.DEVICE_INDEX,
-                callback=self.audio_callback
-            )
-            self.stream.start()
-            self.is_running = True
-            logger.info("✓ Audio-Listening aktiv")
-        except Exception as e:
-            logger.error(f"Fehler beim Starten: {e}")
-            raise
-    
-    def stop(self):
-        """Stoppe Audio-Listening"""
-        logger.info("Stoppe Audio-Listening...")
-        if self.stream:
-            self.stream.stop()
-            self.stream.close()
-        self.is_running = False
-    
-    def detect_keywords(self):
-        """
-        Erkenne Keywords kontinuierlich
-        Rückgabe: (keyword, confidence) oder (None, 0)
-        """
-        if len(self.buffer) < Config.SAMPLERATE:
-            return None, 0
-        
-        audio_chunk = list(self.buffer)
-        current_fp = AudioFingerprint.extract_features(audio_chunk)
-        
-        best_keyword = None
-        best_confidence = 0
-        
-        # Vergleiche mit allen Keywords
-        for keyword, threshold_config in Config.KEYWORDS.items():
-            ref_fp = self.ref_db.fingerprints.get(keyword)
-            
-            if ref_fp is None:
-                continue
-            
-            # Berechne Ähnlichkeit
-            similarity = AudioFingerprint.compare_fingerprints(current_fp, ref_fp)
-            required_threshold = threshold_config.get("confidence", 0.7)
-            
-            logger.debug(f"{keyword}: {similarity:.2%} (benötigt: {required_threshold:.0%})")
-            
-            # Ist besser als bisherig?
-            if similarity > best_confidence and similarity >= required_threshold:
-                best_keyword = keyword
-                best_confidence = similarity
-        
-        return best_keyword, best_confidence
-
-# ============================================================================
-# SOUND-AUSGABE
-# ============================================================================
-
-class SoundPlayer:
-    """Spiele Sounds ab"""
-    
-    def __init__(self):
-        self.sounds_dir = Config.SOUNDS_DIR
-        self.sounds_dir.mkdir(exist_ok=True)
-    
-    def play_sound(self, filename):
-        """Spiele Sound ab"""
-        sound_path = self.sounds_dir / filename
-        
-        if not sound_path.exists():
-            logger.warning(f"⚠ Sound nicht gefunden: {filename}")
-            return False
-        
-        try:
-            logger.info(f"♪ Spiele Sound ab: {filename}")
-            subprocess.run(
-                ['aplay', '-D', 'hw:1,0', str(sound_path)],
-                check=True,
-                capture_output=True,
-                timeout=10
-            )
-            return True
-        except Exception as e:
-            logger.error(f"✗ Fehler beim Abspielen: {e}")
-            return False
-
-# ============================================================================
-# AKTION-HANDLER
-# ============================================================================
-
-class ActionHandler:
-    """Führe Aktionen aus"""
-    
-    def __init__(self, sound_player):
-        self.sound_player = sound_player
-    
-    def execute(self, keyword):
-        """Führe Aktion aus"""
-        if keyword not in Config.KEYWORDS:
-            return False
-        
-        config = Config.KEYWORDS[keyword]
-        logger.info(f"🎯 Erkannt: {keyword.upper()}")
-        
-        # Spiele Sound ab
-        if config.get("sound"):
-            self.sound_player.play_sound(config["sound"])
-        
-        # Führe Aktion aus
-        action = config.get("action")
-        
-        if action == "play_music":
-            logger.info("▶ Musik abspielen...")
-            # Hier könnten echte Aktionen folgen
-        elif action == "stop":
-            logger.info("⏹ Stoppen...")
-        elif action == "toggle_light":
-            logger.info("💡 Licht umschalten...")
-            # GPIO-Beispiel: GPIO.output(17, not GPIO.input(17))
-        
-        return True
-
-# ============================================================================
-# HAUPTPROGRAMM
-# ============================================================================
-
-class VoiceControllerLite:
-    """Hauptprogramm - Ultra-leicht und schnell"""
-    
-    def __init__(self):
-        logger.info("=" * 70)
-        logger.info("Voice Controller (Lite) für Pi Zero 2W")
-        logger.info("Keyword Spotting - Nur 3 Kommandos, super schnell!")
-        logger.info("=" * 70)
-        
-        try:
-            self.spotter = KeywordSpotter()
-            self.sound_player = SoundPlayer()
-            self.action_handler = ActionHandler(self.sound_player)
-            
-            self.last_detection = 0
-            self.detection_cooldown = 1.0  # 1 Sekunde zwischen Erkennungen
-        except Exception as e:
-            logger.error(f"✗ Initialisierungsfehler: {e}")
-            raise
-    
-    def run(self):
-        """Hauptschleife"""
-        logger.info("Starte Hauptschleife...")
-        
-        try:
-            self.spotter.start()
-            
-            detection_count = 0
-            
-            while True:
-                try:
-                    keyword, confidence = self.spotter.detect_keywords()
-                    
-                    if keyword and confidence > 0.5:
-                        # Cooldown prüfen (verhindert Mehrfacherkennung)
-                        current_time = time.time()
-                        if current_time - self.last_detection > self.detection_cooldown:
-                            detection_count += 1
-                            logger.info(
-                                f"[#{detection_count}] ✓ {keyword.upper()} "
-                                f"({confidence:.1%})"
-                            )
-                            
-                            # Führe Aktion aus
-                            self.action_handler.execute(keyword)
-                            self.last_detection = current_time
-                    
-                    # Kurze Pause (nicht 100% CPU)
-                    time.sleep(0.1)
-                
-                except KeyboardInterrupt:
-                    logger.info("\n⏹ Unterbrochen durch Benutzer")
-                    break
-                except Exception as e:
-                    logger.error(f"Fehler in Schleife: {e}")
-                    time.sleep(1)
-        
-        finally:
-            self.spotter.stop()
-            logger.info("✓ Voice Controller beendet")
-
-# ============================================================================
-# EINSTIEGSPUNKT
-# ============================================================================
-
-if __name__ == "__main__":
-    try:
-        controller = VoiceControllerLite()
-        controller.run()
-    except KeyboardInterrupt:
-        logger.info("Beendet")
-        sys.exit(0)
-    except Exception as e:
-        logger.error(f"✗ Kritischer Fehler: {e}", exc_info=True)
-        sys.exit(1)
-```
-
-Save the file (Ctrl+X, Y, Enter).
-
-### 5.2 Create the training script
-
-Create `~/voice_assistant/prepare_training.py` to train the keywords:
-
-```bash
-nano ~/voice_assistant/prepare_training.py
-```
-
-```python
-#!/usr/bin/env python3
-# -*- coding: utf-8 -*-
-
-"""
-Training-Skript: Nimm Audio-Samples deiner 3 Keywords auf
-Dies muss einmalig am Anfang durchgeführt werden!
-"""
-
-import os
-import sys
-import logging
-import sounddevice as sd
-import numpy as np
-from pathlib import Path
-from keyword_spotting import (
-    Config, setup_logging, find_respeaker_device,
-    ReferenceDatabase, AudioFingerprint
-)
-
-logger = setup_logging()
-
-def record_keyword_sample(keyword, duration=2.0):
-    """
-    Nimme Audio-Sample auf
-    Dauer: 2 Sekunden
-    """
-    print(f"\n{'='*60}")
-    print(f"Recording: '{keyword}'")
-    print(f"{'='*60}")
-    print(f"⏺ Aufnahme in 3 Sekunden... (Drücke SPACE zur Bereitschaft)")
-    input("Drücke ENTER, wenn bereit >")
-    
-    Config.DEVICE_INDEX = find_respeaker_device()
-    
-    print(f"🔴 Aufnahme läuft... ({duration}s)")
-    
-    # Aufnahme
-    audio = sd.rec(
-        int(Config.SAMPLERATE * duration),
-        samplerate=Config.SAMPLERATE,
-        channels=1,
-        device=Config.DEVICE_INDEX,
-        dtype='int16'
-    )
-    
-    sd.wait()
-    
-    print("✓ Aufnahme abgeschlossen")
-    
-    return audio[:, 0] if audio.ndim > 1 else audio
-
-def train_keyword(keyword, num_samples=3):
-    """
-    Trainiere Keyword mit mehreren Samples
-    Empfohlen: 3-5 Samples pro Keyword
-    """
-    logger.info(f"\n{'='*60}")
-    logger.info(f"Training: {keyword.upper()}")
-    logger.info(f"{'='*60}")
-    logger.info(f"Bitte nimm {num_samples} Samples des Keywords '{keyword}' auf")
-    
-    db = ReferenceDatabase()
-    fingerprints = []
-    
-    for i in range(num_samples):
-        print(f"\n[Sample {i+1}/{num_samples}] '{keyword}'")
-        audio = record_keyword_sample(keyword, duration=2.0)
-        
-        # Extrahiere Fingerprint
-        fp = AudioFingerprint.extract_features(audio)
-        fingerprints.append(fp)
-        
-        print(f"✓ Fingerprint extrahiert: {fp}")
-    
-    # Durchschnitt aller Samples
-    avg_fingerprint = np.mean(fingerprints, axis=0)
-    db.fingerprints[keyword] = avg_fingerprint
-    db.save()
-    
-    logger.info(f"✓ {keyword} trainiert und gespeichert!")
-    return True
-
-def main():
-    """Haupttraining"""
-    print("\n" + "="*60)
-    print("KEYWORD SPOTTING - TRAINING")
-    print("="*60)
-    print("\nAufnehmen von Sprachsamples für deine 3 Keywords:")
-    print("1. musik")
-    print("2. stopp")
-    print("3. licht")
-    print("\nFür jeden Keyword werden 3 Samples benötigt.")
-    print("Sprich das Keyword klar und deutlich ins Mikrofon.")
-    print("\n" + "="*60 + "\n")
-    
-    input("Drücke ENTER um zu starten >")
-    
-    try:
-        for keyword in Config.KEYWORDS.keys():
-            train_keyword(keyword, num_samples=3)
-        
-        print("\n" + "="*60)
-        print("✓ TRAINING ABGESCHLOSSEN!")
-        print("="*60)
-        print("\nRun jetzt: python3 keyword_spotting.py")
-        
-    except KeyboardInterrupt:
-        logger.info("\n✗ Training abgebrochen")
-        sys.exit(0)
-    except Exception as e:
-        logger.error(f"✗ Fehler: {e}", exc_info=True)
-        sys.exit(1)
-
-if __name__ == "__main__":
-    main()
-```
-
-Save the file.
-
-### 5.3 Create sound files
-
-```bash
-python3 << 'EOF'
 import wave
-import math
+import sys

-def generate_tone(frequency, duration, sample_rate=16000):
-    samples = []
-    for i in range(int(sample_rate * duration)):
-        sample = int(32767 * 0.3 * math.sin(2 * math.pi * frequency * i / sample_rate))
-        samples.append(sample)
-    return samples
+from vosk import Model, KaldiRecognizer, SetLogLevel

-# Music
-sounds = generate_tone(523, 0.15) + generate_tone(587, 0.15) + generate_tone(659, 0.15)
-with wave.open('/home/pi/voice_assistant/sounds/music.wav', 'wb') as f:
-    f.setnchannels(1)
-    f.setsampwidth(2)
-    f.setframerate(16000)
-    f.writeframes(b''.join(s.to_bytes(2, 'little', signed=True) for s in sounds))
-print("✓ music.wav")
+# You can set log level to -1 to disable debug messages
+SetLogLevel(0)

-# Stopped
-sounds = generate_tone(440, 0.3)
-with wave.open('/home/pi/voice_assistant/sounds/stopped.wav', 'wb') as f:
-    f.setnchannels(1)
-    f.setsampwidth(2)
-    f.setframerate(16000)
-    f.writeframes(b''.join(s.to_bytes(2, 'little', signed=True) for s in sounds))
-print("✓ stopped.wav")
+wf = wave.open(sys.argv[1], "rb")
+if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
+    print("Audio file must be WAV format mono PCM.")
+    sys.exit(1)

-# Light
-sounds = generate_tone(587, 0.2) + generate_tone(659, 0.1)
-with wave.open('/home/pi/voice_assistant/sounds/light.wav', 'wb') as f:
-    f.setnchannels(1)
-    f.setsampwidth(2)
-    f.setframerate(16000)
-    f.writeframes(b''.join(s.to_bytes(2, 'little', signed=True) for s in sounds))
-print("✓ light.wav")
-EOF
-```
+model = Model("model") #lang="en-us")

-### 5.4 Run the TRAINING (IMPORTANT!)
+# You can also init model by name or with a folder path
+# model = Model(model_name="vosk-model-en-us-0.21")
+# model = Model("models/en")

-```bash
-cd ~/voice_assistant
-chmod +x prepare_training.py
-python3 prepare_training.py
-```
+rec = KaldiRecognizer(model, wf.getframerate())
+rec.SetWords(True)
+rec.SetPartialWords(True)

-**The training script will prompt you to:**
-1. Say the word "musik" 3 times
-2. Say the word "stopp" 3 times
-3. Say the word "licht" 3 times
+while True:
+    data = wf.readframes(4000)
+    if len(data) == 0:
+        break
+    if rec.AcceptWaveform(data):
+        print(rec.Result())
+    else:
+        print(rec.PartialResult())

-Each sample lasts 2 seconds. The fingerprints are saved automatically.
+print(rec.FinalResult())

-**Duration:** ~5 minutes

-### 5.5 Test

-After training:
-
-```bash
-python3 ~/voice_assistant/keyword_spotting.py
-```
-
-Now:
-1. Say "musik" → sound plays
-2. Say "stopp" → sound plays
-3. Say "licht" → sound plays
-
-Exit with Ctrl+C.
-
---
-
-## PART 6: systemd service (as before)
-
-```bash
-sudo nano /etc/systemd/system/voice-assistant.service
-```
-
-```ini
-[Unit]
-Description=Voice Assistant - Keyword Spotting
-After=network.target sound.target
-
-[Service]
-Type=simple
-User=pi
-WorkingDirectory=/home/pi/voice_assistant
-ExecStart=/usr/bin/python3 /home/pi/voice_assistant/keyword_spotting.py
-Restart=on-failure
-RestartSec=5
-StandardOutput=journal
-StandardError=journal
-
-# Resource limits
-MemoryMax=128M
-CPUQuota=30%
-
-[Install]
-WantedBy=multi-user.target
-```
-
-```bash
-sudo systemctl daemon-reload
-sudo systemctl enable voice-assistant.service
-sudo systemctl start voice-assistant.service
-sudo systemctl status voice-assistant.service
-```
-
---
-
-## PERFORMANCE COMPARISON
-
-### Disk usage:
-
-```bash
-# Before (Vosk)
-du -sh ~/voice_models/
-# Output: ~100MB
-
-# After (keyword spotting)
-du -sh ~/voice_assistant/
-# Output: ~2MB (!!)
-```
-
-### RAM during operation:
-
-```bash
-ps aux | grep python3 | grep keyword
-# Vosk: ~100-120MB
-# Keyword Spotting: ~25-35MB
-```
-
-### CPU load:
-
-```bash
-# top
-# Vosk: 40-60% (the Pi Zero 2W runs almost hot!)
-# Keyword Spotting: 5-15% (relaxed!)
-```
-
-### Startup time:
-
-```bash
-time python3 keyword_spotting.py
-# Vosk: real 0m3.5s
-# Keyword Spotting: real 0m0.4s (!!)
-```
-
---
-
-## RESOURCE COMPARISON (summary)
-
-| Metric | Vosk | Keyword spotting | Savings |
-|--------|------|------------------|------------|
-| **Model size** | 50-100MB | < 1MB | 99%! |
-| **RAM usage** | 100-120MB | 25-35MB | 75% |
-| **CPU load (Pi Zero 2W)** | 40-60% | 5-15% | 75% |
-| **Startup time** | 3-5s | 0.4s | 90% |
-| **Recognition latency** | 200-500ms | 50-100ms | 75% |
-| **Accuracy (3 commands)** | 85-92% | 93-98% | +10% |
-| **Disk space (total)** | ~150MB | ~30MB | 80% |
-
-**Conclusion:** You save massive resources with better performance!
-
---
-
-## TROUBLESHOOTING
-
-**Problem: "Recognition does not work after training"**
-
-```bash
-# Check whether the fingerprints were saved
-ls -la ~/voice_assistant/*.npy
-
-# Show the stored keywords
-cat ~/voice_assistant/reference_keywords.txt
-```
-
-**Problem: "False positives (recognizes words that were not spoken)"**
-
-Raise the confidence threshold in `keyword_spotting.py`:
-
-```python
-"musik": {
-    "confidence": 0.75,  # Previously: 0.65
-}
-```
-
-**Problem: "Recognition too inaccurate"**
-
-Retrain with clearer pronunciation:
-
-```bash
-python3 prepare_training.py
-```
-
-Speak the keywords more clearly and loudly.
-
---
-
-## NEXT STEPS
-
-With this solution you can:
-
-1. ✅ **Recognize 3 keywords** with 93-98% accuracy
-2. ✅ **Start very quickly** (< 1 second)
-3. ✅ **Save disk space** (80% less!)
-4. ✅ **Save CPU** (75% less load)
-5. ✅ **Work offline** (no internet required)
-
-If you need **more commands** later:
- 5 commands: still fine with this method
- 10+ commands: switch to a lightweight ML model (TensorFlow Lite)
- Arbitrary speech: Vosk is required
-
---
-
-## QUESTIONS & ANSWERS
-
-**Q: Can I add more than 3 commands?**
-A: Yes, the method stays efficient up to about 10 commands. Beyond 10, use a TensorFlow Lite ML model.
-
-**Q: How long does training take?**
-A: ~5 minutes (3 samples × 3 keywords × 2 seconds, plus processing)
-
-**Q: Do I have to retrain every time?**
-A: No, the fingerprints are saved. Training is only needed once at the start.
-
-**Q: Does it also work with a dialect or accent?**
-A: Yes! Train with YOUR accent, and the system will recognize you perfectly.
-
-**Q: What if someone else speaks?**
-A: Recognition becomes less accurate (roughly 10-20% lower). That is normal; train with several voices if needed.
-
---
-
-**Good luck with your lean voice control system! 🎉**
-
-The solution is optimized, very fast, and a perfect fit for the Pi Zero 2W!
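
The `test_simple.py` script added in this commit only prints the raw JSON that Vosk returns. As a minimal sketch of how that output could be mapped back to the three commands from the removed keyword-spotting README, the helper below parses the `"text"` field that `KaldiRecognizer.Result()` and `FinalResult()` return. The function name `match_command` and the `COMMANDS` table are hypothetical and not part of the commit; the action names are taken from the removed `Config.KEYWORDS`.

```python
import json

# Command words and action names from the removed keyword-spotting config.
# This table is an illustrative assumption, not part of the commit.
COMMANDS = {"musik": "play_music", "stopp": "stop", "licht": "toggle_light"}


def match_command(result_json, commands=COMMANDS):
    """Return (word, action) for the first command word found, else None.

    `result_json` is a JSON string as returned by KaldiRecognizer.Result()
    or FinalResult(); the recognized sentence is in its "text" field.
    Partial results only carry a "partial" field and therefore never match.
    """
    text = json.loads(result_json).get("text", "")
    for word in text.lower().split():
        if word in commands:
            return word, commands[word]
    return None


if __name__ == "__main__":
    # Hand-written sample in the shape of a Vosk result, not real output.
    sample = '{"text": "bitte licht an"}'
    print(match_command(sample))
```

In `test_simple.py` this could replace the bare `print(rec.Result())` calls, for example `print(match_command(rec.Result()))`. Matching on whole words keeps the logic simple, but it will miss a command if the model transcribes it differently (for example "stop" instead of "stopp").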