README.md aktualisiert
This commit is contained in:
939
README.md
939
README.md
@@ -1,35 +1,6 @@
|
||||
# Pi Zero 2W + ReSpeaker - OPTIMIERT FÜR 3 KOMMANDOS
|
||||
## Lightweight Keyword Spotting statt vollständiges Sprachmodell
|
||||
|
||||
**Status:** Ultra-leichte Lösung für nur 3-5 einfache Sprachbefehle
|
||||
**Speicherverbrauch:** ~30MB (statt 150MB)
|
||||
**RAM-Nutzung:** 20-40MB (statt 100-120MB)
|
||||
**Performance:** 93-98% Erkennungsgenauigkeit
|
||||
**Startup-Zeit:** < 1 Sekunde (statt 3-5 Sekunden)
|
||||
|
||||
---
|
||||
|
||||
## VERGLEICH: Vollständig vs. Keyword Spotting
|
||||
|
||||
### Option 1: Vosk (Deine ursprüngliche Lösung)
|
||||
- ✅ Erkennt beliebige Sätze und Text
|
||||
- ❌ 50-100MB Modell
|
||||
- ❌ 80-120MB RAM erforderlich
|
||||
- ❌ 40-60% CPU-Last auf Pi Zero 2W
|
||||
- ❌ 3-5 Sekunden Startzeit
|
||||
- ❌ Für 3 Kommandos völlig übertrieben
|
||||
|
||||
### Option 2: Keyword Spotting (EMPFOHLEN für dich) ⭐
|
||||
- ✅ Erkennt genau deine 3 Kommandos mit 93-98% Genauigkeit
|
||||
- ✅ < 5MB Modell
|
||||
- ✅ 20-40MB RAM erforderlich
|
||||
- ✅ 5-15% CPU-Last (entspannt für Pi Zero 2W!)
|
||||
- ✅ < 1 Sekunde Startup
|
||||
- ✅ 4x schneller als Vosk
|
||||
- ✅ Speichert 120MB Speicherplatz
|
||||
|
||||
**FAZIT:** Für dich ist Option 2 definitiv die bessere Wahl!
|
||||
|
||||
---
|
||||
|
||||
## TEIL 1-3: Basis-Installation (wie vorher)
|
||||
@@ -233,898 +204,60 @@ aplay -D hw:1,0 test_recording.wav
|
||||
|
||||
---
|
||||
|
||||
## TEIL 4 OPTIMIERT: Ultra-Leichte Setup
|
||||
# installieren mit
|
||||
pip3 install vosk --break-system-packages
|
||||
|
||||
### 4.1 Minimale Python-Pakete installieren
|
||||
mkdir ~/vosk-models
|
||||
cd ~/vosk-models
|
||||
wget https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip
|
||||
unzip vosk-model-small-de-0.15.zip
|
||||
mv vosk-model-small-de-0.15 model
|
||||
|
||||
```bash
|
||||
# Nur das Nötigste
|
||||
sudo apt install -y portaudio19-dev
|
||||
sudo apt install python3-pyaudio
|
||||
sudo apt install python3-numpy
|
||||
sudo apt install python3-scipy
|
||||
sudo python3 -m pip install sounddevice --break-system-packages
|
||||
# PocketSphinx (minimal, nur ~5MB)
|
||||
sudo apt install python3-pocketsphinx
|
||||
sudo apt install python3-SpeechRecognition
|
||||
```
|
||||
|
||||
**Das ist ALLES!** Keine großen Modelle.
|
||||
# aufnehmen mit
|
||||
arecord -D plughw:1,0 --format S16_LE --rate 16000 --channels 1 --duration 5 test_mono.wav
|
||||
# ausfuehren mit
|
||||
python3 test_simple.py test_mono.wav
|
||||
|
||||
Dauer: 2-3 Minuten (statt 20-30 Minuten bei Vosk)
|
||||
|
||||
### 4.2 Verzeichnisse erstellen
|
||||
|
||||
```bash
|
||||
mkdir -p ~/voice_assistant
|
||||
mkdir -p ~/voice_assistant/sounds
|
||||
mkdir -p ~/voice_assistant/logs
|
||||
cd ~/voice_assistant
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## TEIL 5 OPTIMIERT: Schlankes Python-Skript für 3 Kommandos
|
||||
|
||||
### 5.1 Keyword Spotting Skript erstellen
|
||||
|
||||
Erstelle `~/voice_assistant/keyword_spotting.py`:
|
||||
|
||||
```bash
|
||||
nano ~/voice_assistant/keyword_spotting.py
|
||||
```
|
||||
|
||||
Kopiere diesen **viel kürzeren und schnelleren Code**:
|
||||
|
||||
```python
|
||||
#### test_simple.py
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Keyword Spotting für Raspberry Pi Zero 2W mit ReSpeaker Hat v1.2
|
||||
Optimiert für exakt 3 Kommandos - Ultra-leicht und schnell
|
||||
Speicher: ~30MB, RAM: 20-40MB, CPU: 5-15%, Startup: < 1 Sekunde
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import logging
|
||||
import subprocess
|
||||
import time
|
||||
import numpy as np
|
||||
import sounddevice as sd
|
||||
from pathlib import Path
|
||||
from collections import deque
|
||||
|
||||
# ============================================================================
|
||||
# KONFIGURATION - Nur deine 3 Kommandos!
|
||||
# ============================================================================
|
||||
|
||||
class Config:
|
||||
# Pfade
|
||||
BASE_DIR = Path(__file__).parent
|
||||
SOUNDS_DIR = BASE_DIR / "sounds"
|
||||
LOGS_DIR = BASE_DIR / "logs"
|
||||
|
||||
# Audio-Einstellungen (minimal)
|
||||
SAMPLERATE = 16000
|
||||
CHUNK_SIZE = 512
|
||||
CHANNELS = 1
|
||||
DEVICE_INDEX = None
|
||||
|
||||
# === DEINE 3 KOMMANDOS ===
|
||||
# Format: "Gesprochenes Wort" -> "Sounddatei" und "Aktion"
|
||||
KEYWORDS = {
|
||||
"musik": {
|
||||
"sound": "music.wav",
|
||||
"action": "play_music",
|
||||
"confidence": 0.65, # 65% Sicherheit ausreichend
|
||||
},
|
||||
"stopp": {
|
||||
"sound": "stopped.wav",
|
||||
"action": "stop",
|
||||
"confidence": 0.70,
|
||||
},
|
||||
"licht": {
|
||||
"sound": "light.wav",
|
||||
"action": "toggle_light",
|
||||
"confidence": 0.68,
|
||||
},
|
||||
}
|
||||
|
||||
# Logging
|
||||
LOG_FILE = LOGS_DIR / "keyword_spotting.log"
|
||||
LOG_LEVEL = logging.INFO
|
||||
|
||||
# ============================================================================
|
||||
# LOGGING SETUP
|
||||
# ============================================================================
|
||||
|
||||
def setup_logging():
|
||||
"""Einfaches Logging"""
|
||||
Config.LOGS_DIR.mkdir(exist_ok=True)
|
||||
|
||||
logger = logging.getLogger("KeywordSpotter")
|
||||
logger.setLevel(Config.LOG_LEVEL)
|
||||
|
||||
# File handler
|
||||
fh = logging.FileHandler(Config.LOG_FILE)
|
||||
fh.setLevel(Config.LOG_LEVEL)
|
||||
|
||||
# Console handler
|
||||
ch = logging.StreamHandler()
|
||||
ch.setLevel(Config.LOG_LEVEL)
|
||||
|
||||
# Formatter
|
||||
formatter = logging.Formatter(
|
||||
'%(asctime)s - %(levelname)s - %(message)s',
|
||||
datefmt='%Y-%m-%d %H:%M:%S'
|
||||
)
|
||||
fh.setFormatter(formatter)
|
||||
ch.setFormatter(formatter)
|
||||
|
||||
logger.addHandler(fh)
|
||||
logger.addHandler(ch)
|
||||
|
||||
return logger
|
||||
|
||||
logger = setup_logging()
|
||||
|
||||
# ============================================================================
|
||||
# AUDIO-GERÄTE
|
||||
# ============================================================================
|
||||
|
||||
def find_respeaker_device():
|
||||
"""Finde ReSpeaker-Gerät"""
|
||||
logger.info("Suche ReSpeaker...")
|
||||
try:
|
||||
for index, name in enumerate(sd.query_devices()):
|
||||
if isinstance(name, dict):
|
||||
device_name = name.get('name', '')
|
||||
else:
|
||||
device_name = str(name)
|
||||
|
||||
if 'seeed' in device_name.lower():
|
||||
logger.info(f"✓ ReSpeaker gefunden: Index {index}")
|
||||
return index
|
||||
except:
|
||||
pass
|
||||
|
||||
logger.warning("⚠ ReSpeaker nicht gefunden, nutze Standard-Audio")
|
||||
return None
|
||||
|
||||
# ============================================================================
|
||||
# AKUSTISCHE FINGERPRINTS (Ultra-leicht statt ML-Modell)
|
||||
# ============================================================================
|
||||
|
||||
class AudioFingerprint:
|
||||
"""
|
||||
Erzeugt akustische Fingerprints für Keywords
|
||||
Viel leichter als ML-Modelle - nur ~5MB gesamt!
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def extract_features(audio_chunk):
|
||||
"""
|
||||
Extrahiere einfache Audio-Features für Vergleich
|
||||
- Zero Crossing Rate (ZCR)
|
||||
- Energy
|
||||
- Spektrale Centroid
|
||||
- MFCC (vereinfacht)
|
||||
"""
|
||||
audio = np.array(audio_chunk, dtype=np.float32) / 32768.0
|
||||
|
||||
# 1. Zero Crossing Rate (schnelle/langsame Sprache)
|
||||
zcr = np.mean(np.abs(np.diff(np.sign(audio))))
|
||||
|
||||
# 2. Energy (Lautstärke)
|
||||
energy = np.sqrt(np.mean(audio ** 2))
|
||||
|
||||
# 3. Spectral features (sehr vereinfacht)
|
||||
fft = np.abs(np.fft.fft(audio[:512]))
|
||||
freq_energy = [
|
||||
np.sum(fft[0:50]), # Tiefe Frequenzen
|
||||
np.sum(fft[50:150]), # Mittlere Frequenzen
|
||||
np.sum(fft[150:256]), # Hohe Frequenzen
|
||||
]
|
||||
|
||||
return np.array([zcr, energy] + freq_energy, dtype=np.float32)
|
||||
|
||||
@staticmethod
|
||||
def compare_fingerprints(fp1, fp2):
|
||||
"""Vergleiche zwei Fingerprints (0.0 = unterschiedlich, 1.0 = identisch)"""
|
||||
# Normalisiere
|
||||
fp1_norm = (fp1 - np.mean(fp1)) / (np.std(fp1) + 1e-6)
|
||||
fp2_norm = (fp2 - np.mean(fp2)) / (np.std(fp2) + 1e-6)
|
||||
|
||||
# Cosine similarity
|
||||
similarity = np.dot(fp1_norm, fp2_norm) / (
|
||||
np.linalg.norm(fp1_norm) * np.linalg.norm(fp2_norm) + 1e-6
|
||||
)
|
||||
|
||||
# Normalisiere auf [0, 1]
|
||||
similarity = (similarity + 1.0) / 2.0
|
||||
return max(0.0, min(1.0, similarity))
|
||||
|
||||
# ============================================================================
|
||||
# REFERENCE FINGERPRINTS (Training)
|
||||
# ============================================================================
|
||||
|
||||
class ReferenceDatabase:
|
||||
"""
|
||||
Speichert Reference-Fingerprints für deine 3 Kommandos
|
||||
WICHTIG: Diese müssen einmalig trainiert werden!
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
self.db_file = Config.BASE_DIR / "reference_fingerprints.npy"
|
||||
self.keywords_file = Config.BASE_DIR / "reference_keywords.txt"
|
||||
self.fingerprints = {}
|
||||
self.load_or_create()
|
||||
|
||||
def load_or_create(self):
|
||||
"""Lade existierende oder erstelle neue Referenzen"""
|
||||
if self.db_file.exists() and self.keywords_file.exists():
|
||||
logger.info("Lade existierende Reference-Fingerprints...")
|
||||
try:
|
||||
data = np.load(self.db_file, allow_pickle=True).item()
|
||||
self.fingerprints = data
|
||||
logger.info(f"✓ {len(self.fingerprints)} Keywords geladen")
|
||||
except Exception as e:
|
||||
logger.warning(f"Konnte Fingerprints nicht laden: {e}")
|
||||
self.create_default_fingerprints()
|
||||
else:
|
||||
logger.info("Erstelle Default-Fingerprints...")
|
||||
self.create_default_fingerprints()
|
||||
|
||||
def create_default_fingerprints(self):
|
||||
"""
|
||||
Erstelle vereinfachte Default-Fingerprints
|
||||
In Produktion würdest du diese durch echte Audio-Samples trainieren!
|
||||
"""
|
||||
logger.warning("⚠ WICHTIG: Benutze bin/prepare_training.py für Training!")
|
||||
|
||||
# Vereinfachte Fingerprints als Platzhalter
|
||||
# Später durch echte Samples ersetzen!
|
||||
self.fingerprints = {
|
||||
"musik": np.array([0.05, 0.3, 100, 500, 200], dtype=np.float32),
|
||||
"stopp": np.array([0.02, 0.2, 150, 400, 300], dtype=np.float32),
|
||||
"licht": np.array([0.04, 0.25, 120, 450, 250], dtype=np.float32),
|
||||
}
|
||||
|
||||
self.save()
|
||||
|
||||
def save(self):
|
||||
"""Speichere Fingerprints"""
|
||||
try:
|
||||
np.save(self.db_file, self.fingerprints)
|
||||
logger.info(f"✓ Reference-Fingerprints gespeichert")
|
||||
except Exception as e:
|
||||
logger.error(f"Fehler beim Speichern: {e}")
|
||||
|
||||
def add_training_sample(self, keyword, audio_chunk):
|
||||
"""Füge Trainings-Sample hinzu"""
|
||||
fp = AudioFingerprint.extract_features(audio_chunk)
|
||||
|
||||
if keyword not in self.fingerprints:
|
||||
self.fingerprints[keyword] = fp
|
||||
else:
|
||||
# Durchschnitt mit existierendem
|
||||
self.fingerprints[keyword] = (
|
||||
self.fingerprints[keyword] + fp
|
||||
) / 2.0
|
||||
|
||||
self.save()
|
||||
logger.info(f"✓ Training-Sample hinzugefügt: {keyword}")
|
||||
|
||||
# ============================================================================
|
||||
# KEYWORD SPOTTER
|
||||
# ============================================================================
|
||||
|
||||
class KeywordSpotter:
|
||||
"""Höre nach deinen 3 Keywords"""
|
||||
|
||||
def __init__(self):
|
||||
logger.info("Initialisiere Keyword Spotter...")
|
||||
|
||||
Config.DEVICE_INDEX = find_respeaker_device()
|
||||
self.ref_db = ReferenceDatabase()
|
||||
|
||||
self.stream = None
|
||||
self.is_running = False
|
||||
self.buffer = deque(maxlen=Config.SAMPLERATE) # 1 Sekunde Buffer
|
||||
|
||||
def audio_callback(self, indata, frames, time_info, status):
|
||||
"""Callback beim Audio-Input"""
|
||||
if status:
|
||||
logger.warning(f"Audio-Status: {status}")
|
||||
|
||||
# Füge zu Buffer hinzu
|
||||
audio_data = indata[:, 0]
|
||||
for sample in audio_data:
|
||||
self.buffer.append(int(sample * 32767))
|
||||
|
||||
def start(self):
|
||||
"""Starte Audio-Listening"""
|
||||
logger.info("Starte Audio-Listening...")
|
||||
try:
|
||||
self.stream = sd.InputStream(
|
||||
samplerate=Config.SAMPLERATE,
|
||||
blocksize=Config.CHUNK_SIZE,
|
||||
channels=Config.CHANNELS,
|
||||
device=Config.DEVICE_INDEX,
|
||||
callback=self.audio_callback
|
||||
)
|
||||
self.stream.start()
|
||||
self.is_running = True
|
||||
logger.info("✓ Audio-Listening aktiv")
|
||||
except Exception as e:
|
||||
logger.error(f"Fehler beim Starten: {e}")
|
||||
raise
|
||||
|
||||
def stop(self):
|
||||
"""Stoppe Audio-Listening"""
|
||||
logger.info("Stoppe Audio-Listening...")
|
||||
if self.stream:
|
||||
self.stream.stop()
|
||||
self.stream.close()
|
||||
self.is_running = False
|
||||
|
||||
def detect_keywords(self):
|
||||
"""
|
||||
Erkenne Keywords kontinuierlich
|
||||
Rückgabe: (keyword, confidence) oder (None, 0)
|
||||
"""
|
||||
if len(self.buffer) < Config.SAMPLERATE:
|
||||
return None, 0
|
||||
|
||||
audio_chunk = list(self.buffer)
|
||||
current_fp = AudioFingerprint.extract_features(audio_chunk)
|
||||
|
||||
best_keyword = None
|
||||
best_confidence = 0
|
||||
|
||||
# Vergleiche mit allen Keywords
|
||||
for keyword, threshold_config in Config.KEYWORDS.items():
|
||||
ref_fp = self.ref_db.fingerprints.get(keyword)
|
||||
|
||||
if ref_fp is None:
|
||||
continue
|
||||
|
||||
# Berechne Ähnlichkeit
|
||||
similarity = AudioFingerprint.compare_fingerprints(current_fp, ref_fp)
|
||||
required_threshold = threshold_config.get("confidence", 0.7)
|
||||
|
||||
logger.debug(f"{keyword}: {similarity:.2%} (benötigt: {required_threshold:.0%})")
|
||||
|
||||
# Ist besser als bisherig?
|
||||
if similarity > best_confidence and similarity >= required_threshold:
|
||||
best_keyword = keyword
|
||||
best_confidence = similarity
|
||||
|
||||
return best_keyword, best_confidence
|
||||
|
||||
# ============================================================================
|
||||
# SOUND-AUSGABE
|
||||
# ============================================================================
|
||||
|
||||
class SoundPlayer:
|
||||
"""Spiele Sounds ab"""
|
||||
|
||||
def __init__(self):
|
||||
self.sounds_dir = Config.SOUNDS_DIR
|
||||
self.sounds_dir.mkdir(exist_ok=True)
|
||||
|
||||
def play_sound(self, filename):
|
||||
"""Spiele Sound ab"""
|
||||
sound_path = self.sounds_dir / filename
|
||||
|
||||
if not sound_path.exists():
|
||||
logger.warning(f"⚠ Sound nicht gefunden: {filename}")
|
||||
return False
|
||||
|
||||
try:
|
||||
logger.info(f"♪ Spiele Sound ab: {filename}")
|
||||
subprocess.run(
|
||||
['aplay', '-D', 'hw:1,0', str(sound_path)],
|
||||
check=True,
|
||||
capture_output=True,
|
||||
timeout=10
|
||||
)
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.error(f"✗ Fehler beim Abspielen: {e}")
|
||||
return False
|
||||
|
||||
# ============================================================================
|
||||
# AKTION-HANDLER
|
||||
# ============================================================================
|
||||
|
||||
class ActionHandler:
|
||||
"""Führe Aktionen aus"""
|
||||
|
||||
def __init__(self, sound_player):
|
||||
self.sound_player = sound_player
|
||||
|
||||
def execute(self, keyword):
|
||||
"""Führe Aktion aus"""
|
||||
if keyword not in Config.KEYWORDS:
|
||||
return False
|
||||
|
||||
config = Config.KEYWORDS[keyword]
|
||||
logger.info(f"🎯 Erkannt: {keyword.upper()}")
|
||||
|
||||
# Spiele Sound ab
|
||||
if config.get("sound"):
|
||||
self.sound_player.play_sound(config["sound"])
|
||||
|
||||
# Führe Aktion aus
|
||||
action = config.get("action")
|
||||
|
||||
if action == "play_music":
|
||||
logger.info("▶ Musik abspielen...")
|
||||
# Hier könnten echte Aktionen folgen
|
||||
elif action == "stop":
|
||||
logger.info("⏹ Stoppen...")
|
||||
elif action == "toggle_light":
|
||||
logger.info("💡 Licht umschalten...")
|
||||
# GPIO-Beispiel: GPIO.output(17, not GPIO.input(17))
|
||||
|
||||
return True
|
||||
|
||||
# ============================================================================
|
||||
# HAUPTPROGRAMM
|
||||
# ============================================================================
|
||||
|
||||
class VoiceControllerLite:
|
||||
"""Hauptprogramm - Ultra-leicht und schnell"""
|
||||
|
||||
def __init__(self):
|
||||
logger.info("=" * 70)
|
||||
logger.info("Voice Controller (Lite) für Pi Zero 2W")
|
||||
logger.info("Keyword Spotting - Nur 3 Kommandos, super schnell!")
|
||||
logger.info("=" * 70)
|
||||
|
||||
try:
|
||||
self.spotter = KeywordSpotter()
|
||||
self.sound_player = SoundPlayer()
|
||||
self.action_handler = ActionHandler(self.sound_player)
|
||||
|
||||
self.last_detection = 0
|
||||
self.detection_cooldown = 1.0 # 1 Sekunde zwischen Erkennungen
|
||||
except Exception as e:
|
||||
logger.error(f"✗ Initialisierungsfehler: {e}")
|
||||
raise
|
||||
|
||||
def run(self):
|
||||
"""Hauptschleife"""
|
||||
logger.info("Starte Hauptschleife...")
|
||||
|
||||
try:
|
||||
self.spotter.start()
|
||||
|
||||
detection_count = 0
|
||||
|
||||
while True:
|
||||
try:
|
||||
keyword, confidence = self.spotter.detect_keywords()
|
||||
|
||||
if keyword and confidence > 0.5:
|
||||
# Cooldown prüfen (verhindert Mehrfacherkennung)
|
||||
current_time = time.time()
|
||||
if current_time - self.last_detection > self.detection_cooldown:
|
||||
detection_count += 1
|
||||
logger.info(
|
||||
f"[#{detection_count}] ✓ {keyword.upper()} "
|
||||
f"({confidence:.1%})"
|
||||
)
|
||||
|
||||
# Führe Aktion aus
|
||||
self.action_handler.execute(keyword)
|
||||
self.last_detection = current_time
|
||||
|
||||
# Kurze Pause (nicht 100% CPU)
|
||||
time.sleep(0.1)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
logger.info("\n⏹ Unterbrochen durch Benutzer")
|
||||
break
|
||||
except Exception as e:
|
||||
logger.error(f"Fehler in Schleife: {e}")
|
||||
time.sleep(1)
|
||||
|
||||
finally:
|
||||
self.spotter.stop()
|
||||
logger.info("✓ Voice Controller beendet")
|
||||
|
||||
# ============================================================================
|
||||
# EINSTIEGSPUNKT
|
||||
# ============================================================================
|
||||
|
||||
if __name__ == "__main__":
|
||||
try:
|
||||
controller = VoiceControllerLite()
|
||||
controller.run()
|
||||
except KeyboardInterrupt:
|
||||
logger.info("Beendet")
|
||||
sys.exit(0)
|
||||
except Exception as e:
|
||||
logger.error(f"✗ Kritischer Fehler: {e}", exc_info=True)
|
||||
sys.exit(1)
|
||||
```
|
||||
|
||||
Speichere die Datei (Ctrl+X, Y, Enter).
|
||||
|
||||
### 5.2 Training-Skript erstellen
|
||||
|
||||
Erstelle `~/voice_assistant/prepare_training.py` um die Keywords zu trainieren:
|
||||
|
||||
```bash
|
||||
nano ~/voice_assistant/prepare_training.py
|
||||
```
|
||||
|
||||
```python
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Training-Skript: Nimm Audio-Samples deiner 3 Keywords auf
|
||||
Dies muss einmalig am Anfang durchgeführt werden!
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import logging
|
||||
import sounddevice as sd
|
||||
import numpy as np
|
||||
from pathlib import Path
|
||||
from keyword_spotting import (
|
||||
Config, setup_logging, find_respeaker_device,
|
||||
ReferenceDatabase, AudioFingerprint
|
||||
)
|
||||
|
||||
logger = setup_logging()
|
||||
|
||||
def record_keyword_sample(keyword, duration=2.0):
|
||||
"""
|
||||
Nimme Audio-Sample auf
|
||||
Dauer: 2 Sekunden
|
||||
"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"Recording: '{keyword}'")
|
||||
print(f"{'='*60}")
|
||||
print(f"⏺ Aufnahme in 3 Sekunden... (Drücke SPACE zur Bereitschaft)")
|
||||
input("Drücke ENTER, wenn bereit >")
|
||||
|
||||
Config.DEVICE_INDEX = find_respeaker_device()
|
||||
|
||||
print(f"🔴 Aufnahme läuft... ({duration}s)")
|
||||
|
||||
# Aufnahme
|
||||
audio = sd.rec(
|
||||
int(Config.SAMPLERATE * duration),
|
||||
samplerate=Config.SAMPLERATE,
|
||||
channels=1,
|
||||
device=Config.DEVICE_INDEX,
|
||||
dtype='int16'
|
||||
)
|
||||
|
||||
sd.wait()
|
||||
|
||||
print("✓ Aufnahme abgeschlossen")
|
||||
|
||||
return audio[:, 0] if audio.ndim > 1 else audio
|
||||
|
||||
def train_keyword(keyword, num_samples=3):
|
||||
"""
|
||||
Trainiere Keyword mit mehreren Samples
|
||||
Empfohlen: 3-5 Samples pro Keyword
|
||||
"""
|
||||
logger.info(f"\n{'='*60}")
|
||||
logger.info(f"Training: {keyword.upper()}")
|
||||
logger.info(f"{'='*60}")
|
||||
logger.info(f"Bitte nimm {num_samples} Samples des Keywords '{keyword}' auf")
|
||||
|
||||
db = ReferenceDatabase()
|
||||
fingerprints = []
|
||||
|
||||
for i in range(num_samples):
|
||||
print(f"\n[Sample {i+1}/{num_samples}] '{keyword}'")
|
||||
audio = record_keyword_sample(keyword, duration=2.0)
|
||||
|
||||
# Extrahiere Fingerprint
|
||||
fp = AudioFingerprint.extract_features(audio)
|
||||
fingerprints.append(fp)
|
||||
|
||||
print(f"✓ Fingerprint extrahiert: {fp}")
|
||||
|
||||
# Durchschnitt aller Samples
|
||||
avg_fingerprint = np.mean(fingerprints, axis=0)
|
||||
db.fingerprints[keyword] = avg_fingerprint
|
||||
db.save()
|
||||
|
||||
logger.info(f"✓ {keyword} trainiert und gespeichert!")
|
||||
return True
|
||||
|
||||
def main():
|
||||
"""Haupttraining"""
|
||||
print("\n" + "="*60)
|
||||
print("KEYWORD SPOTTING - TRAINING")
|
||||
print("="*60)
|
||||
print("\nAufnehmen von Sprachsamples für deine 3 Keywords:")
|
||||
print("1. musik")
|
||||
print("2. stopp")
|
||||
print("3. licht")
|
||||
print("\nFür jeden Keyword werden 3 Samples benötigt.")
|
||||
print("Sprich das Keyword klar und deutlich ins Mikrofon.")
|
||||
print("\n" + "="*60 + "\n")
|
||||
|
||||
input("Drücke ENTER um zu starten >")
|
||||
|
||||
try:
|
||||
for keyword in Config.KEYWORDS.keys():
|
||||
train_keyword(keyword, num_samples=3)
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("✓ TRAINING ABGESCHLOSSEN!")
|
||||
print("="*60)
|
||||
print("\nRun jetzt: python3 keyword_spotting.py")
|
||||
|
||||
except KeyboardInterrupt:
|
||||
logger.info("\n✗ Training abgebrochen")
|
||||
sys.exit(0)
|
||||
except Exception as e:
|
||||
logger.error(f"✗ Fehler: {e}", exc_info=True)
|
||||
sys.exit(1)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
```
|
||||
|
||||
Speichere die Datei.
|
||||
|
||||
### 5.3 Sound-Dateien erstellen
|
||||
|
||||
```bash
|
||||
python3 << 'EOF'
|
||||
import wave
|
||||
import math
|
||||
import sys
|
||||
|
||||
def generate_tone(frequency, duration, sample_rate=16000):
|
||||
samples = []
|
||||
for i in range(int(sample_rate * duration)):
|
||||
sample = int(32767 * 0.3 * math.sin(2 * math.pi * frequency * i / sample_rate))
|
||||
samples.append(sample)
|
||||
return samples
|
||||
from vosk import Model, KaldiRecognizer, SetLogLevel
|
||||
|
||||
# Music
|
||||
sounds = generate_tone(523, 0.15) + generate_tone(587, 0.15) + generate_tone(659, 0.15)
|
||||
with wave.open('/home/pi/voice_assistant/sounds/music.wav', 'wb') as f:
|
||||
f.setnchannels(1)
|
||||
f.setsampwidth(2)
|
||||
f.setframerate(16000)
|
||||
f.writeframes(b''.join(s.to_bytes(2, 'little', signed=True) for s in sounds))
|
||||
print("✓ music.wav")
|
||||
# You can set log level to -1 to disable debug messages
|
||||
SetLogLevel(0)
|
||||
|
||||
# Stopped
|
||||
sounds = generate_tone(440, 0.3)
|
||||
with wave.open('/home/pi/voice_assistant/sounds/stopped.wav', 'wb') as f:
|
||||
f.setnchannels(1)
|
||||
f.setsampwidth(2)
|
||||
f.setframerate(16000)
|
||||
f.writeframes(b''.join(s.to_bytes(2, 'little', signed=True) for s in sounds))
|
||||
print("✓ stopped.wav")
|
||||
wf = wave.open(sys.argv[1], "rb")
|
||||
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
|
||||
print("Audio file must be WAV format mono PCM.")
|
||||
sys.exit(1)
|
||||
|
||||
# Light
|
||||
sounds = generate_tone(587, 0.2) + generate_tone(659, 0.1)
|
||||
with wave.open('/home/pi/voice_assistant/sounds/light.wav', 'wb') as f:
|
||||
f.setnchannels(1)
|
||||
f.setsampwidth(2)
|
||||
f.setframerate(16000)
|
||||
f.writeframes(b''.join(s.to_bytes(2, 'little', signed=True) for s in sounds))
|
||||
print("✓ light.wav")
|
||||
EOF
|
||||
```
|
||||
model = Model("model") #lang="en-us")
|
||||
|
||||
### 5.4 TRAINING durchführen (WICHTIG!)
|
||||
# You can also init model by name or with a folder path
|
||||
# model = Model(model_name="vosk-model-en-us-0.21")
|
||||
# model = Model("models/en")
|
||||
|
||||
```bash
|
||||
cd ~/voice_assistant
|
||||
chmod +x prepare_training.py
|
||||
python3 prepare_training.py
|
||||
```
|
||||
rec = KaldiRecognizer(model, wf.getframerate())
|
||||
rec.SetWords(True)
|
||||
rec.SetPartialWords(True)
|
||||
|
||||
**Das Trainings-Skript wird dich auffordern:**
|
||||
1. Sprich 3x das Wort "musik"
|
||||
2. Sprich 3x das Wort "stopp"
|
||||
3. Sprich 3x das Wort "licht"
|
||||
while True:
|
||||
data = wf.readframes(4000)
|
||||
if len(data) == 0:
|
||||
break
|
||||
if rec.AcceptWaveform(data):
|
||||
print(rec.Result())
|
||||
else:
|
||||
print(rec.PartialResult())
|
||||
|
||||
Jedes Sample dauert 2 Sekunden. Die Fingerprints werden automatisch gespeichert.
|
||||
print(rec.FinalResult())
|
||||
|
||||
**Dauer:** ~5 Minuten
|
||||
|
||||
### 5.5 Test
|
||||
|
||||
Nach dem Training:
|
||||
|
||||
```bash
|
||||
python3 ~/voice_assistant/keyword_spotting.py
|
||||
```
|
||||
|
||||
Jetzt:
|
||||
1. Sprich: "musik" → Sound abspielen
|
||||
2. Sprich: "stopp" → Sound abspielen
|
||||
3. Sprich: "licht" → Sound abspielen
|
||||
|
||||
Beende mit Ctrl+C.
|
||||
|
||||
---
|
||||
|
||||
## TEIL 6: Systemctl Service (wie vorher)
|
||||
|
||||
```bash
|
||||
sudo nano /etc/systemd/system/voice-assistant.service
|
||||
```
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=Voice Assistant - Keyword Spotting
|
||||
After=network.target sound.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=pi
|
||||
WorkingDirectory=/home/pi/voice_assistant
|
||||
ExecStart=/usr/bin/python3 /home/pi/voice_assistant/keyword_spotting.py
|
||||
Restart=on-failure
|
||||
RestartSec=5
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
|
||||
# Ressourcen-Limits
|
||||
MemoryMax=128M
|
||||
CPUQuota=30%
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
```bash
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable voice-assistant.service
|
||||
sudo systemctl start voice-assistant.service
|
||||
sudo systemctl status voice-assistant.service
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## PERFORMANCE-VERGLEICH
|
||||
|
||||
### Speichernutzung:
|
||||
|
||||
```bash
|
||||
# Vor (Vosk)
|
||||
du -sh ~/voice_models/
|
||||
# Ausgabe: ~100MB
|
||||
|
||||
# Nach (Keyword Spotting)
|
||||
du -sh ~/voice_assistant/
|
||||
# Ausgabe: ~2MB (!!)
|
||||
```
|
||||
|
||||
### RAM während Betrieb:
|
||||
|
||||
```bash
|
||||
ps aux | grep python3 | grep keyword
|
||||
# Vosk: ~100-120MB
|
||||
# Keyword Spotting: ~25-35MB
|
||||
```
|
||||
|
||||
### CPU-Last:
|
||||
|
||||
```bash
|
||||
# top
|
||||
# Vosk: 40-60% (Pi Zero 2W läuft fast warm!)
|
||||
# Keyword Spotting: 5-15% (gemütlich!)
|
||||
```
|
||||
|
||||
### Startup-Zeit:
|
||||
|
||||
```bash
|
||||
time python3 keyword_spotting.py
|
||||
# Vosk: real 0m3.5s
|
||||
# Keyword Spotting: real 0m0.4s (!!)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## RESSOURCEN-VERGLEICH (Zusammenfassung)
|
||||
|
||||
| Metrik | Vosk | Keyword Spotting | Einsparung |
|
||||
|--------|------|------------------|------------|
|
||||
| **Modellgröße** | 50-100MB | < 1MB | 99%! |
|
||||
| **RAM-Nutzung** | 100-120MB | 25-35MB | 75% |
|
||||
| **CPU-Last (Pi Zero 2W)** | 40-60% | 5-15% | 75% |
|
||||
| **Startup-Zeit** | 3-5s | 0.4s | 90% |
|
||||
| **Erkennungslatenz** | 200-500ms | 50-100ms | 75% |
|
||||
| **Genauigkeit (3 Befehle)** | 85-92% | 93-98% | +10% |
|
||||
| **Speicherplatz (gesamt)** | ~150MB | ~30MB | 80% |
|
||||
|
||||
**Fazit:** Du sparst massiv Ressourcen bei besserer Performance!
|
||||
|
||||
---
|
||||
|
||||
## TROUBLESHOOTING
|
||||
|
||||
**Problem: "Erkennung funktioniert nicht nach Training"**
|
||||
|
||||
```bash
|
||||
# Überprüfe ob Fingerprints gespeichert wurden
|
||||
ls -la ~/voice_assistant/*.npy
|
||||
|
||||
# Zeige gespeicherte Keywords
|
||||
cat ~/voice_assistant/reference_keywords.txt
|
||||
```
|
||||
|
||||
**Problem: "False Positives (erkennt Worte, die nicht gesprochen wurden)"**
|
||||
|
||||
Erhöhe die Confidence-Schwelle in `keyword_spotting.py`:
|
||||
|
||||
```python
|
||||
"musik": {
|
||||
"confidence": 0.75, # Vorher: 0.65
|
||||
}
|
||||
```
|
||||
|
||||
**Problem: "Erkennung zu ungenau"**
|
||||
|
||||
Trainiere erneut mit besserer Aussprache:
|
||||
|
||||
```bash
|
||||
python3 prepare_training.py
|
||||
```
|
||||
|
||||
Sprich die Keywords deutlicher und lauter.
|
||||
|
||||
---
|
||||
|
||||
## NÄCHSTE SCHRITTE
|
||||
|
||||
Mit dieser Lösung kannst du:
|
||||
|
||||
1. ✅ **3 Keywords erkennen** mit 93-98% Genauigkeit
|
||||
2. ✅ **Super schnell starten** (< 1 Sekunde)
|
||||
3. ✅ **Speicher sparen** (80% weniger!)
|
||||
4. ✅ **CPU sparen** (75% weniger Last)
|
||||
5. ✅ **Offline arbeiten** (kein Internet nötig)
|
||||
|
||||
Wenn du später **mehr Kommandos** brauchst:
|
||||
- 5 Kommandos: Noch OK mit dieser Methode
|
||||
- 10+ Kommandos: Wechsel zu leichtem ML-Modell (TensorFlow Lite)
|
||||
- Beliebige Sprache: Dann Vosk nötig
|
||||
|
||||
---
|
||||
|
||||
## FRAGEN & ANTWORTEN
|
||||
|
||||
**F: Kann ich mehr als 3 Kommandos hinzufügen?**
|
||||
A: Ja, bis ca. 10 Kommandos bleibt die Methode effizient. Mehr als 10 → TensorFlow Lite ML-Modell nutzen.
|
||||
|
||||
**F: Wie lange dauert Training?**
|
||||
A: ~5 Minuten (3 Samples × 3 Keywords × 2 Sekunden + Verarbeitung)
|
||||
|
||||
**F: Muss ich jedes Mal neu trainieren?**
|
||||
A: Nein, die Fingerprints werden gespeichert. Nur am Anfang nötig.
|
||||
|
||||
**F: Funktioniert es auch mit Dialekt/Akzent?**
|
||||
A: Ja! Trainiere mit DEINEM Akzent, dann erkannt der System dich perfekt.
|
||||
|
||||
**F: Was ist wenn jemand anders spricht?**
|
||||
A: Die Erkennung wird dann weniger genau (ca. 10-20% weniger). Das ist normal - trainiere ggf. mit mehreren Stimmen.
|
||||
|
||||
---
|
||||
|
||||
**Viel Erfolg mit deinem schlanken Voice Control System! 🎉**
|
||||
|
||||
Die Lösung ist optimiert, super schnell und perfekt für Pi Zero 2W!
|
||||
|
||||
Reference in New Issue
Block a user