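Building a Practical AI Voice Chat App with Next.js and OpenAI

Introduction

Applications that let you talk to an AI by voice are no longer science fiction. This article explains in detail how to implement a practical AI voice chat app by combining Next.js, the OpenAI API, and the Web Speech API.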

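The application implements the following features:

🎤 Real-time speech recognition
🤖 Natural response generation with GPT-4
🔊 Read-aloud via speech synthesis
💬 Saving and displaying conversation history
🌏 Multilingual support
🎨 Customizable AI personas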

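Tech Stack

Frontend: Next.js 14 (App Router)
Styling: Tailwind CSS
AI: OpenAI GPT-4
Speech recognition: Web Speech API (browser support varies; Chromium-based browsers expose it as webkitSpeechRecognition)
Speech synthesis: Web Speech API + ElevenLabs (optional)
State management: Zustand
Data storage: LocalStorage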

Project Setup

1. Create the Next.js project

npx create-next-app@14 ai-voice-chat --typescript --tailwind --app
cd ai-voice-chat

2. Install the required dependencies

npm install openai zustand react-icons
# @types/node is already installed by create-next-app when --typescript is used

3. Configure environment variables

# .env.local
OPENAI_API_KEY=your-openai-api-key-here
# Optional: only needed when using ElevenLabs
ELEVENLABS_API_KEY=your-elevenlabs-api-key-here
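
A missing key otherwise shows up only as a failed API call at request time. The following is a minimal sketch of a fail-fast check, not part of the original project. `requireEnv` is a hypothetical helper, and it takes the environment as a parameter (an assumption made so it can be exercised without touching `process.env`).

```typescript
// Hypothetical helper (not in the original code): returns the value of a
// required environment variable, or throws with a descriptive message.
type EnvRecord = Record<string, string | undefined>;

function requireEnv(env: EnvRecord, key: string): string {
  const value = env[key];
  // Treat blank values the same as missing ones.
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}
```

In a route handler this could be called as `requireEnv(process.env, 'OPENAI_API_KEY')` before constructing the client.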

Complete Implementation Code

1. Type definitions

// types/chat.ts
export interface Message {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: Date;
  audioUrl?: string;
}

export interface ChatSession {
  id: string;
  title: string;
  messages: Message[];
  createdAt: Date;
  updatedAt: Date;
  persona: AIPersona;
}

export interface AIPersona {
  id: string;
  name: string;
  description: string;
  systemPrompt: string;
  voice: {
    lang: string;
    rate: number;
    pitch: number;
    voiceURI?: string;
  };
}

export interface VoiceSettings {
  language: string;
  recognitionLang: string;
  synthesisVoice: string;
  autoSpeak: boolean;
  continuous: boolean;
}
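
The `Message` type carries UI-only fields (`id`, `timestamp`, `audioUrl`) that the chat endpoint does not need. As a small sketch under that assumption, a hypothetical `toApiMessages` helper can reduce stored messages to the `role`/`content` pairs sent to the API. The `ChatMessage` and `ApiMessage` names below are local stand-ins so the example is self-contained.

```typescript
// Local copy of the article's Message shape, renamed to keep the sketch self-contained.
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: Date;
  audioUrl?: string;
}

// The subset of fields the chat endpoint consumes.
interface ApiMessage {
  role: ChatMessage['role'];
  content: string;
}

// Hypothetical helper: drop UI-only fields before sending history to the server.
function toApiMessages(messages: ChatMessage[]): ApiMessage[] {
  return messages.map(({ role, content }) => ({ role, content }));
}
```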

2. Custom hook: speech recognition

// hooks/useSpeechRecognition.ts
import { useEffect, useRef, useState, useCallback } from 'react';

interface UseSpeechRecognitionProps {
  continuous?: boolean;
  language?: string;
  onResult?: (transcript: string) => void;
  onError?: (error: string) => void;
}

export const useSpeechRecognition = ({
  continuous = false,
  language = 'ja-JP',
  onResult,
  onError,
}: UseSpeechRecognitionProps) => {
  const [isListening, setIsListening] = useState(false);
  const [transcript, setTranscript] = useState('');
  const [interimTranscript, setInterimTranscript] = useState('');
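  const recognitionRef = useRef<any>(null);
  // Tracks whether the user wants recognition running. Event handlers read this ref
  // because the `isListening` state captured by the effect would be stale.
  const shouldListenRef = useRef(false);
  // Keep the latest callbacks in refs so inline callbacks passed by the caller
  // do not tear down and rebuild the recognizer on every render.
  const onResultRef = useRef(onResult);
  const onErrorRef = useRef(onError);

  useEffect(() => {
    onResultRef.current = onResult;
    onErrorRef.current = onError;
  });

  useEffect(() => {
    const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
    if (!SpeechRecognition) {
      onErrorRef.current?.('Your browser does not support speech recognition');
      return;
    }

    const recognition = new SpeechRecognition();

    recognition.continuous = continuous;
    recognition.interimResults = true;
    recognition.lang = language;

    recognition.onstart = () => {
      setIsListening(true);
    };

    recognition.onresult = (event: any) => {
      let interimText = '';
      let finalText = '';

      // Only the results from resultIndex onward changed in this event.
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const text = event.results[i][0].transcript;
        if (event.results[i].isFinal) {
          finalText += text + ' ';
        } else {
          interimText += text;
        }
      }

      if (finalText) {
        setTranscript(prev => prev + finalText);
        onResultRef.current?.(finalText.trim());
      }
      setInterimTranscript(interimText);
    };

    recognition.onerror = (event: any) => {
      console.error('Speech recognition error:', event.error);
      shouldListenRef.current = false;
      setIsListening(false);

      let errorMessage = 'A speech recognition error occurred';
      switch (event.error) {
        case 'no-speech':
          errorMessage = 'No speech was detected';
          break;
        case 'network':
          errorMessage = 'A network error occurred';
          break;
        case 'not-allowed':
          errorMessage = 'Microphone access was not permitted';
          break;
      }
      onErrorRef.current?.(errorMessage);
    };

    recognition.onend = () => {
      // In continuous mode the browser may end the session on its own;
      // restart it for as long as the user has not asked to stop.
      if (continuous && shouldListenRef.current) {
        try {
          recognition.start();
          return;
        } catch (error) {
          console.error('Failed to restart speech recognition:', error);
        }
      }
      shouldListenRef.current = false;
      setIsListening(false);
    };

    recognitionRef.current = recognition;

    return () => {
      // Detach onend first so stopping here does not trigger an auto-restart.
      shouldListenRef.current = false;
      recognition.onend = null;
      recognition.stop();
      recognitionRef.current = null;
      setIsListening(false);
    };
  }, [continuous, language]);

  const startListening = useCallback(() => {
    if (recognitionRef.current && !shouldListenRef.current) {
      setTranscript('');
      setInterimTranscript('');
      shouldListenRef.current = true;
      try {
        recognitionRef.current.start();
      } catch (error) {
        // start() throws if the recognizer is already running.
        shouldListenRef.current = false;
        console.error('Failed to start speech recognition:', error);
      }
    }
  }, []);

  const stopListening = useCallback(() => {
    shouldListenRef.current = false;
    if (recognitionRef.current) {
      recognitionRef.current.stop();
    }
  }, []);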

  const toggleListening = useCallback(() => {
    if (isListening) {
      stopListening();
    } else {
      startListening();
    }
  }, [isListening, startListening, stopListening]);

  return {
    isListening,
    transcript,
    interimTranscript,
    startListening,
    stopListening,
    toggleListening,
  };
};
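
The core of the `onresult` handler is the split between finalized and interim text. The sketch below isolates that step as a pure function so it can be reasoned about without a browser. `collectTranscripts` and the `ResultLike` types are hypothetical names that only mirror the shape of the browser's result list. They are not part of the Web Speech API.

```typescript
// Structural stand-ins for the entries of a SpeechRecognitionResultList.
interface AlternativeLike {
  transcript: string;
}
interface ResultLike {
  isFinal: boolean;
  0: AlternativeLike; // the top-ranked alternative
}

// Hypothetical helper: walk the results from startIndex and separate
// finalized text from text the recognizer may still revise.
function collectTranscripts(results: ArrayLike<ResultLike>, startIndex: number) {
  let finalText = '';
  let interimText = '';
  for (let i = startIndex; i < results.length; i++) {
    const text = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += text + ' ';
    } else {
      interimText += text;
    }
  }
  return { finalText: finalText.trim(), interimText };
}
```

Passing `event.resultIndex` as `startIndex` skips results that were already delivered in earlier events.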

3. Custom hook: speech synthesis

// hooks/useSpeechSynthesis.ts
import { useCallback, useEffect, useRef, useState } from 'react';

interface UseSpeechSynthesisProps {
  voice?: string;
  rate?: number;
  pitch?: number;
  volume?: number;
  language?: string;
}

export const useSpeechSynthesis = ({
  voice,
  rate = 1,
  pitch = 1,
  volume = 1,
  language = 'ja-JP',
}: UseSpeechSynthesisProps = {}) => {
  const [speaking, setSpeaking] = useState(false);
  const [voices, setVoices] = useState<SpeechSynthesisVoice[]>([]);
  const utteranceRef = useRef<SpeechSynthesisUtterance | null>(null);

  useEffect(() => {
    const loadVoices = () => {
      const availableVoices = speechSynthesis.getVoices();
      setVoices(availableVoices);
    };

    loadVoices();
    if (speechSynthesis.onvoiceschanged !== undefined) {
      speechSynthesis.onvoiceschanged = loadVoices;
    }

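    return () => {
      // The `speaking` state captured by this effect never changes from its initial
      // value, so check the live engine state and whether this hook instance
      // has started an utterance.
      if (utteranceRef.current && speechSynthesis.speaking) {
        speechSynthesis.cancel();
      }
    };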
  }, []);

  const speak = useCallback((text: string, options?: {
    onEnd?: () => void;
    onError?: (error: any) => void;
  }) => {
    speechSynthesis.cancel();

    const utterance = new SpeechSynthesisUtterance(text);
    utterance.lang = language;
    utterance.rate = rate;
    utterance.pitch = pitch;
    utterance.volume = volume;

    if (voice) {
      const selectedVoice = voices.find(v => v.name === voice || v.voiceURI === voice);
      if (selectedVoice) {
        utterance.voice = selectedVoice;
      }
    } else {
      // By default, pick a voice that matches the language
      const defaultVoice = voices.find(v => v.lang.startsWith(language.split('-')[0]));
      if (defaultVoice) {
        utterance.voice = defaultVoice;
      }
    }

    utterance.onstart = () => setSpeaking(true);
    utterance.onend = () => {
      setSpeaking(false);
      options?.onEnd?.();
    };
    utterance.onerror = (event) => {
      setSpeaking(false);
      console.error('Speech synthesis error:', event);
      options?.onError?.(event);
    };

    utteranceRef.current = utterance;
    speechSynthesis.speak(utterance);
  }, [voices, voice, rate, pitch, volume, language]);

  const pause = useCallback(() => {
    speechSynthesis.pause();
  }, []);

  const resume = useCallback(() => {
    speechSynthesis.resume();
  }, []);

  const cancel = useCallback(() => {
    speechSynthesis.cancel();
    setSpeaking(false);
  }, []);

  return {
    speak,
    pause,
    resume,
    cancel,
    speaking,
    voices,
  };
};
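
The voice lookup inside `speak` can be expressed as a pure function, which makes the fallback order explicit. This is a sketch, not the hook's actual code: `pickVoice` is a hypothetical helper, and unlike the hook it also falls back to a language match when the preferred voice is not installed and prefers an exact locale match over a base-language match.

```typescript
// The fields of SpeechSynthesisVoice that the selection logic relies on.
interface VoiceLike {
  name: string;
  voiceURI: string;
  lang: string;
}

// Hypothetical helper. Order of preference:
// 1. the explicitly requested voice (by name or voiceURI),
// 2. a voice whose locale matches exactly (e.g. "en-GB"),
// 3. a voice sharing the base language (e.g. "en").
function pickVoice<T extends VoiceLike>(
  voices: T[],
  preferred: string | undefined,
  language: string
): T | undefined {
  if (preferred) {
    const requested = voices.find(v => v.name === preferred || v.voiceURI === preferred);
    if (requested) return requested;
  }
  const exact = voices.find(v => v.lang === language);
  if (exact) return exact;
  const baseLanguage = language.split('-')[0];
  return voices.find(v => v.lang.startsWith(baseLanguage));
}
```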

4. Zustand store

// store/chatStore.ts
import { create } from 'zustand';
import { persist } from 'zustand/middleware';
import { ChatSession, Message, AIPersona, VoiceSettings } from '@/types/chat';

interface ChatStore {
  sessions: ChatSession[];
  currentSessionId: string | null;
  voiceSettings: VoiceSettings;
  personas: AIPersona[];
  currentPersonaId: string;
  
  // Actions
  createSession: (persona?: AIPersona) => string;
  addMessage: (sessionId: string, message: Omit<Message, 'id' | 'timestamp'>) => void;
  setCurrentSession: (sessionId: string) => void;
  deleteSession: (sessionId: string) => void;
  updateVoiceSettings: (settings: Partial<VoiceSettings>) => void;
  setCurrentPersona: (personaId: string) => void;
  getCurrentSession: () => ChatSession | null;
}

// Default AI personas (names and prompts stay in Japanese to match the default ja-JP voice)
const defaultPersonas: AIPersona[] = [
  {
    id: 'friendly-assistant',
    name: 'フレンドリーアシスタント',
    description: '親しみやすく丁寧な対応をするAIアシスタント',
    systemPrompt: 'あなたは親しみやすく、丁寧で役立つAIアシスタントです。ユーザーの質問に対して分かりやすく、温かみのある回答を心がけてください。',
    voice: {
      lang: 'ja-JP',
      rate: 1.0,
      pitch: 1.0,
    },
  },
  {
    id: 'professional',
    name: 'プロフェッショナル',
    description: 'ビジネスシーンに適した専門的なAIアシスタント',
    systemPrompt: 'あなたはプロフェッショナルなビジネスアシスタントです。正確で簡潔な情報提供を心がけ、専門的な観点から回答してください。',
    voice: {
      lang: 'ja-JP',
      rate: 0.95,
      pitch: 0.9,
    },
  },
  {
    id: 'creative',
    name: 'クリエイティブパートナー',
    description: '創造的なアイデアを提供するAIアシスタント',
    systemPrompt: 'あなたは創造的で革新的なアイデアを提供するクリエイティブパートナーです。独創的な視点から、インスピレーションを与える回答を心がけてください。',
    voice: {
      lang: 'ja-JP',
      rate: 1.1,
      pitch: 1.1,
    },
  },
];

export const useChatStore = create<ChatStore>()(
  persist(
    (set, get) => ({
      sessions: [],
      currentSessionId: null,
      voiceSettings: {
        language: 'ja-JP',
        recognitionLang: 'ja-JP',
        synthesisVoice: '',
        autoSpeak: true,
        continuous: false,
      },
      personas: defaultPersonas,
      currentPersonaId: 'friendly-assistant',

      createSession: (persona) => {
        const newSession: ChatSession = {
          id: Date.now().toString(),
          title: 'New conversation',
          messages: [],
          createdAt: new Date(),
          updatedAt: new Date(),
          persona: persona || defaultPersonas[0],
        };
        
        set((state) => ({
          sessions: [newSession, ...state.sessions],
          currentSessionId: newSession.id,
        }));
        
        return newSession.id;
      },

      addMessage: (sessionId, message) => {
        const newMessage: Message = {
          ...message,
          id: Date.now().toString(),
          timestamp: new Date(),
        };
        
        set((state) => ({
          sessions: state.sessions.map((session) =>
            session.id === sessionId
              ? {
                  ...session,
                  messages: [...session.messages, newMessage],
                  updatedAt: new Date(),
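                  // Use the first user message as the title; add an ellipsis only when it was truncated.
                  title: session.messages.length === 0 && message.role === 'user'
                    ? (message.content.length > 30 ? message.content.slice(0, 30) + '...' : message.content)
                    : session.title,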
                }
              : session
          ),
        }));
      },

      setCurrentSession: (sessionId) => set({ currentSessionId: sessionId }),

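      deleteSession: (sessionId) => {
        set((state) => {
          // Choose the fallback from the remaining sessions; state.sessions[0]
          // could be the session that is being deleted.
          const remaining = state.sessions.filter((s) => s.id !== sessionId);
          return {
            sessions: remaining,
            currentSessionId: state.currentSessionId === sessionId
              ? remaining[0]?.id ?? null
              : state.currentSessionId,
          };
        });
      },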

      updateVoiceSettings: (settings) => {
        set((state) => ({
          voiceSettings: { ...state.voiceSettings, ...settings },
        }));
      },

      setCurrentPersona: (personaId) => set({ currentPersonaId: personaId }),

      getCurrentSession: () => {
        const state = get();
        return state.sessions.find((s) => s.id === state.currentSessionId) || null;
      },
    }),
    {
      name: 'chat-storage',
    }
  )
);
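
Because the persisted state is stored as JSON, the `Date` fields declared in the types come back from LocalStorage as ISO strings. The components work around this with `new Date(...)` at render time. As an alternative sketch, a hypothetical `reviveSessionDates` helper could normalize a stored session once, for example from the persist middleware's `merge` or `onRehydrateStorage` option. The `StoredSession` types are assumptions that model only the date-bearing fields.

```typescript
// Only the fields relevant to date revival are modeled here.
interface StoredMessage {
  id: string;
  timestamp: string | Date;
}
interface StoredSession {
  id: string;
  createdAt: string | Date;
  updatedAt: string | Date;
  messages: StoredMessage[];
}

// Hypothetical helper: convert ISO strings (or existing Dates) back into Date objects.
function reviveSessionDates<S extends StoredSession>(session: S) {
  return {
    ...session,
    createdAt: new Date(session.createdAt),
    updatedAt: new Date(session.updatedAt),
    messages: session.messages.map(m => ({ ...m, timestamp: new Date(m.timestamp) })),
  };
}
```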

5. OpenAI API route

// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(request: NextRequest) {
  try {
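    const { messages, systemPrompt, stream = false } = await request.json();

    if (!Array.isArray(messages)) {
      return NextResponse.json({ error: 'messages must be an array' }, { status: 400 });
    }

    const systemMessage = {
      role: 'system' as const,
      // Default prompt ("You are a kind and helpful AI assistant."), kept in Japanese
      // because the app defaults to ja-JP.
      content: systemPrompt || 'あなたは親切で役立つAIアシスタントです。',
    };

    const params = {
      model: 'gpt-4-turbo-preview',
      messages: [systemMessage, ...messages],
      temperature: 0.7,
      max_tokens: 1000,
    };

    if (stream) {
      // A literal `stream: true` lets TypeScript select the streaming overload.
      const completion = await openai.chat.completions.create({ ...params, stream: true });
      const encoder = new TextEncoder();
      // Named `body` to avoid shadowing the `stream` flag from the request.
      const body = new ReadableStream({
        async start(controller) {
          try {
            for await (const chunk of completion) {
              const text = chunk.choices[0]?.delta?.content || '';
              if (text) {
                controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
              }
            }
            controller.enqueue(encoder.encode('data: [DONE]\n\n'));
            controller.close();
          } catch (error) {
            controller.error(error);
          }
        },
      });

      return new NextResponse(body, {
        headers: {
          'Content-Type': 'text/event-stream',
          'Cache-Control': 'no-cache',
          'Connection': 'keep-alive',
        },
      });
    }

    const completion = await openai.chat.completions.create({ ...params, stream: false });
    const response = completion.choices[0].message.content;
    return NextResponse.json({ response });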
  } catch (error) {
    console.error('Chat API error:', error);
    return NextResponse.json(
      { error: 'An error occurred while processing the chat request' },
      { status: 500 }
    );
  }
}
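
When `stream` is true, the route emits events of the form `data: {"text":"..."}` separated by blank lines and finishes with `data: [DONE]`. The chat component shown later does not consume this stream, so the following is only a sketch of how a client might parse it. `parseSseBuffer` is a hypothetical helper. It returns the unparsed tail so that an event split across network chunks can be completed by the next chunk.

```typescript
// Hypothetical helper: parse complete events out of an accumulated text buffer.
function parseSseBuffer(buffer: string): { texts: string[]; done: boolean; rest: string } {
  const events = buffer.split('\n\n');
  // The last piece is either empty or an incomplete event; keep it for the next call.
  const rest = events.pop() ?? '';
  const texts: string[] = [];
  let done = false;

  for (const event of events) {
    if (!event.startsWith('data: ')) continue;
    const payload = event.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { text?: string };
    if (parsed.text) texts.push(parsed.text);
  }

  return { texts, done, rest };
}
```

A caller would append each decoded chunk to `rest`, pass the result back in, and stop reading once `done` is true.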

6. Main chat component

// components/VoiceChat.tsx
'use client';

import React, { useState, useEffect, useRef, useCallback } from 'react';
import { FaMicrophone, FaMicrophoneSlash, FaVolumeUp, FaStop, FaPaperPlane, FaRedo } from 'react-icons/fa';
import { useSpeechRecognition } from '@/hooks/useSpeechRecognition';
import { useSpeechSynthesis } from '@/hooks/useSpeechSynthesis';
import { useChatStore } from '@/store/chatStore';
import { Message } from '@/types/chat';

export const VoiceChat: React.FC = () => {
  const {
    sessions,
    currentSessionId,
    voiceSettings,
    personas,
    currentPersonaId,
    createSession,
    addMessage,
    getCurrentSession,
  } = useChatStore();

  const [inputText, setInputText] = useState('');
  const [isProcessing, setIsProcessing] = useState(false);
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const currentSession = getCurrentSession();
  const currentPersona = personas.find(p => p.id === currentPersonaId);

  const {
    isListening,
    transcript,
    interimTranscript,
    toggleListening,
    stopListening,
  } = useSpeechRecognition({
    continuous: voiceSettings.continuous,
    language: voiceSettings.recognitionLang,
    onResult: (text) => {
      if (!voiceSettings.continuous) {
        handleSendMessage(text);
      }
    },
    onError: (error) => {
      console.error('Speech recognition error:', error);
    },
  });

  const { speak, cancel, speaking, voices } = useSpeechSynthesis({
    language: voiceSettings.language,
    voice: voiceSettings.synthesisVoice,
    rate: currentPersona?.voice.rate || 1,
    pitch: currentPersona?.voice.pitch || 1,
  });

  useEffect(() => {
    if (!currentSessionId && sessions.length === 0) {
      createSession(currentPersona);
    }
  }, []);

  useEffect(() => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [currentSession?.messages]);

  useEffect(() => {
    if (transcript && voiceSettings.continuous) {
      setInputText(transcript);
    }
  }, [transcript, voiceSettings.continuous]);

  const handleSendMessage = async (text?: string) => {
    const messageText = text || inputText.trim();
    if (!messageText || isProcessing || !currentSessionId) return;

    // Add the user message
    addMessage(currentSessionId, {
      role: 'user',
      content: messageText,
    });

    setInputText('');
    setIsProcessing(true);
    stopListening();

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: [
            ...currentSession!.messages.map(m => ({
              role: m.role,
              content: m.content,
            })),
            { role: 'user', content: messageText },
          ],
          systemPrompt: currentPersona?.systemPrompt,
        }),
      });

      if (!response.ok) throw new Error('API request failed');

      const data = await response.json();
      const assistantMessage = data.response;

      // Add the assistant message
      addMessage(currentSessionId, {
        role: 'assistant',
        content: assistantMessage,
      });

      // Read the reply aloud automatically
      if (voiceSettings.autoSpeak) {
        speak(assistantMessage);
      }
    } catch (error) {
      console.error('Error sending message:', error);
      addMessage(currentSessionId, {
        role: 'assistant',
        content: 'An error occurred. Please try again.',
      });
    } finally {
      setIsProcessing(false);
    }
  };

  const handleToggleListening = () => {
    if (isListening && voiceSettings.continuous && transcript) {
      handleSendMessage(transcript);
    } else {
      toggleListening();
    }
  };

  const handleNewChat = () => {
    createSession(currentPersona);
  };

  return (
    <div className="flex flex-col h-screen bg-gray-50">
      {/* Header */}
      <header className="bg-white shadow-sm border-b">
        <div className="max-w-4xl mx-auto px-4 py-4 flex items-center justify-between">
          <div>
            <h1 className="text-2xl font-bold text-gray-800">AI Voice Chat</h1>
            <p className="text-sm text-gray-600">{currentPersona?.name}</p>
          </div>
          <button
            onClick={handleNewChat}
            className="flex items-center gap-2 px-4 py-2 bg-blue-500 text-white rounded-lg hover:bg-blue-600 transition-colors"
          >
            <FaRedo />
            New conversation
          </button>
        </div>
      </header>

      {/* Message area */}
      <div className="flex-1 overflow-y-auto">
        <div className="max-w-4xl mx-auto px-4 py-6">
          {currentSession?.messages.map((message) => (
            <MessageBubble key={message.id} message={message} />
          ))}
          {isProcessing && (
            <div className="flex justify-start mb-4">
              <div className="bg-gray-300 rounded-lg px-4 py-2">
                <div className="flex space-x-2">
                  <div className="w-2 h-2 bg-gray-600 rounded-full animate-bounce" style={{ animationDelay: '0ms' }}></div>
                  <div className="w-2 h-2 bg-gray-600 rounded-full animate-bounce" style={{ animationDelay: '150ms' }}></div>
                  <div className="w-2 h-2 bg-gray-600 rounded-full animate-bounce" style={{ animationDelay: '300ms' }}></div>
                </div>
              </div>
            </div>
          )}
          <div ref={messagesEndRef} />
        </div>
      </div>

      {/* Input area */}
      <div className="bg-white border-t">
        <div className="max-w-4xl mx-auto px-4 py-4">
          {/* Live display of the interim recognition result */}
          {isListening && (
            <div className="mb-2 text-sm text-gray-600">
              {interimTranscript && (
                <p className="italic">Recognizing: {interimTranscript}</p>
              )}
            </div>
          )}
          
          <div className="flex items-center gap-4">
            <button
              onClick={handleToggleListening}
              className={`p-4 rounded-full transition-all ${
                isListening
                  ? 'bg-red-500 hover:bg-red-600 animate-pulse'
                  : 'bg-blue-500 hover:bg-blue-600'
              } text-white`}
              disabled={isProcessing}
            >
              {isListening ? <FaMicrophoneSlash size={24} /> : <FaMicrophone size={24} />}
            </button>

            <input
              type="text"
              value={inputText}
              onChange={(e) => setInputText(e.target.value)}
              onKeyDown={(e) => e.key === 'Enter' && !e.nativeEvent.isComposing && handleSendMessage()}
              placeholder="Type a message or press the microphone button to speak"
              className="flex-1 px-4 py-3 border rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
              disabled={isProcessing || isListening}
            />

            <button
              onClick={() => handleSendMessage()}
              className="p-3 bg-green-500 text-white rounded-lg hover:bg-green-600 transition-colors disabled:opacity-50"
              disabled={!inputText.trim() || isProcessing}
            >
              <FaPaperPlane size={20} />
            </button>

            {speaking && (
              <button
                onClick={cancel}
                className="p-3 bg-orange-500 text-white rounded-lg hover:bg-orange-600 transition-colors"
              >
                <FaStop size={20} />
              </button>
            )}
          </div>

          <div className="mt-2 text-xs text-gray-500 text-center">
            {isListening ? 'Listening... click the button again when you finish speaking' : 'Click the microphone button to start speaking'}
          </div>
        </div>
      </div>
    </div>
  );
};

// Message bubble component
const MessageBubble: React.FC<{ message: Message }> = ({ message }) => {
  const { voiceSettings } = useChatStore();
  const { speak, speaking } = useSpeechSynthesis({
    language: voiceSettings.language,
    voice: voiceSettings.synthesisVoice,
  });

  const isUser = message.role === 'user';

  return (
    <div className={`flex ${isUser ? 'justify-end' : 'justify-start'} mb-4`}>
      <div
        className={`max-w-xs lg:max-w-md xl:max-w-lg rounded-lg px-4 py-2 ${
          isUser
            ? 'bg-blue-500 text-white'
            : 'bg-gray-200 text-gray-800'
        }`}
      >
        <p className="whitespace-pre-wrap">{message.content}</p>
        <div className="flex items-center justify-between mt-1">
          <span className="text-xs opacity-70">
            {new Date(message.timestamp).toLocaleTimeString('ja-JP', {
              hour: '2-digit',
              minute: '2-digit',
            })}
          </span>
          {!isUser && (
            <button
              onClick={() => speak(message.content)}
              className="ml-2 p-1 rounded hover:bg-black hover:bg-opacity-10 transition-colors"
              disabled={speaking}
            >
              <FaVolumeUp size={14} />
            </button>
          )}
        </div>
      </div>
    </div>
  );
};
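
`handleSendMessage` sends the entire session history with every request, so the request grows with each turn. One simple way to bound it, shown here as a sketch rather than as part of the original component, is to keep only the most recent messages. `trimHistory` and the limit of 20 in the usage note are illustrative assumptions. A production app might budget by tokens instead.

```typescript
// The fields of a chat message that the API payload uses.
interface HistoryMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Hypothetical helper: keep at most `maxMessages` of the most recent entries.
function trimHistory<T extends HistoryMessage>(messages: T[], maxMessages: number): T[] {
  // slice(-0) would return the whole array, so handle non-positive limits explicitly.
  if (maxMessages <= 0) return [];
  return messages.slice(-maxMessages);
}
```

In the component this could wrap the mapped history, for example `trimHistory(currentSession.messages, 20)`, before the new user message is appended.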

7. Settings component

// components/VoiceSettings.tsx
'use client';

import React from 'react';
import { useChatStore } from '@/store/chatStore';
import { useSpeechSynthesis } from '@/hooks/useSpeechSynthesis';

export const VoiceSettings: React.FC = () => {
  const { voiceSettings, updateVoiceSettings, personas, currentPersonaId, setCurrentPersona } = useChatStore();
  const { voices } = useSpeechSynthesis();

  const languages = [
    { code: 'ja-JP', name: '日本語' },
    { code: 'en-US', name: 'English (US)' },
    { code: 'en-GB', name: 'English (UK)' },
    { code: 'zh-CN', name: '中文 (简体)' },
    { code: 'ko-KR', name: '한국어' },
    { code: 'es-ES', name: 'Español' },
    { code: 'fr-FR', name: 'Français' },
    { code: 'de-DE', name: 'Deutsch' },
  ];

  const filteredVoices = voices.filter(voice => 
    voice.lang.startsWith(voiceSettings.language.split('-')[0])
  );

  return (
    <div className="p-6 bg-white rounded-lg shadow-lg">
      <h2 className="text-2xl font-bold mb-6">Voice Settings</h2>

      {/* AI persona selection */}
      <div className="mb-6">
        <label className="block text-sm font-medium text-gray-700 mb-2">
          AI persona
        </label>
        <select
          value={currentPersonaId}
          onChange={(e) => setCurrentPersona(e.target.value)}
          className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500"
        >
          {personas.map((persona) => (
            <option key={persona.id} value={persona.id}>
              {persona.name} - {persona.description}
            </option>
          ))}
        </select>
      </div>

      {/* Language setting */}
      <div className="mb-6">
        <label className="block text-sm font-medium text-gray-700 mb-2">
          Language
        </label>
        <select
          value={voiceSettings.language}
          onChange={(e) => updateVoiceSettings({ 
            language: e.target.value,
            recognitionLang: e.target.value,
          })}
          className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500"
        >
          {languages.map((lang) => (
            <option key={lang.code} value={lang.code}>
              {lang.name}
            </option>
          ))}
        </select>
      </div>

      {/* Voice selection */}
      <div className="mb-6">
        <label className="block text-sm font-medium text-gray-700 mb-2">
          Read-aloud voice
        </label>
        <select
          value={voiceSettings.synthesisVoice}
          onChange={(e) => updateVoiceSettings({ synthesisVoice: e.target.value })}
          className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500"
        >
          <option value="">Default</option>
          {filteredVoices.map((voice) => (
            <option key={voice.voiceURI} value={voice.voiceURI}>
              {voice.name} ({voice.lang})
            </option>
          ))}
        </select>
      </div>

      {/* Options */}
      <div className="space-y-4">
        <label className="flex items-center">
          <input
            type="checkbox"
            checked={voiceSettings.autoSpeak}
            onChange={(e) => updateVoiceSettings({ autoSpeak: e.target.checked })}
            className="mr-2 h-4 w-4 text-blue-600 focus:ring-blue-500"
          />
          <span className="text-sm text-gray-700">
            Automatically read AI responses aloud
          </span>
        </label>

        <label className="flex items-center">
          <input
            type="checkbox"
            checked={voiceSettings.continuous}
            onChange={(e) => updateVoiceSettings({ continuous: e.target.checked })}
            className="mr-2 h-4 w-4 text-blue-600 focus:ring-blue-500"
          />
          <span className="text-sm text-gray-700">
            Continuous speech recognition mode (send manually)
          </span>
        </label>
      </div>
    </div>
  );
};
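
The settings panel exposes language and voice, while rate and pitch come from the persona. If those values are ever made user-adjustable, they should stay within the ranges the Web Speech API accepts for an utterance: rate 0.1 to 10, pitch 0 to 2, and volume 0 to 1. The sketch below clamps them. `normalizeVoiceParams` and `clampNumber` are hypothetical helpers, not part of the original code.

```typescript
// Restrict a number to the inclusive range [min, max].
const clampNumber = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: fill in defaults and clamp to the ranges
// accepted by SpeechSynthesisUtterance.
function normalizeVoiceParams(params: { rate?: number; pitch?: number; volume?: number }) {
  return {
    rate: clampNumber(params.rate ?? 1, 0.1, 10),
    pitch: clampNumber(params.pitch ?? 1, 0, 2),
    volume: clampNumber(params.volume ?? 1, 0, 1),
  };
}
```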

8. Session history component

// components/ChatHistory.tsx
'use client';

import React from 'react';
import { FaTrash, FaClock } from 'react-icons/fa';
import { useChatStore } from '@/store/chatStore';

export const ChatHistory: React.FC = () => {
  const { sessions, currentSessionId, setCurrentSession, deleteSession } = useChatStore();

  const formatDate = (date: Date) => {
    const now = new Date();
    const sessionDate = new Date(date);
    const diffInHours = (now.getTime() - sessionDate.getTime()) / (1000 * 60 * 60);

    if (diffInHours < 1) {
      return 'Within the last hour';
    } else if (diffInHours < 24) {
      return `${Math.floor(diffInHours)}h ago`;
    } else if (diffInHours < 168) {
      return `${Math.floor(diffInHours / 24)}d ago`;
    } else {
      return sessionDate.toLocaleDateString('ja-JP');
    }
  };

  if (sessions.length === 0) {
    return (
      <div className="p-6 text-center text-gray-500">
        <p>No conversation history</p>
      </div>
    );
  }

  return (
    <div className="p-4">
      <h3 className="text-lg font-semibold mb-4">Conversation history</h3>
      <div className="space-y-2">
        {sessions.map((session) => (
          <div
            key={session.id}
            className={`p-3 rounded-lg cursor-pointer transition-colors ${
              session.id === currentSessionId
                ? 'bg-blue-100 border-2 border-blue-300'
                : 'bg-gray-100 hover:bg-gray-200'
            }`}
            onClick={() => setCurrentSession(session.id)}
          >
            <div className="flex items-start justify-between">
              <div className="flex-1">
                <h4 className="font-medium text-gray-800 truncate">
                  {session.title}
                </h4>
                <div className="flex items-center text-xs text-gray-500 mt-1">
                  <FaClock className="mr-1" />
                  {formatDate(session.updatedAt)}
                  <span className="mx-2">•</span>
                  {session.messages.length} messages
                  <span className="mx-2">•</span>
                  {session.persona.name}
                </div>
              </div>
              <button
                onClick={(e) => {
                  e.stopPropagation();
                  if (confirm('Delete this conversation?')) {
                    deleteSession(session.id);
                  }
                }}
                className="ml-2 p-2 text-red-500 hover:bg-red-100 rounded transition-colors"
              >
                <FaTrash size={14} />
              </button>
            </div>
          </div>
        ))}
      </div>
    </div>
  );
};
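
`formatDate` reads the current time internally, which makes its thresholds (1 hour, 24 hours, 168 hours) awkward to verify. A sketch of the same logic with the clock passed in as a parameter follows. `formatRelativeTime` is a hypothetical name, and the English labels are assumptions chosen for illustration.

```typescript
// Hypothetical variant of formatDate with an injectable "now" for testing.
// Accepts strings too, since persisted dates are rehydrated as ISO strings.
function formatRelativeTime(date: Date | string, now: Date = new Date()): string {
  const then = new Date(date);
  const diffInHours = (now.getTime() - then.getTime()) / (1000 * 60 * 60);

  if (diffInHours < 1) return 'Within the last hour';
  if (diffInHours < 24) return `${Math.floor(diffInHours)}h ago`;
  if (diffInHours < 168) return `${Math.floor(diffInHours / 24)}d ago`; // under 7 days
  return then.toLocaleDateString('ja-JP');
}
```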

9. Main layout

// app/page.tsx
'use client';

import React, { useState } from 'react';
import { VoiceChat } from '@/components/VoiceChat';
import { VoiceSettings } from '@/components/VoiceSettings';
import { ChatHistory } from '@/components/ChatHistory';
import { FaCog, FaHistory } from 'react-icons/fa';

export default function Home() {
  const [showSettings, setShowSettings] = useState(false);
  const [showHistory, setShowHistory] = useState(false);

  return (
    <div className="flex h-screen bg-gray-100">
      {/* Sidebar */}
      <div className="w-16 bg-gray-800 flex flex-col items-center py-4 space-y-4">
        <button
          onClick={() => {
            setShowHistory(!showHistory);
            setShowSettings(false);
          }}
          className={`p-3 rounded-lg transition-colors ${
            showHistory ? 'bg-gray-700 text-white' : 'text-gray-400 hover:text-white'
          }`}
          title="Conversation history"
        >
          <FaHistory size={20} />
        </button>
        <button
          onClick={() => {
            setShowSettings(!showSettings);
            setShowHistory(false);
          }}
          className={`p-3 rounded-lg transition-colors ${
            showSettings ? 'bg-gray-700 text-white' : 'text-gray-400 hover:text-white'
          }`}
          title="Settings"
        >
          <FaCog size={20} />
        </button>
      </div>

      {/* Side panel */}
      {(showSettings || showHistory) && (
        <div className="w-80 bg-white shadow-lg overflow-y-auto">
          {showSettings && <VoiceSettings />}
          {showHistory && <ChatHistory />}
        </div>
      )}

      {/* Main content */}
      <div className="flex-1">
        <VoiceChat />
      </div>
    </div>
  );
}

Implementing Advanced Features

High-quality speech synthesis with ElevenLabs (optional)

// app/api/elevenlabs/route.ts
import { NextRequest, NextResponse } from 'next/server';

const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const VOICE_ID = 'your-voice-id'; // Voice ID obtained from ElevenLabs

export async function POST(request: NextRequest) {
  try {
    const { text } = await request.json();

    const response = await fetch(
      `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
      {
        method: 'POST',
        headers: {
          'Accept': 'audio/mpeg',
          'Content-Type': 'application/json',
          'xi-api-key': ELEVENLABS_API_KEY!,
        },
        body: JSON.stringify({
          text,
          model_id: 'eleven_multilingual_v2',
          voice_settings: {
            stability: 0.5,
            similarity_boost: 0.75,
          },
        }),
      }
    );

    if (!response.ok) {
      throw new Error('ElevenLabs API error');
    }

    const audioBuffer = await response.arrayBuffer();
    
    return new NextResponse(audioBuffer, {
      headers: {
        'Content-Type': 'audio/mpeg',
      },
    });
  } catch (error) {
    console.error('ElevenLabs error:', error);
    return NextResponse.json(
      { error: 'A speech synthesis error occurred' },
      { status: 500 }
    );
  }
}
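
Each call to this route triggers a fresh synthesis request, even when the same message is replayed from its bubble. A small client-side cache can avoid repeated requests for identical text. The following is a sketch under that assumption. `createTtsCache` and `AudioFetcher` are hypothetical names, and the fetcher that actually posts to `/api/elevenlabs` is left to the caller.

```typescript
// A function that turns text into audio bytes, e.g. by POSTing to /api/elevenlabs.
type AudioFetcher = (text: string) => Promise<ArrayBuffer>;

// Hypothetical helper: memoize in-flight and completed requests by text.
function createTtsCache(fetchAudio: AudioFetcher) {
  const pendingByText = new Map<string, Promise<ArrayBuffer>>();

  return (text: string): Promise<ArrayBuffer> => {
    const cached = pendingByText.get(text);
    if (cached) return cached;

    const pending = fetchAudio(text).catch((error) => {
      // Do not keep failures in the cache, so a later call can retry.
      pendingByText.delete(text);
      throw error;
    });
    pendingByText.set(text, pending);
    return pending;
  };
}
```

Storing the promise rather than the resolved buffer means two concurrent requests for the same text share one network call.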

Deployment and Production Considerations

1. Managing environment variables

# Setting environment variables on Vercel
vercel env add OPENAI_API_KEY
vercel env add ELEVENLABS_API_KEY

2. Implementing rate limiting

// middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

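// Counters live in this server instance's memory: they reset on restart and are not
// shared across serverless or edge instances, so treat this as a basic safeguard.
const rateLimit = new Map<string, { count: number; resetTime: number }>();

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname.startsWith('/api/')) {
    // Use the first X-Forwarded-For entry as the client address when a proxy provides it.
    const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() || 'anonymous';
    const limit = 50; // requests allowed per window
    const windowMs = 60 * 60 * 1000; // 1 hour

    let ipData = rateLimit.get(ip);

    // Start a new window for unseen clients or when the previous window has expired.
    if (!ipData || Date.now() > ipData.resetTime) {
      ipData = { count: 0, resetTime: Date.now() + windowMs };
      rateLimit.set(ip, ipData);
    }

    if (ipData.count >= limit) {
      return NextResponse.json(
        { error: 'Rate limit reached' },
        { status: 429 }
      );
    }

    ipData.count++;
  }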

  return NextResponse.next();
}
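
The middleware uses a fixed-window counter: each client gets a budget of requests that resets once the window expires. The sketch below extracts that logic into a standalone function with an injectable clock so the window behavior can be checked in isolation. `createRateLimiter` is a hypothetical helper. Like the middleware, it keeps its counters in memory only.

```typescript
interface WindowState {
  count: number;
  resetTime: number; // timestamp (ms) at which the current window ends
}

// Hypothetical helper: returns a function that reports whether a request
// identified by `key` is allowed at time `now`.
function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, WindowState>();

  return (key: string, now: number = Date.now()): boolean => {
    let state = windows.get(key);
    // Open a new window for unseen keys or once the old window has expired.
    if (!state || now > state.resetTime) {
      state = { count: 0, resetTime: now + windowMs };
      windows.set(key, state);
    }
    if (state.count >= limit) return false;
    state.count++;
    return true;
  };
}
```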

3. Error monitoring

// lib/monitoring.ts
export const logError = (error: Error, context?: any) => {
  console.error('Error:', error);
  
  // In production, forward to an error monitoring service such as Sentry
  if (process.env.NODE_ENV === 'production') {
    // Sentry.captureException(error, { extra: context });
  }
};
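
`logError` expects an `Error`, but values caught in a `catch` block are typed `unknown` and may be strings or plain objects. A sketch of a normalizing step follows. `toError` is a hypothetical helper that could be applied before calling `logError`.

```typescript
// Hypothetical helper: convert any thrown value into an Error instance.
function toError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === 'string') return new Error(value);
  try {
    // JSON.stringify returns undefined for values such as undefined or functions.
    const json: string | undefined = JSON.stringify(value);
    return new Error(json ?? String(value));
  } catch {
    // Circular structures cannot be serialized; fall back to String().
    return new Error(String(value));
  }
}
```

For example, a catch block could call `logError(toError(error), { route: '/api/chat' })`, where the context object is illustrative.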

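Summary

This AI voice chat application provides the following practical features:

Real-time speech recognition and synthesis: a low-latency voice interface built on the Web Speech API
Advanced AI dialogue: a natural conversational experience powered by GPT-4
Customizable personas: AI personalities configured to suit the use case
Conversation history management: past conversations can be saved and revisited
Multilingual support: eight selectable locales covering seven languages

The application can be applied to customer support, language learning, accessibility assistance, and many other uses.

The code is intended to work as presented once the environment variables are set, provided the browser supports the Web Speech API's speech recognition. Use it as a base, add your own features, and build a more advanced AI voice chat application.

Resources

This code is released under the MIT License. Commercial use is permitted.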