
    Implement Streaming AI Responses

    Jack Herrington

    UPDATES: The course code has been updated to the newest version of the AI libraries and reflects the corresponding changes in the Vercel AI API.

    In our current application, we have a list of previous chats displayed on the homepage and a way to communicate with the AI.

    However, the AI responses can be quite lengthy, and it takes a while for them to be fully displayed. This is because we wait for the entire response to complete before updating the UI, which is not ideal.

    Fortunately, the Vercel ai library that we installed earlier provides a way to implement streaming responses. This means that when you type in a question and receive a response, it comes in parts, similar to how ChatGPT works.

    Let's integrate this feature into our application.

    Setting up the API Endpoint

    To start, we need to create an API endpoint that the Vercel ai library can communicate with.

    Inside the api directory, create a new file at chat/route.ts.

    In this code, we import OpenAIStream and StreamingTextResponse from ai, then import and initialize an OpenAI instance with the API key, just like we did in the server action.

    Then we define a POST function to handle incoming requests. The function receives the request object, whose JSON body is expected to contain the list of messages; we extract them with req.json(). Route handlers aren't limited to POST: you can handle any HTTP method, including GET, PUT, DELETE, and so on. All you need to do is export a function named after the verb you want to handle, as in the sketch below.
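
    For example, a hypothetical route.ts could export handlers for several verbs side by side (the handler names are the Next.js convention; the bodies here are just placeholders):

    // Hypothetical route.ts exporting one handler per HTTP verb
    export async function GET() {
      // Handle GET requests (placeholder payload)
      return Response.json({ status: "ok" });
    }

    export async function DELETE() {
      // Handle DELETE requests (placeholder: acknowledge with no content)
      return new Response(null, { status: 204 });
    }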

    Next, we call openai.chat.completions.create() to get the completions, but this time, we set the stream mode to true in the configuration object we pass in. This enables the streaming response.

    Finally, we use OpenAIStream and StreamingTextResponse from the Vercel ai library to stream the response back to the client.

    Here's the code all together:

    import OpenAI from "openai";
    import { OpenAIStream, StreamingTextResponse } from "ai";
    
    // Run this route on the Edge runtime, which is well suited to streaming
    export const runtime = "edge";
    
    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY!,
    });
    
    export async function POST(req: Request) {
      // The JSON body carries the list of chat messages
      const { messages } = await req.json();
    
      // Ask OpenAI for a completion, with streaming enabled
      const response = await openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        stream: true,
        messages,
      });
    
      // Pipe the tokens back to the client as they arrive
      const stream = OpenAIStream(response);
    
      return new StreamingTextResponse(stream);
    }
    

    This code makes it so that as soon as we receive tokens from OpenAI, we start sending them back to the client without waiting for the entire response to be complete.
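
    To see what that means for the client, here is a minimal sketch (not part of the course code) that consumes the streamed body by hand with fetch and a ReadableStream reader. The useChat hook we're about to adopt does this work for us:

    // Minimal sketch: reading the streamed response chunk by chunk
    async function streamChat(messages: { role: string; content: string }[]) {
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages }),
      });

      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let text = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // Each chunk is a few tokens; a UI would re-render with the partial text
        text += decoder.decode(value, { stream: true });
      }
      return text;
    }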

    Now that we have the API endpoint set up, let's update the Chat component to utilize the streaming response.

    Updating the Chat Component

    Open the Chat component from app/components/Chat.tsx.

    At the top of the file, there are some imports we need to bring in: the useChat hook from ai/react and the Message type from the ai package. We'll import Message under the alias AIMessage to align with the message type we're expecting from the AI:

    import type { Message as AIMessage } from "ai";
    import { useChat } from "ai/react";
    

    There's now quite a bit of stuff we can remove from the existing component.

    We no longer need the message-related state management, since we're replacing it with the useChat hook. We'll destructure several properties from the call to useChat, and coerce initialMessages to an array of our AIMessage type:

    // inside the component, replace the useState hooks with:
    
    const { messages, input, handleInputChange, handleSubmit, isLoading } =
      useChat({
        initialMessages: initialMessages as unknown as AIMessage[],
      });
    

    Down where we render the Transcript component, we'll do the same coercion for the messages prop:

    <Transcript messages={messages as AIMessage[]} truncate={false} />
    

    Then we'll replace the div surrounding the existing Input with a form. This form is managed by the useChat hook: we pass input and handleInputChange to the Input component, and handleSubmit to the form. By default, handleSubmit POSTs the current messages to the /api/chat endpoint we just created:

    <form className="flex mt-3" onSubmit={handleSubmit}>
      <Input
        className="flex-grow text-xl"
        placeholder="Question"
        value={input}
        onChange={handleInputChange}
        autoFocus
      />
      <Button type="submit" className="ml-3 text-xl">
        Send
      </Button>
    </form>
    

    We also no longer need the onClick handler, so it can be removed.

    With these changes in place, we can check our work in the browser.

    Checking Our Work

    Over in the browser, refresh the page and ask a new question.

    The response should start streaming in as it is generated, rather than waiting for the entire response to be complete before displaying it:

    [Screenshot: the response streaming in as it is generated]

    This works great, but we're no longer connected to the database. We need a way to update the chat in the database from the client side.

    Reconnecting the Database

    In order to connect the useChat output to the database, we'll create a new server action called updateChat that will update the database every time there is a new message from the client.

    To start, create a new server action updateChat.ts inside the server-actions directory. This server action will be similar to what we did in getCompletion, where we either created a new chat or updated an existing chat based on the chatId.

    We'll import getServerSession from NextAuth, as well as our createChat and updateChat functions from the database. Then we'll define an updateChat server action that takes a chatId and messages as parameters. Inside the function, we check whether we were given a chatId. If we weren't, we create a new chat for the current user, using the first message as the chat's name. If we were, we update the existing chat with the new messages and return the same chatId.

    Here's the code all together:

    "use server";
    import { getServerSession } from "next-auth";
    
    import { createChat, updateChat as updateChatMessages } from "@/db";
    
    export const updateChat = async (
      chatId: number | null,
      messages: {
        role: "user" | "assistant";
        content: string;
      }[]
    ) => {
      const session = await getServerSession();
      if (!chatId) {
        return await createChat(
          session?.user?.email!,
          messages[0].content,
          messages
        );
      } else {
        await updateChatMessages(chatId, messages);
        return chatId;
      }
    };
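
    For reference, this assumes the createChat and updateChat helpers in @/db have roughly the following shapes (a hypothetical sketch; the implementation from the earlier database lessons may differ in detail):

    // Hypothetical signatures for the @/db helpers used above
    type ChatMessage = { role: "user" | "assistant"; content: string };

    // Creates a chat for the given user and resolves to the new chat's id
    export declare function createChat(
      userEmail: string,
      name: string,
      messages: ChatMessage[]
    ): Promise<number>;

    // Stores the latest messages for an existing chat
    export declare function updateChat(
      chatId: number,
      messages: ChatMessage[]
    ): Promise<void>;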
    

    With the new server action done, we need to add it to the Chat component.

    Updating the Chat Component with the New Server Action

    Back in the Chat component, we'll replace getCompletion with the new updateChat server action.

    First, import the server action:

    import { updateChat } from "@/app/server-actions/updateChat";
    

    We need to call updateChat every time there's a change in the messages, so we'll import the useEffect hook from React.
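
    The effect below also reads the current chat id from a chatId ref, so that updating the id doesn't trigger a re-render. Assuming the id arrives as a prop from the earlier lessons (the prop name here is illustrative), the setup looks roughly like this:

    import { useEffect, useRef } from "react";

    // Sketch: `initialChatId` is a hypothetical prop carrying the chat's
    // database id from earlier lessons (null for a brand-new chat)
    export default function Chat({ initialChatId }: { initialChatId: number | null }) {
      const chatId = useRef<number | null>(initialChatId);
      // ... useChat setup and the JSX shown above ...
    }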

    Add the useEffect hook to the component, with a dependency array of isLoading and messages from useChat, as well as router. The messages represent what will be added to the database, and router is there because otherwise ESLint will get angry.

    Inside of the useEffect hook, we check if the chat is not loading and there are messages. If both conditions are met, we map over the messages to only include the role and content properties.

    Then, we call the updateChat server action with the current chatId and the simplified messages. If this was a brand-new chat (no chatId yet), we record the returned id and navigate to its page with router.push and router.refresh; otherwise, we simply keep the returned chatId.

    useEffect(() => {
      (async () => {
        if (!isLoading && messages.length) {
          // Keep only the fields the database stores
          const simplifiedMessages = messages.map((message) => ({
            role: message.role as "user" | "assistant",
            content: message.content,
          }));
          const newChatId = await updateChat(chatId.current, simplifiedMessages);
          if (chatId.current === null) {
            // First save of a new chat: record the id before navigating,
            // so a re-run of this effect doesn't create a duplicate chat
            chatId.current = newChatId;
            router.push(`/chats/${newChatId}`);
            router.refresh();
          } else {
            chatId.current = newChatId;
          }
        }
      })();
    }, [isLoading, messages, router]);
    

    With these changes, the chat messages will be updated in the database whenever they change.

    Deploying to Production

    Now that we have implemented streaming responses and database updates, it's time to push the changes to GitHub, which will trigger a deployment to Vercel.

    The streaming responses feature of Vercel's ai library makes for a better user experience, and the database updates ensure that the chat messages are stored as they come in.
