API for gpt-4-1106-preview extremely slow

Marijn Otte 70 Reputation points
2024-01-15T11:31:21.34+00:00

When we make API calls to the gpt-4-1106-preview model, the average response time is around 60 seconds. When we use the chat GUI in Azure AI Studio with the same model and the same parameters, the response takes 10 to 20 seconds. What can we do to speed up the API? We have already tried tuning the temperature, max_tokens, and top_p parameters and minimizing the content filters, but none of these makes a significant difference.

Example API call:

time curl -X POST -H "Content-Type: application/json" -H "api-key: XXX" -d '{
  "messages": [
    {
      "role": "user",
      "content": "What does a cow eat?"
    }
  ],
  "model": "gpt-4-1106-preview",
  "stream": true,
  "temperature": 0.7,
  "frequency_penalty": 0,
  "presence_penalty": 0
}' "https://XXX-sweden.openai.azure.com/openai/deployments/gpt-4-1106-preview/chat/completions?api-version=2023-09-01-preview"


.....

data: [DONE]


real	1m7,174s
user	0m0,079s
sys	0m0,024s
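
Because the request above sets "stream": true, `time curl` reports only the time until the last chunk arrives. One possible way to see where the 60 seconds go is to separate the time to the first content chunk from the total time. The sketch below is an illustration, not part of the original post: `measure_stream` is a hypothetical helper, and it assumes the response is a server-sent event stream of `data: {...}` lines ending in `data: [DONE]`, as in the output above.

```python
import json
import time
from typing import Callable, Iterable, Optional, Union


def measure_stream(
    lines: Iterable[Union[str, bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a streamed chat-completions response.

    `lines` yields the lines of the response body as they arrive.
    Returns seconds until the first content delta, total seconds,
    and the number of content chunks seen.
    """
    start = clock()
    first_content: Optional[float] = None
    chunks = 0
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(payload)
        # Some events may carry no choices or only a role; count content only.
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                chunks += 1
                if first_content is None:
                    first_content = clock() - start
    total = clock() - start
    return {"first_content_s": first_content, "total_s": total, "chunks": chunks}
```

With the `requests` library, for example, the lines could come from `requests.post(url, headers=headers, json=body, stream=True).iter_lines()`, where `url`, `headers`, and `body` mirror the curl call above. A long wait before the first chunk and slow generation after it are different problems, so the two numbers are worth comparing against what the Studio GUI shows.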
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

14 answers

  1. Geibig Olaf (PS-SC/PAE) 10 Reputation points
    2024-03-12T16:53:11.1433333+00:00

    Same for me. The RAG app I'm using in my project has become unusable. GPT-3.5 is not an option because the quality is much worse. It was fine last year, but some time in January it started to degrade. If this isn't fixed, I will need to explore other LLM options, which isn't easy because my employer has strict compliance requirements. Setting max_tokens to a low value helps, but my RAG app does not allow me to set this parameter.

    2 people found this answer helpful.

  2. Sebastian Scott 10 Reputation points
    2024-03-18T13:19:10.4566667+00:00

    Same for us. We are considering moving to a different model provider because the long latency is straining usage.

    2 people found this answer helpful.

  3. Jack 10 Reputation points
    2024-02-26T19:45:09.4+00:00

    Same here, please address the issue.

    1 person found this answer helpful.

  4. oh john 5 Reputation points
    2024-03-07T14:43:31.98+00:00

    Same here. Extremely slow and unusable.

    1 person found this answer helpful.

  5. Martijn Muurman 5 Reputation points
    2024-03-08T07:32:17.2+00:00

    I can confirm this as well. The same prompt sent to OpenAI directly takes a few seconds. On Azure I get timeouts exceeding 100 seconds. This happens with both GPT-4 preview versions.

    1 person found this answer helpful.