Streaming Response

2023-03-30

Streaming refers to the process of transferring data over a network in a continuous and real-time manner, rather than downloading the entire content before playback can begin. Streaming enables users to access and view or listen to content without the need for downloading or storing the entire file on their device.

Streaming technology is used in a variety of applications, including video and audio streaming, live broadcasting, online gaming, and cloud computing. It allows for the real-time delivery of content to a user’s device while the content is still being transmitted from a server or another source.

Streaming is accomplished by dividing data into small packets and sending them in a continuous stream over a network. This allows the user to access and view the data in real-time, without having to wait for the entire file to download. Streaming can be done using a variety of protocols, including HTTP, RTSP, and P2P.

The above content is generated by ChatGPT

Recently I have been using the OpenAI completion API, which has a stream option. When stream=True is set, the API returns a generator, and tokens are sent as a stream.

OpenAI API stream
import openai
messages = [{"role": "user", "content": "You are a helpful assistant"}]
kwargs = {
    "model": "gpt-3.5-turbo",
    "messages": messages,
    "timeout": 5,
    "stream": True,
    "presence_penalty": 1,
    # "max_tokens": 800,
    "temperature": 0.8
}
response = openai.ChatCompletion.create(**kwargs)
for r in response:
    one = r.choices[0].delta.content if 'content' in r.choices[0].delta else ''
    print(one)  # this mode makes it easy to realize the typewriter effect
Thank
 you
,
 I
 try
 my
 best
 to
 be
 helpful
 in
 any
 way
 I
 can
.
 Is
 there
 anything
 specific
 I
 can
 assist
 you
 with
?
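To reassemble the streamed deltas into the full reply (and get a true typewriter effect by printing without newlines), the loop can collect each token as it arrives. A minimal sketch, using a plain generator to stand in for the live API stream:

```python
# Simulate the streamed deltas with a plain generator instead of a live API
# call; the handling loop is the same either way.
def fake_stream():
    for token in ["Thank", " you", ",", " I", " try", " my", " best", "."]:
        yield token

parts = []
for token in fake_stream():
    print(token, end="", flush=True)  # end="" prints tokens on one line, typewriter-style
    parts.append(token)
print()

full_reply = "".join(parts)
# full_reply == "Thank you, I try my best."
```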
generator

In Python, generators are commonly used. The yield keyword turns a function into one that returns a generator to its caller. You can also get a generator from a generator expression (which is what the OpenAI SDK uses). A generator is a kind of iterator.

# generator expression
myIterator = (x * 2 for x in range(5))
myIterator
<generator object <genexpr> at 0x1243b34a0>
def foo():
    print('starting')
    # while True:  # would allow repeated iteration
    for i in range(5):
        r = yield i
        # print(r)
f = foo()
for i in f:
    print(i)
starting
0
1
2
3
4
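The commented-out `r = yield i` / `print(r)` lines above hint that yield is an expression: a caller can push values back into the generator with send(). A small sketch of that two-way channel:

```python
def echo():
    # `yield r` pauses here; the value passed to send() becomes the
    # result of the yield expression and is bound to r.
    r = None
    while True:
        r = yield r

gen = echo()
next(gen)           # advance to the first yield; it yields None
print(gen.send(1))  # the generator echoes back 1
print(gen.send(2))  # and 2
```

This pattern (a coroutine driven by send()) is rarely needed for streaming responses, but it explains why yield can appear on the right-hand side of an assignment.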
Streaming Response of Web Framework

Bottle supports streaming responses by using yield in a view function. This post[1] shows examples for Flask. We give an example using FastAPI below.

import time
import asyncio
from fastapi import FastAPI
import uvicorn
from fastapi.responses import StreamingResponse
app = FastAPI()
@app.get("/stream_demo")  # asynchronous streaming response
async def stream_response():
    async def data_generator():
        for i in range(10):
            yield f"Chunk {i}\n"
            await asyncio.sleep(1)
    dg = data_generator()
    print(dg)
    return StreamingResponse(dg, media_type="text/plain")

if __name__ == '__main__':
    uvicorn.run(app, host="0.0.0.0", port=8082)
requests

Set stream=True in requests.get(…); the response body can then be consumed incrementally, and the server signals this mode with Transfer-Encoding: chunked in the HTTP response headers.

import requests
resp = requests.get("http://localhost:8082/stream_demo", stream=True)
print(resp)
for r in resp:
    print(r)
<Response [200]>
b'Chunk 0\n'
b'Chunk 1\n'
b'Chunk 2\n'
b'Chunk 3\n'
b'Chunk 4\n'
b'Chunk 5\n'
b'Chunk 6\n'
b'Chunk 7\n'
b'Chunk 8\n'
b'Chunk 9\n'
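On the wire, Transfer-Encoding: chunked frames each chunk as its size in hex, a CRLF, the data, and another CRLF, with a zero-length chunk marking the end. A minimal encoder sketch to illustrate the format (this is not what requests or the server does internally, just the framing itself):

```python
def encode_chunked(chunks):
    # Frame each chunk as: <hex size>\r\n<data>\r\n, terminated by 0\r\n\r\n
    body = b""
    for chunk in chunks:
        body += f"{len(chunk):x}".encode() + b"\r\n" + chunk + b"\r\n"
    return body + b"0\r\n\r\n"

wire = encode_chunked([b"Chunk 0\n", b"Chunk 1\n"])
# each "Chunk N\n" is 8 bytes, so every frame starts with the hex size b"8"
```

When stream=True, requests decodes this framing for you and yields the raw chunk payloads, which is what the loop above prints.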

The original code can be found here

references