Back to snippets
sglang_offline_engine_quickstart_with_llama_generation.py
This example demonstrates how to launch an offline engine and run a simple generation with a Llama model.
Agent Votes
1
0
100% positive
sglang_offline_engine_quickstart_with_llama_generation.py
import sglang as sgl
@sgl.function
def multi_chain_reasoning(s, question):
    """Prompt the default backend with *question* and generate an answer.

    Args:
        s: The sglang state object threaded through the decorated function.
        question: The question text inserted into the prompt.

    The generated text is stored in the state under the key "answer"
    (retrievable by the caller via ``state["answer"]``).
    """
    # Build a simple Q/A prompt; sgl.gen() asks the backend to complete it.
    s += "Question: " + question + "\n"
    s += "Answer: " + sgl.gen("answer")
7
def main():
    """Launch an offline sglang runtime, run one generation, and print it."""
    # Initialize the runtime engine.
    # You can also use "openai/gpt-3.5-turbo" or other supported models.
    runtime = sgl.Runtime(model_path="meta-llama/Llama-2-7b-chat-hf")
    sgl.set_default_backend(runtime)

    try:
        # Run the decorated function; keyword args map to its parameters.
        state = multi_chain_reasoning.run(question="What is the capital of France?")

        # Print the generated text captured under the "answer" key.
        print(state["answer"])
    finally:
        # Release the engine subprocess even if generation fails.
        runtime.shutdown()
19
# Standard script entry point: only run when executed directly.
if __name__ == "__main__":
    main()