Large Language Models

Model Capabilities

A large language model (LLM) is an AI model based on deep learning and natural language processing technologies. Trained on massive amounts of text data, it can understand, generate, and process human language. Its main capabilities include:

Text Generation: Generates logically coherent text based on context and adjusts the output style as needed.
Language Understanding: Accurately understands the meaning of input text and supports conversations that incorporate context.
Text Translation: Provides cross-language generation and understanding, enabling text translation between different languages.
Knowledge Q&A: Draws on a rich knowledge base to answer questions in fields such as culture, science, and history.
Code Understanding and Generation: Understands and generates code (such as Python, Java, and C++), identifies code errors, and provides code suggestions.
Text Classification and Summarization: Understands complex sentences, classifies and extracts information, and extracts key points from text for automatic summarization.

Model Selection

On JieKou AI, you can view the list of large language models supported by the platform, along with each model's introduction, pricing, and other information. Click a model to open its details page and try it online. After trying several models on your own tasks, compare their performance and choose the one that fits best.

API Calls

JieKou AI provides API services compatible with the OpenAI API standard, making it easy to integrate into your existing applications.

Two interfaces are supported:

ChatCompletion: Supports both streaming mode and standard (non-streaming) mode.
Completion: Supports both streaming mode and standard (non-streaming) mode.

If you are already using OpenAI’s ChatCompletion or Completion API, you only need to set the base URL to https://api.highwayapi.ai/openai, obtain and configure your API key, and update the model name as needed to access the large language model API service.

For how to obtain an API key, see Manage API Keys.
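
One way to keep the key out of source code is to read it from an environment variable. The following is a minimal sketch, not an official helper: it assumes the variable is named API_KEY (the name used in the Curl examples on this page), and client_settings is a hypothetical function name. The dictionary it returns is meant to be passed as keyword arguments to OpenAI(...).

```python
import os

BASE_URL = "https://api.highwayapi.ai/openai"  # base URL given in this document


def client_settings(env=None):
    """Return keyword arguments for OpenAI(...), reading the key from API_KEY.

    `client_settings` is a hypothetical helper and API_KEY is an assumed
    variable name; neither is defined by the platform.
    """
    env = os.environ if env is None else env
    api_key = env.get("API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set the API_KEY environment variable first.")
    return {"base_url": BASE_URL, "api_key": api_key}
```

With the openai package installed, the client could then be created as client = OpenAI(**client_settings()).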

Code Examples

Python

# ChatCompletion example (requires the openai package)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.highwayapi.ai/openai",
    api_key="<Your API Key>",
)

model = "deepseek/deepseek-r1"
stream = True  # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "You are a professional AI documentation assistant.",
        },
        {
            "role": "user",
            "content": "What scenarios can the models provided by JieKou AI be used for?",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        # Skip chunks that carry no choices (for example, a usage-only chunk).
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

# Completion example (requires the openai package)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.highwayapi.ai/openai",
    api_key="<Your API Key>",
)

model = "deepseek/deepseek-r1"
stream = True  # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="What scenarios can the models provided by JieKou AI be used for?",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        # Skip chunks that carry no choices (for example, a usage-only chunk).
        if chunk.choices:
            print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)

Curl

# ChatCompletion request
export API_KEY="<Your API Key>"

curl "https://api.highwayapi.ai/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "deepseek/deepseek-r1",
    "messages": [
        {
            "role": "system",
            "content": "You are a professional AI documentation assistant."
        },
        {
            "role": "user",
            "content": "What scenarios can the models provided by JieKou AI be used for?"
        }
    ],
    "max_tokens": 512
}'

# Completion request
export API_KEY="<Your API Key>"

curl "https://api.highwayapi.ai/openai/v1/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "deepseek/deepseek-r1",
    "prompt": "What scenarios can the models provided by JieKou AI be used for?",
    "max_tokens": 512
}'

Key Parameters

Basic Parameters

model: The model to call. You can view the list of large language models supported by the platform on JieKou AI.

Message Roles

Applies only to ChatCompletion.

messages: The list of messages that make up the conversation, covering both the input sent to the model and the model's earlier outputs. Each message belongs to a role. Well-chosen messages lead to better outputs, so it is worth experimenting with different system instructions and example turns.

content: The message content.
role: The role of the message author.
- system: Sets the AI role, telling the model the role or behavior it should adopt.
- user: The text entered by the user for the model.
- assistant: The response generated by the model. Users can also prefill examples to tell the model how it should respond to the current request.
name: Optional. Used to distinguish message authors with the same role.
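
The roles above can be combined into a single messages list. The following sketch shows one possible way to assemble it, including prefilled assistant examples. build_messages is a hypothetical helper written for this illustration, not part of any SDK, and the example texts are made up.

```python
def build_messages(system_prompt, user_text, examples=()):
    """Assemble a ChatCompletion `messages` list.

    `examples` is a sequence of (user_text, assistant_text) pairs that are
    prefilled as earlier turns to show the model the expected answer style.
    `build_messages` is a hypothetical helper, not part of any SDK.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # The actual request goes last so the model responds to it.
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_messages(
    "You are a professional AI documentation assistant.",
    "Summarize what max_tokens does.",
    examples=[
        ("Summarize what stream does.",
         "stream: returns output in chunks as it is generated."),
    ],
)
```

The resulting list can be passed as the messages argument of client.chat.completions.create.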

Prompt

Applies only to Completion.

prompt: The text from which the model generates a completion. It should state clearly the problem to solve or the task to complete, because the model relies on it to understand the requirements and produce relevant, accurate content.

Controlling Generation

Different parameter combinations can make the model generate content that better meets specific requirements.

Text Diversity

Both temperature and top_p can control the diversity of generated text. We recommend setting only one of them. The larger the value, the more diverse the generated text. The smaller the value, the more deterministic the generated text.

temperature: Sampling temperature, which adjusts the randomness of generated text.
top_p: Nucleus sampling, which controls the cumulative probability of candidate words.
top_k: Limits the number of candidate words.
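
To make the effect of these three parameters concrete, here is a toy sketch of how they are commonly applied to a model's raw scores (logits) before a token is sampled. It illustrates the general technique only. The platform's actual sampler, and the order in which it applies the filters, may differ, and candidate_probabilities and the example scores are invented for this illustration.

```python
import math


def candidate_probabilities(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top_k, and top_p filtering.

    Toy illustration only; `logits` maps each candidate token to a raw score.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature: divide scores before softmax; small values sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    # top_k: keep only the k most likely candidates.
    if top_k is not None:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    # Renormalize so the surviving candidates sum to 1.
    norm = sum(p for _, p in ranked)
    return {tok: p / norm for tok, p in ranked}


logits = {"cat": 2.0, "dog": 1.0, "fish": 0.0}
print(candidate_probabilities(logits, temperature=1.0))
print(candidate_probabilities(logits, temperature=0.5))  # more weight on "cat"
print(candidate_probabilities(logits, top_k=2))          # "fish" is dropped
print(candidate_probabilities(logits, top_p=0.6))        # only "cat" remains
```

Lowering temperature concentrates probability on the top candidate, while top_k and top_p shrink the candidate set itself.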

Content Repetition

presence_penalty: Presence penalty, which controls how strongly the model avoids repeating content. A token that has already appeared in the text is penalized once, regardless of how often it appeared, which pushes the model to introduce new tokens.
frequency_penalty: Frequency penalty, which controls how often the same words appear in the generated text. A token is penalized each time it appears, so the more often it has been used, the less likely it is to be generated again.
repetition_penalty: Repetition penalty value, used to suppress or encourage repetition.
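
The difference between the presence and frequency penalties is easier to see in a small calculation. The sketch below follows the adjustment described in OpenAI's API documentation, where a token's score is reduced by a flat presence term plus a frequency term proportional to its count. Whether this platform applies exactly the same formula is an assumption, repetition_penalty is not covered, and apply_penalties is a hypothetical helper.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Return adjusted logits given the tokens generated so far."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in logits.items():
        count = counts.get(token, 0)
        # Presence: a flat, one-time penalty once the token has appeared at all.
        # Frequency: a penalty that grows with every occurrence.
        adjusted[token] = (
            score
            - (presence_penalty if count > 0 else 0.0)
            - frequency_penalty * count
        )
    return adjusted


logits = {"the": 3.0, "cat": 2.0, "sat": 1.0}
history = ["the", "cat", "the"]
adjusted = apply_penalties(logits, history, presence_penalty=0.5, frequency_penalty=0.25)
print(adjusted)
```

In this example "the" has appeared twice, so it loses more score than "cat", while "sat" has not appeared and is left unchanged.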

Output Limits

max_tokens: The maximum number of tokens returned in a single request. If the model reaches this limit before finishing, generation stops and the truncated content is returned.
stream: Controls whether the output is streamed. For some models that produce a large amount of output, we recommend setting this to streaming output to prevent overly long output from causing a timeout.
- true: Streaming output, meaning output is returned as it is generated. The model returns a chunk each time it generates part of the content.
- false: Returns the result all at once after the model has generated all content.
stop: Stop sequence. When the text generated by the model contains the string set in stop, the model stops outputting.
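
When streaming, the client has to reassemble the chunks and can check why generation ended. The sketch below assumes the chunk shape used by the OpenAI Python SDK (choices[0].delta.content and choices[0].finish_reason) and the OpenAI convention that a finish_reason of "length" means the max_tokens limit was reached. Both are assumptions about this OpenAI-compatible service. collect_stream and fake_chunk are hypothetical helpers, and fake_chunk exists only so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed ChatCompletion chunks into (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content, finish_reason=None):
    """Build a stand-in for a streamed chunk (for illustration only)."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


text, reason = collect_stream(
    [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None, "length")]
)
if reason == "length":
    print("Output was cut off at max_tokens:", text)
```

In real use, the object returned by client.chat.completions.create(..., stream=True) would be passed to collect_stream in place of the fake chunks.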
