Context Caching
Example 1: Long Text Q&A
messages: [
{"role": "system", "content": "You are an experienced financial report analyst..."}
{"role": "user", "content": "<financial report content>\n\nPlease summarize the key information of this financial report."}
]
messages: [
{"role": "system", "content": "You are an experienced financial report analyst..."}
{"role": "user", "content": "<financial report content>\n\nPlease analyze the profitability of this financial report."}
]
system
message + <financial report content>
in the user
message. During the second request, this prefix part will count as a "cache hit."Example 2: Multi-round Conversation
messages: [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is the capital of China?"}
]
messages: [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is the capital of China?"},
{"role": "assistant", "content": "The capital of China is Beijing."},
{"role": "user", "content": "What is the capital of the United States?"}
]
system
message and user
message from the first request, which will count as a "cache hit."Example 3: Using Few-shot Learning
messages: [
{"role": "system", "content": "You are a history expert. The user will provide a series of questions, and your answers should be concise and start with `Answer:`"},
{"role": "user", "content": "In what year did Qin Shi Huang unify the six states?"},
{"role": "assistant", "content": "Answer: 221 BC"},
{"role": "user", "content": "Who was the founder of the Han Dynasty?"},
{"role": "assistant", "content": "Answer: Liu Bang"},
{"role": "user", "content": "Who was the last emperor of the Tang Dynasty?"},
{"role": "assistant", "content": "Answer: Li Zhu"},
{"role": "user", "content": "Who was the founding emperor of the Ming Dynasty?"},
{"role": "assistant", "content": "Answer: Zhu Yuanzhang"},
{"role": "user", "content": "Who was the founding emperor of the Qing Dynasty?"}
]
messages: [
{"role": "system", "content": "You are a history expert. The user will provide a series of questions, and your answers should be concise and start with `Answer:`"},
{"role": "user", "content": "In what year did Qin Shi Huang unify the six states?"},
{"role": "assistant", "content": "Answer: 221 BC"},
{"role": "user", "content": "Who was the founder of the Han Dynasty?"},
{"role": "assistant", "content": "Answer: Liu Bang"},
{"role": "user", "content": "Who was the last emperor of the Tang Dynasty?"},
{"role": "assistant", "content": "Answer: Li Zhu"},
{"role": "user", "content": "Who was the founding emperor of the Ming Dynasty?"},
{"role": "assistant", "content": "Answer: Zhu Yuanzhang"},
{"role": "user", "content": "When did the Shang Dynasty fall?"},
]
Checking Cache Hit Status
usage
section to reflect the cache hit status of the request:1.
2.
Hard Disk Cache and Output Randomness
Additional Notes
1.
2.
3.
Modified atΒ 2025-02-06 08:38:20