Using Ollama as the LLM Foundation
GitHub: https://github.com/ollama/ollama
Installing Ollama
Downloads for each platform: https://ollama.com/download
It can also be deployed with Docker: https://hub.docker.com/r/ollama/ollama
The installation steps themselves are omitted here...
Running deepseek-r1 as the Chat Model
Available models can be browsed in the Ollama model library: https://ollama.com/library
This guide uses the deepseek-r1:1.5b model as an example. Running the command below will pull the model automatically:
ollama run deepseek-r1:1.5b
Running nomic-embed-text as the Embedding Model
An embedding model converts text into vectors, which are used to build the knowledge base.
All available embedding models can be filtered in the Ollama library: https://ollama.com/search?c=embedding
This guide uses the nomic-embed-text model as an example. Since an embedding-only model cannot be chatted with interactively, pull it with:
ollama pull nomic-embed-text
Understanding the REST API
Official documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
Ollama's default API port is 11434. Remote access is disabled by default; to enable it, see the official FAQ: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server
Send a request to the deepseek-r1:1.5b chat model:
curl --location --request POST 'http://localhost:11434/api/chat' \
--data-raw '{
    "model": "deepseek-r1:1.5b",
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": "你好"
        }
    ]
}'
The response:
{
    "model": "deepseek-r1:1.5b",
    "created_at": "2025-02-11T07:39:15.630949522Z",
    "message": {
        "role": "assistant",
        "content": "<think>\n\n</think>\n\n你好!很高兴见到你,有什么我可以帮忙的吗?无论是问题、建议还是闲聊,我都在这里为你服务。😊"
    },
    "done_reason": "stop",
    "done": true,
    "total_duration": 2271233241,
    "load_duration": 21222115,
    "prompt_eval_count": 4,
    "prompt_eval_duration": 75333000,
    "eval_count": 32,
    "eval_duration": 2133681000
}
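The duration fields in the response are reported in nanoseconds, so generation throughput can be derived from eval_count and eval_duration. A quick sanity check, using the values from the sample response above:

```java
import java.util.Locale;

public class OllamaThroughput {
    public static void main(String[] args) {
        // values taken from the sample response above
        long evalCount = 32;               // tokens generated
        long evalDurationNs = 2133681000L; // generation time in nanoseconds

        // tokens per second = tokens / seconds
        double tokensPerSecond = evalCount / (evalDurationNs / 1_000_000_000.0);
        System.out.printf(Locale.ROOT, "%.1f tokens/s%n", tokensPerSecond);
        // prints "15.0 tokens/s"
    }
}
```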
To try a streaming response, set stream to true.
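With stream set to true, /api/chat returns a sequence of JSON objects, one per line, each carrying a fragment of the reply in message.content, with the final object marked done: true. The client rebuilds the full text by concatenating the fragments. A minimal sketch (the chunk payloads here are made up for illustration, and a real client should use a proper JSON parser rather than a regex):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamAccumulator {
    public static void main(String[] args) {
        // hypothetical NDJSON chunks as returned when "stream": true
        String[] chunks = {
            "{\"message\":{\"role\":\"assistant\",\"content\":\"你好\"},\"done\":false}",
            "{\"message\":{\"role\":\"assistant\",\"content\":\"!\"},\"done\":false}",
            "{\"message\":{\"role\":\"assistant\",\"content\":\"\"},\"done\":true}"
        };

        // naively extract each chunk's message.content and append it to the reply
        Pattern content = Pattern.compile("\"content\":\"(.*?)\"");
        StringBuilder reply = new StringBuilder();
        for (String chunk : chunks) {
            Matcher m = content.matcher(chunk);
            if (m.find()) {
                reply.append(m.group(1));
            }
        }
        System.out.println(reply); // prints "你好!"
    }
}
```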
Send a request to the nomic-embed-text embedding model:
curl --location --request POST 'http://localhost:11434/api/embed' \
--data-raw '{
    "model": "nomic-embed-text",
    "input": "测试文本"
}'
The response:
{
    "model": "nomic-embed-text",
    "embeddings": [
        [
            0.03132133,
            "... (omitted)"
        ]
    ],
    "total_duration": 3125538250,
    "load_duration": 2976165401,
    "prompt_eval_count": 4
}
This test shows that nomic-embed-text returns 768-dimensional vectors.
Using Elasticsearch as the Vector Database
For integrating other vector databases, see: https://docs.spring.io/spring-ai/reference/api/vectordbs.html
Installing Docker
This article deploys an Elasticsearch cluster with docker-compose; for other installation methods, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.17/install-elasticsearch.html
To install Docker, see the official documentation: https://docs.docker.com/get-started/
Writing the docker-compose Configuration
In a directory of your choice, create a .env file:
# Password for the kibana_system account (at least six characters). This account is only
# used for Kibana's internal setup and cannot be used to query Elasticsearch.
KIBANA_PASSWORD=abcdef
# Elasticsearch and Kibana version
STACK_VERSION=8.13.3
# Cluster name
CLUSTER_NAME=docker-cluster
# Elasticsearch port mapped to the host
ES_PORT=9200
# Kibana port mapped to the host
KIBANA_PORT=5601
# Memory limit for each Elasticsearch container, in bytes (1073741824 = 1 GiB); adjust to your hardware
MEM_LIMIT=1073741824
# Project namespace, used as a prefix for container names
COMPOSE_PROJECT_NAME=es-cluster
In the same directory, create a docker-compose.yml file:
version: "2.2"

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    environment:
      - node.name=es01
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01,es02,es03
      - discovery.seed_hosts=es02,es03
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1

  es02:
    depends_on:
      - es01
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - esdata02:/usr/share/elasticsearch/data
    environment:
      - node.name=es02
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01,es02,es03
      - discovery.seed_hosts=es01,es03
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1

  es03:
    depends_on:
      - es02
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - esdata03:/usr/share/elasticsearch/data
    environment:
      - node.name=es03
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01,es02,es03
      - discovery.seed_hosts=es01,es02
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1

  kibana:
    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
    volumes:
      - kibanadata:/usr/share/kibana/data
    ports:
      - ${KIBANA_PORT}:5601
    environment:
      - SERVERNAME=kibana
      - ELASTICSEARCH_HOSTS=http://es01:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
    mem_limit: ${MEM_LIMIT}

volumes:
  esdata01:
    driver: local
  esdata02:
    driver: local
  esdata03:
    driver: local
  kibanadata:
    driver: local
Compared with the official example, the Elasticsearch password and SSL configuration have been removed here. Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docker.html#docker-compose-file
Starting the Containers
From the directory containing the docker-compose files, run:
docker compose up -d
Connecting Kibana to Elasticsearch
Kibana is the official visualization tool for Elasticsearch.
Access Kibana at http://localhost:5601, logging in with username kibana_system and password abcdef. On first use you will be prompted for an enrollment token and a verification code; follow the prompts and retrieve them from inside the corresponding containers.
For a Kibana tutorial, see the official documentation: https://www.elastic.co/guide/en/kibana/current/index.html
Building the Spring AI Project
Creating the Project
The official Spring AI documentation is strongly recommended as a starting point; it covers the integration of each module in detail.
Official documentation: https://docs.spring.io/spring-ai/reference/getting-started.html
Notes:
- Spring AI artifacts are not yet published to Maven Central, so the Spring milestone/snapshot repositories must be configured.
- Spring Boot 3.2.x or 3.3.x is required.
- JDK 17 or later is required.
pom.xml reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>cn.junki</groupId>
    <artifactId>spring-ai-server</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-ai-server</name>
    <description>spring-ai-server</description>
    <properties>
        <java.version>17</java.version>
    </properties>
    <repositories>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>spring-snapshots</id>
            <name>Spring Snapshots</name>
            <url>https://repo.spring.io/snapshot</url>
            <releases>
                <enabled>false</enabled>
            </releases>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-webflux</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <!-- https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
        </dependency>
        <!-- https://docs.spring.io/spring-ai/reference/api/vectordbs/elasticsearch.html -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-elasticsearch-store-spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>8.13.3</version>
        </dependency>
    </dependencies>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>1.0.0-M5</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Note in particular that spring-ai-bom version 1.0.0-M5 is used here; it fixes several Elasticsearch-related bugs.
Configuring application.yml
server:
  port: 8080

spring:
  ai:
    ollama:
      base-url: "http://localhost:11434"
      chat:
        options:
          # chat model
          model: "deepseek-r1:1.5b"
      embedding:
        options:
          # embedding model
          model: "nomic-embed-text"
    vectorstore:
      # use Elasticsearch as the vector store
      elasticsearch:
        # create the schema when the application starts
        initialize-schema: true
        # index name for the knowledge base
        index-name: knowledge-base-index
        # vector dimensions; nomic-embed-text returns 768-dimensional vectors
        dimensions: 768
        # similarity function; cosine similarity is used here
        similarity: cosine
  elasticsearch:
    uris:
      - http://localhost:9200
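The similarity: cosine setting means the store ranks documents by the cosine of the angle between the query vector and each stored embedding (1 = same direction, 0 = orthogonal). A minimal illustration, with 3-dimensional vectors standing in for the 768-dimensional ones:

```java
public class CosineSimilarityDemo {

    // cosine similarity = dot(a, b) / (|a| * |b|)
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f, 0f};
        float[] b = {1f, 0f, 0f};
        float[] c = {0f, 1f, 0f};
        System.out.println(cosine(a, b)); // identical direction -> 1.0
        System.out.println(cosine(a, c)); // orthogonal -> 0.0
    }
}
```

Because cosine similarity depends only on direction, not magnitude, it works well for comparing embeddings whose lengths are not normalized.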
Writing a Test Controller
package cn.junki.springaiserver.controller;

import jakarta.annotation.Resource;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.document.Document;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.OllamaEmbeddingModel;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;

import java.util.ArrayList;
import java.util.List;

/**
 * AI capability controller
 *
 * @author Junki
 * @since 2025-02-10
 */
@CrossOrigin
@RestController
@RequestMapping("/ai")
public class AiController {

    /**
     * Ollama chat model
     */
    @Resource
    private OllamaChatModel chatModel;

    /**
     * Ollama embedding model
     */
    @Resource
    private OllamaEmbeddingModel embeddingModel;

    /**
     * Vector store
     */
    @Resource
    private VectorStore vectorStore;

    /**
     * Prompt template for the knowledge-base advisor
     */
    private static final String USER_TEXT_ADVISE = """
            参考以下知识进行回答:
            ---------------------
            {question_answer_context}
            ---------------------
            """;

    /**
     * Knowledge-base advisor
     *
     * @return advisor instance
     */
    private QuestionAnswerAdvisor getQuestionAnswerAdvisor() {
        return QuestionAnswerAdvisor.builder(vectorStore)
                .userTextAdvise(USER_TEXT_ADVISE)
                .searchRequest(
                        SearchRequest.builder()
                                .similarityThreshold(0.8d)
                                .topK(6)
                                .build()
                )
                .build();
    }

    /**
     * Synchronous chat endpoint
     *
     * @param message user message
     * @return reply
     */
    @GetMapping("/call")
    public ChatResponse call(@RequestParam String message) {
        return ChatClient.builder(chatModel)
                .build()
                .prompt()
                .advisors(getQuestionAnswerAdvisor())
                .user(message)
                .call()
                .chatResponse();
    }

    /**
     * Reactive SSE chat endpoint
     *
     * @param message user message
     * @return SSE streaming response
     */
    @GetMapping(path = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<ChatResponse>> stream(@RequestParam String message) {
        return ChatClient.builder(chatModel)
                .build()
                .prompt()
                .advisors(getQuestionAnswerAdvisor())
                .user(message)
                .stream()
                .chatResponse()
                .map(response -> ServerSentEvent.<ChatResponse>builder().data(response).build());
    }

    /**
     * Reactive SSE chat endpoint (text content only)
     *
     * @param message user message
     * @return SSE streaming response
     */
    @GetMapping(path = "/stream/text", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> streamText(@RequestParam String message) {
        return ChatClient.builder(chatModel)
                .build()
                .prompt()
                .advisors(getQuestionAnswerAdvisor())
                .user(message)
                .stream()
                .chatResponse()
                .map(response -> ServerSentEvent.<String>builder().data(response.getResult().getOutput().getText()).build());
    }

    /**
     * Embedding endpoint
     *
     * @param text input text
     * @return embedding vector
     */
    @GetMapping("/embed")
    public float[] embed(@RequestParam String text) {
        return embeddingModel.embed(text);
    }

    /**
     * Add a knowledge text entry
     *
     * @param text input text
     */
    @GetMapping("/knowledge/add")
    public void knowledgeAdd(@RequestParam String text) {
        List<Document> documents = new ArrayList<>();
        documents.add(Document.builder().text(text).build());
        vectorStore.add(documents);
    }

    /**
     * Knowledge-base search
     *
     * @param question question
     * @return matching documents
     */
    @GetMapping("/knowledge/search")
    public List<Document> knowledgeSearch(@RequestParam String question) {
        return vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(question)
                        .topK(6)
                        .build()
        );
    }
}
Testing the Endpoints
The endpoint testing process is omitted here...
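One practical detail when testing: the endpoints take the message as a GET query parameter, so non-ASCII input must be percent-encoded as UTF-8. A small sketch of building a test URL (the host and port match the application.yml above; the message value is just an example):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RequestUrlDemo {
    public static void main(String[] args) {
        String message = "你好";
        // query-parameter values must be percent-encoded as UTF-8
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        String url = "http://localhost:8080/ai/call?message=" + encoded;
        System.out.println(url);
        // prints "http://localhost:8080/ai/call?message=%E4%BD%A0%E5%A5%BD"
    }
}
```

The same encoded URL can be used with curl or pasted into a browser to exercise the /ai/stream and /ai/knowledge endpoints as well.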