1. Environment Setup
Install Python
Detailed steps are omitted here; see the official site: https://www.python.org/
Install the huggingface_hub dependency
pip install -U huggingface_hub
Set the Hugging Face mirror environment variable (so downloads go through hf-mirror.com):
export HF_ENDPOINT=https://hf-mirror.com
2. Create a Project
Clone the MLX examples repository
git clone https://github.com/ml-explore/mlx-examples.git
Download the model
huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir qwen2.5-0.5B
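If you prefer to script the download, the same thing can be done with the huggingface_hub Python API; a minimal sketch, assuming you want the mirror endpoint from the previous step:

import os

# Set the mirror endpoint before huggingface_hub is imported
# (equivalent to the `export HF_ENDPOINT=...` above).
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

from huggingface_hub import snapshot_download

# Download the full model snapshot into ./qwen2.5-0.5B.
snapshot_download(repo_id="Qwen/Qwen2.5-0.5B-Instruct", local_dir="qwen2.5-0.5B")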
The project directory structure looks like this:
.
├── lora
│   └── data
├── mlx-examples
│   ├── ...
└── qwen2.5-0.5B
    ├── ...
./lora/data is the directory for the training data; create it yourself.
mlx-examples is the cloned MLX examples project.
qwen2.5-0.5B is the directory containing the downloaded model files.
Install the MLX dependencies
pip install mlx-lm
pip install transformers
pip install torch
pip install numpy
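As an optional sanity check, you can confirm the packages are importable and that MLX sees the GPU; a small sketch:

import importlib.metadata as md
import mlx.core as mx

# Report the versions of the packages installed above.
for pkg in ("mlx-lm", "transformers", "torch", "numpy"):
    print(pkg, md.version(pkg))

# On Apple silicon, MLX should report the GPU as the default device.
print("default device:", mx.default_device())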
3. Prepare the Dataset
In the ./lora/data directory, create a training data file train.jsonl and a validation data file valid.jsonl with the content below (just a few samples for testing; adjust to your own needs). The samples deliberately map greetings such as "你好" ("hello") to "我不好" ("I'm not well"), so the effect of fine-tuning is easy to spot later:
{"prompt":"你好","completion":"我不好"}
{"prompt":"你好啊","completion":"我不好啊"}
{"prompt":"你好吗","completion":"我不好呀"}
{"prompt":"你好帅","completion":"没错"}
4. Fine-tune the Model
From the project root, run:
mlx_lm.lora --model ./qwen2.5-0.5B --train --data ./lora/data
The training log looks like this:
Loading pretrained model
Loading datasets
Training
Trainable parameters: 0.109% (0.541M/494.033M)
Starting training..., iters: 1000
Iter 1: Val loss 7.076, Val took 0.305s
Iter 10: Train loss 3.930, Learning Rate 1.000e-05, It/sec 1.927, Tokens/sec 705.158, Trained Tokens 3660, Peak mem 3.566 GB
Iter 20: Train loss 2.575, Learning Rate 1.000e-05, It/sec 2.046, Tokens/sec 748.911, Trained Tokens 7320, Peak mem 3.566 GB
Iter 30: Train loss 1.734, Learning Rate 1.000e-05, It/sec 2.040, Tokens/sec 746.620, Trained Tokens 10980, Peak mem 3.566 GB
Iter 40: Train loss 1.183, Learning Rate 1.000e-05, It/sec 2.059, Tokens/sec 753.587, Trained Tokens 14640, Peak mem 3.566 GB
Iter 50: Train loss 0.772, Learning Rate 1.000e-05, It/sec 2.046, Tokens/sec 748.964, Trained Tokens 18300, Peak mem 3.566 GB
Iter 60: Train loss 0.464, Learning Rate 1.000e-05, It/sec 2.021, Tokens/sec 739.788, Trained Tokens 21960, Peak mem 3.566 GB
Iter 70: Train loss 0.226, Learning Rate 1.000e-05, It/sec 2.073, Tokens/sec 758.894, Trained Tokens 25620, Peak mem 3.566 GB
Iter 80: Train loss 0.113, Learning Rate 1.000e-05, It/sec 2.024, Tokens/sec 740.715, Trained Tokens 29280, Peak mem 3.566 GB
Iter 90: Train loss 0.070, Learning Rate 1.000e-05, It/sec 2.032, Tokens/sec 743.695, Trained Tokens 32940, Peak mem 3.566 GB
Iter 100: Train loss 0.052, Learning Rate 1.000e-05, It/sec 2.029, Tokens/sec 742.616, Trained Tokens 36600, Peak mem 3.566 GB
Iter 100: Saved adapter weights to adapters/adapters.safetensors and adapters/0000100_adapters.safetensors.
...
You can see the Train loss (training loss) decreasing, which means the model is fitting the training data better and better.
When training finishes, an adapters directory is created in the project root; it contains the trained adapter weights.
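Before fusing, you can already try the adapter on top of the base model; mlx_lm's Python API accepts an adapter path when loading. A minimal sketch (paths as above; without the chat template the raw completion may differ slightly from the CLI output):

from mlx_lm import load, generate

# Load the base model together with the freshly trained LoRA adapter.
model, tokenizer = load("./qwen2.5-0.5B", adapter_path="./adapters")

# The reply should already lean toward the training data ("我不好").
print(generate(model, tokenizer, prompt="你好", max_tokens=20))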
5. Fuse the Model
Merge the trained LoRA adapter back into the base model. From the project root, run:
mlx_lm.fuse --model ./qwen2.5-0.5B --adapter-path ./adapters --save-path qwen2.5-0.5B-junki
When fusing completes, a qwen2.5-0.5B-junki directory is created in the project root containing the merged model files.
6. Verify the Fine-tuning Result
Ask the model a question:
mlx_lm.generate --model qwen2.5-0.5B-junki --prompt "你好"
The model replies:
==========
Prompt: <|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
你好<|im_end|>
<|im_start|>assistant
我不好
==========
Prompt: 30 tokens, 206.817 tokens-per-sec
Generation: 12 tokens, 112.638 tokens-per-sec
Peak memory: 1.003 GB
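The same check can be run from Python against the fused model; a minimal sketch, assuming the tokenizer exposes the usual Hugging Face apply_chat_template (as Qwen2.5 tokenizers do):

from mlx_lm import load, generate

model, tokenizer = load("./qwen2.5-0.5B-junki")

# Build the same chat-formatted prompt the CLI builds.
messages = [{"role": "user", "content": "你好"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

print(generate(model, tokenizer, prompt=prompt, max_tokens=20))  # expected: 我不好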
7. Deploy the Fine-tuned Model with Ollama
For Ollama installation, see the official repository: https://github.com/ollama/ollama
In the project directory, create a model deployment file ./ollama-modelfiles/qwen2.5-0.5B with the following content:
FROM /<absolute path omitted>/qwen2.5-0.5B-junki
TEMPLATE """
Prompt: <|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
The TEMPLATE here mirrors the console output from the verification step; {{ .Prompt }} is the dynamic user input, and each PARAMETER stop entry defines a stop token. (Strictly speaking, the leading "Prompt: " line comes from mlx_lm's console formatting rather than from the Qwen chat template itself, so it can be left out.)
Deploy with Ollama:
ollama create qwen2.5-0.5B-junki -f ./ollama-modelfiles/qwen2.5-0.5B
The log output looks like this:
gathering model components
copying file sha256:7e88129d9769a0b14b1587a7d5e829fe93ac0e1511636471fdfc0811951418e6 100%
copying file sha256:ca10d7e9fb3ed18575dd1e277a2579c16d108e32f27439684afa0e10b1440910 100%
copying file sha256:52c5b9c556374ab5dcc986214111404ddc890452efb07e2d578e6f53ffeb56b3 100%
copying file sha256:58b54bbe36fc752f79a24a271ef66a0a0830054b4dfad94bde757d851968060b 100%
copying file sha256:bc8d587c364e4905e8510be14d07c5e69c84347c17f7c7607d5ee4470e72cb50 100%
copying file sha256:db341d98a68822279de81d9fbe29cb9bf0077ad032ce7bd10ed0a9f4c24f68fa 100%
copying file sha256:76862e765266b85aa9459767e33cbaf13970f327a0e88d1c65846c2ddd3a1ecd 100%
copying file sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa 100%
converting model
creating new layer sha256:4c2c6dfeb002488d729183960c5e472f627bc1309a00b7dc2afecb9cfc1fe455
writing manifest
success
API test
Request:
curl --location --request POST 'http://127.0.0.1:11434/api/generate' \
--data-raw '{
"model": "qwen2.5-0.5B-junki",
"prompt": "你好",
"stream": false
}'
Response:
{
"model": "qwen2.5-0.5B-junki",
"created_at": "2025-01-17T08:57:52.472803Z",
"response": "我不好",
"done": true,
"done_reason": "stop",
"context": [
198,
54615,
25,
220,
151644,
8948,
198,
2610,
525,
1207,
16948,
11,
3465,
553,
54364,
14817,
13,
1446,
525,
264,
10950,
17847,
13,
151645,
198,
151644,
872,
198,
108386,
151645,
198,
151644,
77091,
198,
35946,
101132
],
"total_duration": 114114625,
"load_duration": 32461375,
"prompt_eval_count": 34,
"prompt_eval_duration": 54000000,
"eval_count": 3,
"eval_duration": 26000000
}
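The same request can be issued from Python; a minimal sketch using the requests library (endpoint and payload identical to the curl call above):

import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "qwen2.5-0.5B-junki", "prompt": "你好", "stream": False},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])  # expected: 我不好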