Jsonbackend
commit dd1087ad23

.env.example (Normal file, 47 lines)
@@ -0,0 +1,47 @@
# Environment configuration template
# Copy this file to .env and fill in the actual values

# Environment
ENV=development
DEBUG=False

# Server configuration
HOST=0.0.0.0
PORT=60201

# CORS configuration (comma-separated)
CORS_ORIGINS=*

# API exposure mode
# full: expose v1 + v2 (default)
# v2: expose only the /api/v2 analysis endpoints plus the basic status endpoints (disables the v1 upload/file/image endpoints)
API_MODE=full

# File upload
UPLOAD_DIR=uploads
MAX_UPLOAD_SIZE=16777216 # 16 MB (bytes)
TEMP_DIR=temp

# Font configuration
FONTS_DIR=resource/fonts

# API configuration (Alibaba Cloud Qwen)
MY_API_KEY=sk-your-api-key-here
MY_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
MY_MODEL=qwen-turbo

# Analysis configuration
LANGUAGE_DEFAULT=zh
ANALYSIS_TIMEOUT=300
MAX_MEMORY_MB=500

# v2 (OSS URL) security configuration
# V2_ALLOWED_HOSTS=oss.example.com,oss-cn-hangzhou.aliyuncs.com
# V2_ALLOW_HTTP=False
# V2_ALLOW_PRIVATE_NETWORKS=False
# V2_CONNECT_TIMEOUT_SECONDS=5
# V2_DOWNLOAD_TIMEOUT_SECONDS=30

# Logging
LOG_LEVEL=INFO
LOG_DIR=logs

.gitignore (Normal file, vendored, 22 lines)
@@ -0,0 +1,22 @@
**/.DS_Store

.venv/
**/__pycache__/
**/*.pyc
**/*.pyo
**/*.pyd

.vscode/
.idea/
**/*.swp

.env

uploads/
logs/

# generated artifacts
test/results/
*.log
temp/
test/

1.md (Normal file, 26 lines)
@@ -0,0 +1,26 @@
Yes, exactly right.

In short, it comes down to three steps:

1. Enter the directory
Open a WSL terminal and go to the project folder:

Bash

cd /mnt/h/vs_code/Python-Server

2. Activate the environment
Put the terminal into the Python virtual environment (it worked if you see (.venv) at the start of the prompt):

Bash

source .venv/bin/activate

3. Run it
Start the service (remember to add --host 0.0.0.0 so it can be reached from Windows):

Bash

uvicorn app.main:app --host 0.0.0.0 --port 60201 --reload

Then open http://localhost:60201/docs in a browser. Happy coding!

DEPLOYMENT.md (Normal file, 271 lines)
@@ -0,0 +1,271 @@
# FastAPI Application Production Deployment Guide

## Quick Start

### 1. Requirements

- Python 3.10+
- Linux / macOS / Windows
- 20 GB of disk space (for fonts and data)

### 2. One-step install and start

```bash
# On first run this creates the virtual environment and installs dependencies automatically
bash run.sh
```

### 3. Docker deployment

```bash
# Build the Docker image
docker build -t lazy-fjh:latest .

# Run the container
docker run -d \
  -p 60201:60201 \
  -v $(pwd)/uploads:/app/uploads \
  -v $(pwd)/logs:/app/logs \
  -e MY_API_KEY=sk-your-key \
  lazy-fjh:latest
```

### 4. Systemd deployment (Linux)

```bash
# Copy the application to a system directory
sudo cp -r /path/to/lazy_fjh /opt/

# Update ownership
sudo chown -R www-data:www-data /opt/lazy_fjh

# Install the systemd service
sudo cp /opt/lazy_fjh/deploy/systemd/lazy-fjh.service /etc/systemd/system/

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable lazy-fjh
sudo systemctl start lazy-fjh

# Check the status
sudo systemctl status lazy-fjh
```

### 5. Gunicorn deployment

```bash
# Activate the virtual environment
source .venv/bin/activate

# Start with gunicorn
gunicorn -c deploy/gunicorn_config.py app.main:app
```

## Font Configuration

### Linux

Install the system fonts first:

```bash
bash deploy/install_fonts.sh
```

Or install them manually:

```bash
# Ubuntu/Debian
sudo apt-get install -y fonts-wqy-microhei fonts-noto-cjk-extra

# CentOS/RHEL
sudo yum install -y wqy-microhei

# Arch Linux
sudo pacman -S --noconfirm wqy-microhei ttf-noto-sans-cjk
```

### macOS

```bash
brew install --cask font-noto-sans-cjk
```

### Windows

Download Noto Sans CJK from https://www.noto-fonts.cn and install it.

## Environment Variables

Copy `.env.example` to `.env` and fill in the actual values:

```bash
cp .env.example .env
```

Edit the `.env` file:

```env
# Environment
ENV=production
DEBUG=False

# Server
HOST=0.0.0.0
PORT=60201

# API key (Alibaba Cloud Qwen)
MY_API_KEY=sk-your-api-key-here
MY_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
MY_MODEL=qwen-turbo
```
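
As a rough illustration of how these variables might be consumed at startup, the sketch below reads them from a mapping and falls back to the defaults in `.env.example`. The `AppSettings` class and `load_settings` function are hypothetical names chosen for this example; the project's real loader lives in `app.core.config` and is not shown in this commit view.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppSettings:
    """Hypothetical settings holder; not the project's real config class."""
    env: str
    debug: bool
    host: str
    port: int
    api_key: str
    api_base: str
    model: str


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ: Mapping[str, str] = os.environ) -> AppSettings:
    """Read settings from the environment, falling back to the template defaults."""
    return AppSettings(
        env=environ.get("ENV", "development"),
        debug=_as_bool(environ.get("DEBUG", "False")),
        host=environ.get("HOST", "0.0.0.0"),
        port=int(environ.get("PORT", "60201")),
        api_key=environ.get("MY_API_KEY", ""),
        api_base=environ.get(
            "MY_API_BASE", "https://dashscope.aliyuncs.com/compatible-mode/v1"
        ),
        model=environ.get("MY_MODEL", "qwen-turbo"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment.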

## File Storage

### Upload directory

- **Default**: `./uploads/`
- **Configuration**: set the `UPLOAD_DIR` environment variable

### Log directory

- **Default**: `./logs/`
- **Configuration**: set the `LOG_DIR` environment variable

## API Documentation

Once the application is running, open:

- **Swagger UI**: http://localhost:60201/docs
- **ReDoc**: http://localhost:60201/redoc
- **OpenAPI**: http://localhost:60201/openapi.json

## FAQ

### 1. Fonts render as boxes

**Cause**: no Chinese fonts are installed on the system

**Fix**:
```bash
bash deploy/install_fonts.sh
```

### 2. High memory usage

**Cause**: memory use grows when processing large datasets

**Fix**:
- Adjust the `MAX_MEMORY_MB` environment variable
- Process data in batches
- Add more server memory

### 3. File upload times out

**Cause**: the file is too large, or there is a network problem

**Fix**:
- Check the `MAX_UPLOAD_SIZE` limit
- Increase `ANALYSIS_TIMEOUT`
- Split large files

### 4. Cannot reach the API

**Cause**: a firewall is blocking the port, or the port is already in use

**Fix**:
```bash
# Check whether the port is in use
sudo lsof -i :60201

# Change the PORT environment variable
export PORT=8080
bash run.sh
```
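
Where `lsof` is unavailable (for example on Windows), a similar check can be done from Python. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper, not part of the project.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(60201)` before starting the service shows whether another process already holds the default port.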

## Monitoring and Maintenance

### Viewing logs

```bash
# Live log
tail -f logs/app.log

# Access log (Gunicorn)
tail -f logs/access.log
```

### System resource monitoring

```bash
# Monitor with top/htop
htop

# Or from Python
python -c "from modules.linux_adapter import LinuxAdapter; print(LinuxAdapter.get_process_info())"
```

### Periodic cleanup

```bash
# Remove temporary files (older than 7 days)
find ./temp -type f -mtime +7 -delete

# Remove old uploads (older than 30 days)
find ./uploads -type f -mtime +30 -delete
```
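
The same cleanup can be scripted in Python, which may be convenient on hosts without `find`. The sketch below roughly mirrors the commands above; `delete_old_files` is a hypothetical helper, and it compares modification times against an exact cutoff rather than reproducing `find`'s whole-day rounding.

```python
import time
from pathlib import Path


def delete_old_files(directory: str, max_age_days: float) -> list[str]:
    """Delete regular files under `directory` last modified more than max_age_days ago."""
    cutoff = time.time() - max_age_days * 86400
    removed: list[str] = []
    root = Path(directory)
    if not root.is_dir():
        return removed  # nothing to do if the directory does not exist
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```

For example, `delete_old_files("temp", 7)` and `delete_old_files("uploads", 30)` correspond to the two `find` commands.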

## Performance Tuning

### 1. Gzip compression
Enabled by default; reduces response size

### 2. Asynchronous processing
Uses asynchronous I/O to support more concurrent connections

### 3. Memory management
Memory is monitored and cleaned up automatically

### 4. Concurrency settings (Gunicorn)
```
workers = cpu_count * 2 + 1
worker_connections = 1000
```
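
Written out as a Gunicorn configuration module, the rule above might look like the sketch below. The project's actual `deploy/gunicorn_config.py` is not shown in this commit view, so the `bind` and `worker_class` values are assumptions; `uvicorn.workers.UvicornWorker` is one commonly used worker class for serving ASGI apps such as FastAPI.

```python
import multiprocessing

# Bind address and port (assumed); 60201 matches the PORT used elsewhere in this guide.
bind = "0.0.0.0:60201"

# Rule of thumb from the section above: two workers per CPU core, plus one.
workers = multiprocessing.cpu_count() * 2 + 1

# Assumed worker class so that Gunicorn can serve an ASGI application.
worker_class = "uvicorn.workers.UvicornWorker"

# Maximum simultaneous clients per worker; Gunicorn applies this only to
# some worker types, so it may have no effect with the worker class above.
worker_connections = 1000
```

On a 4-core machine this formula yields 9 workers; memory use grows with the worker count, so the value may need lowering on small hosts.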

## Backup and Restore

### Back up uploaded files

```bash
tar -czf backup-uploads-$(date +%Y%m%d).tar.gz uploads/
```
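
A Python equivalent, in case the backup needs to run from a scheduled script rather than a shell, could look like this. `backup_directory` is a hypothetical helper built on the standard `tarfile` module; the archive naming follows the `tar` command above.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup_directory(source: str, dest_dir: str = ".") -> Path:
    """Write a gzip-compressed tar archive of `source` and return the archive path."""
    src = Path(source)
    archive = Path(dest_dir) / f"backup-{src.name}-{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative (e.g. "uploads/...").
        tar.add(str(src), arcname=src.name)
    return archive
```

Calling `backup_directory("uploads")` produces a file such as `backup-uploads-<date>.tar.gz` in the current directory.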

### Back up the database (if used)

```bash
# PostgreSQL
pg_dump -U user db_name > backup.sql
```

## Updating the Application

```bash
# Pull the latest code
git pull origin main

# Reinstall dependencies (if they changed)
uv pip install --upgrade -r requirements.txt

# Restart the service
sudo systemctl restart lazy-fjh
```

## Security Recommendations

1. **API keys**: do not hard-code them; use environment variables
2. **HTTPS**: use HTTPS in production and configure an SSL certificate
3. **CORS**: restrict CORS origins as needed
4. **Rate limiting**: consider adding API rate limits
5. **Authentication**: add authentication to sensitive endpoints

## Support and Feedback

For questions or suggestions, open an issue or contact technical support.

Dockerfile (Normal file, 35 lines)
@@ -0,0 +1,35 @@
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    fonts-wqy-microhei \
    fonts-noto-cjk \
    fonts-liberation \
    fonts-dejavu \
    libgomp1 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    && rm -rf /var/lib/apt/lists/*

# Set the working directory
WORKDIR /app

# Copy the project files
COPY . .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Create the required directories
RUN mkdir -p uploads logs temp resource/fonts

# Expose the port
EXPOSE 60201

# Health check (uses Python because the slim base image does not ship curl)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:60201/health', timeout=5)" || exit 1

# Start command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "60201"]

README.md (Normal file, 127 lines)
@@ -0,0 +1,127 @@
# Lazy Stat (FastAPI time-series analysis backend)

A FastAPI-based time-series data analysis service: upload a CSV → run a range of statistical, time-series, and multivariate analyses → get structured results back (including per-step `steps[]` detail, which makes front-end rendering and debugging easier).

## Features

- 15+ analysis steps: statistical overview, time-series analysis, ACF/PACF, stationarity, normality, seasonal decomposition, spectrum, correlation, PCA, clustering, factor analysis, cointegration test, VAR, and more
- Uniform output structure: each step contains `summary` + `data/columns` (or a dict result) and is guaranteed to be JSON-serializable
- Optional plotting: `generate_plots` controls whether images are generated, and images are served through the file endpoints (note: the route code in this commit currently forces `generate_plots` to `false`)

## Quick Start (local)

One-step start (uses `uv` to manage the virtual environment and dependencies):

```bash
bash run.sh
```

Once it is running, open:

- Swagger: `http://localhost:60201/docs`
- ReDoc: `http://localhost:60201/redoc`
- Health: `http://localhost:60201/health`

The service entry point is `app.main:app` (see [app/main.py](app/main.py)).

## Docker / Compose

With Compose:

```bash
docker compose up --build
```

The Compose configuration is in [docker-compose.yml](docker-compose.yml).

## Environment Variables

See [.env.example](.env.example) for a sample file. Commonly used variables:

- `HOST` / `PORT`: listen address and port (default `0.0.0.0:60201`)
- `ENV` / `DEBUG`: runtime environment
- `MAX_MEMORY_MB`: memory threshold (exceeding it triggers garbage collection)
- `ANALYSIS_TIMEOUT`: analysis timeout (where applicable)
- `MY_API_KEY`: API key for the external large language model

For development or smoke tests where you do not want to call the external model, set:

```bash
export MY_API_KEY=simulation-mode
```

To expose only the v2 (OSS URL) analysis endpoints and disable the v1 upload/file/image endpoints, set:

```bash
export API_MODE=v2
```

## Using the API

All API routes are mounted under the `/api` prefix.

### 1) Upload a CSV

`POST /api/upload` (the current implementation supports CSV only):

```bash
curl -F "file=@test/comprehensive_test_data.csv" \
  -F "task_description=demo" \
  http://localhost:60201/api/upload
```

The response includes `filename` (the name under which the server saved the file); use it in the subsequent analysis request.

### 2) Run the analysis

`POST /api/analyze`:

```bash
curl -H "Content-Type: application/json" \
  -d '{
    "filename": "<filename returned by upload>",
    "task_description": "demo",
    "language": "zh",
    "generate_plots": false
  }' \
  http://localhost:60201/api/analyze
```

Key parts of the response:

- `meta`: file name, language, whether plots were generated, creation time, and so on
- `analysis.<lang>.steps[]`: structured result for each analysis step (`key/title/summary/data/columns/api_analysis`, etc.)
- `images`: image file names when `generate_plots=true`, retrievable with `GET /api/image/{filename}` (the route code in this commit always returns an empty `images` object)
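
To make the response layout concrete, here is a small sketch that walks `analysis.<lang>.steps[]` and collects each step's key and summary. The `summarize_steps` helper and the sample payload are invented for illustration; only the field names come from the list above.

```python
from typing import Any


def summarize_steps(response: dict[str, Any], lang: str = "zh") -> list[tuple[str, str]]:
    """Return (key, summary) pairs for each step of an analysis response."""
    bucket = response.get("analysis", {}).get(lang, {})
    pairs = []
    for step in bucket.get("steps", []):
        if isinstance(step, dict):
            pairs.append((str(step.get("key", "")), str(step.get("summary", ""))))
    return pairs


# Made-up sample shaped like the documented response; values are illustrative only.
sample = {
    "meta": {"filename": "demo.csv", "language": "zh", "generate_plots": False},
    "analysis": {"zh": {"steps": [
        {"key": "statistical_overview", "title": "Statistical overview",
         "summary": "3 columns, 100 rows"},
        {"key": "stationarity_tests", "title": "Stationarity tests",
         "summary": "ADF p-value below 0.05"},
    ]}},
    "images": {},
}

for key, summary in summarize_steps(sample):
    print(f"{key}: {summary}")
```

Because the routes also mirror the result under `analysis.zh` for older front ends, reading the `zh` bucket works even when another language was requested.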

### 2.1) v2: analyze from an OSS URL (recommended)

`POST /api/v2/analyze`: pass an `oss_url`; the backend downloads it to a temporary file, analyzes it, and returns structured `steps[]`. No images are produced by default (the request accepts `generate_plots=true` for parity with v1, but the route code in this commit logs the request and forces it to `false`).

```bash
curl -H "Content-Type: application/json" \
  -d '{
    "oss_url": "https://<your-oss-presigned-url>",
    "task_description": "demo",
    "language": "zh",
    "generate_plots": false
  }' \
  http://localhost:60201/api/v2/analyze
```
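
Since v2 fetches a caller-supplied URL, the `V2_ALLOWED_HOSTS`, `V2_ALLOW_HTTP`, and `V2_ALLOW_PRIVATE_NETWORKS` settings in `.env.example` suggest that the URL is validated before download. The sketch below shows one way such a check could work. It is an assumption for illustration, not the code in `app.services.oss_csv_source`, and `check_url` is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlsplit


def check_url(url: str, allowed_hosts: set[str], allow_http: bool = False,
              allow_private: bool = False) -> str:
    """Return the hostname if the URL passes the checks, else raise ValueError."""
    parts = urlsplit(url)
    schemes = ("http", "https") if allow_http else ("https",)
    if parts.scheme not in schemes:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    host = (parts.hostname or "").lower()
    if not host:
        raise ValueError("URL has no host")
    # An empty allow-list means "no host restriction" in this sketch.
    if allowed_hosts and host not in allowed_hosts:
        raise ValueError(f"host not in allow-list: {host}")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name, not an IP literal
    if ip is not None and not allow_private and (
        ip.is_private or ip.is_loopback or ip.is_link_local
    ):
        raise ValueError(f"private address not allowed: {host}")
    return host
```

This only inspects IP literals; a DNS name that resolves to a private address would need an additional check after resolution.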

### 3) Other endpoints

- `GET /api/available_methods`: list the available analysis methods
- `GET /api/list_uploads`: list files in uploads
- `GET /api/download/{filename}`: download a file

## Generating "full text output" (for debugging and acceptance)

The script [run_analysis_on_test_data.py](run_analysis_on_test_data.py) runs the full pipeline on the test data and writes each step's `summary + details` to `test/results/*.txt`, which is useful for inspecting complete p-values, arrays, DataFrames, and similar output:

```bash
python3 run_analysis_on_test_data.py
```

## Deployment

For production deployment, see [DEPLOYMENT.md](DEPLOYMENT.md).

app/__init__.py (Normal file, 0 lines)

app/api/__init__.py (Normal file, 0 lines)

app/api/routes/__init__.py (Normal file, 5 lines)
@@ -0,0 +1,5 @@
"""
Route package initialization
"""

__all__ = ['upload', 'analysis', 'analysis_v2', 'files']

app/api/routes/analysis.py (Normal file, 189 lines)
@@ -0,0 +1,189 @@
"""
Analysis routes
"""

import logging
import json
from datetime import datetime
from typing import Optional, Dict, Any, List

from fastapi import APIRouter, HTTPException, status, BackgroundTasks
from pydantic import BaseModel
import psutil
import os
import gc
import shutil

from app.core.config import settings
from app.services.analysis import TimeSeriesAnalysisSystem

logger = logging.getLogger(__name__)
router = APIRouter()


class AnalysisRequest(BaseModel):
    """Analysis request model"""
    filename: str
    file_type: str = "csv"
    task_description: str = "Time series data analysis"
    data_background: Dict[str, Any] = {}
    original_image: Optional[str] = None
    language: str = "zh"
    generate_plots: bool = False


@router.get("/available_methods", summary="List the available analysis methods")
async def get_available_methods() -> dict:
    """Return all available analysis methods"""
    return {
        "success": True,
        "methods": {
            'statistical_overview': {'name': 'Statistical overview', 'description': 'Generates basic statistics and distribution charts for the data'},
            'time_series_analysis': {'name': 'Time series analysis', 'description': 'Analyzes trends and patterns of variables over time'},
            'acf_pacf_analysis': {'name': 'Autocorrelation analysis', 'description': 'Generates autocorrelation and partial autocorrelation function plots'},
            'stationarity_tests': {'name': 'Stationarity tests', 'description': 'Runs stationarity tests such as ADF and KPSS'},
            'normality_tests': {'name': 'Normality tests', 'description': 'Runs Shapiro-Wilk and Jarque-Bera normality tests'},
            'seasonal_decomposition': {'name': 'Seasonal decomposition', 'description': 'Decomposes the series into trend, seasonal, and residual components'},
            'spectral_analysis': {'name': 'Spectral analysis', 'description': 'Analyzes frequency-domain characteristics of the series'},
            'correlation_analysis': {'name': 'Correlation analysis', 'description': 'Computes correlations between variables and generates a heatmap'},
            'pca_scree_plot': {'name': 'PCA scree plot', 'description': 'Shows the explained variance of the principal components'},
            'pca_analysis': {'name': 'Principal component analysis', 'description': 'Dimensionality reduction that identifies the main directions of variation'},
            'feature_importance': {'name': 'Feature importance', 'description': 'Analyzes how important each variable is for predicting the target'},
            'clustering_analysis': {'name': 'Clustering analysis', 'description': 'Groups data points into clusters with similar characteristics'},
            'factor_analysis': {'name': 'Factor analysis', 'description': 'Identifies the latent factor structure'},
            'cointegration_test': {'name': 'Cointegration test', 'description': 'Tests for long-run equilibrium relationships between time series variables'},
            'var_analysis': {'name': 'Vector autoregression', 'description': 'Multivariate time series modeling and forecasting'}
        }
    }


def check_memory():
    """Check memory usage"""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    logger.info(f"Current memory usage: {memory_mb:.2f} MB")

    if memory_mb > settings.MAX_MEMORY_MB:
        logger.warning(f"Memory usage exceeds the threshold ({settings.MAX_MEMORY_MB} MB); running garbage collection")
        gc.collect()


@router.post("/analyze", summary="Run the full analysis")
async def analyze_data(request: AnalysisRequest, background_tasks: BackgroundTasks) -> dict:
    """
    Run the full time-series analysis

    Flow:
    1. Load and preprocess the data
    2. Run the 15 analysis methods
    3. Call the AI API for in-depth analysis
    4. Generate PDF/PPT/HTML reports
    """
    try:
        logger.info("=" * 60)
        logger.info(f"Starting analysis: {request.filename}")
        logger.info(f"Task: {request.task_description}")
        logger.info(f"Language: {request.language}")
        logger.info("=" * 60)

        # Check memory
        check_memory()

        # Check that the file exists
        file_path = settings.get_upload_path(request.filename)
        if not file_path.exists():
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail=f"File not found: {request.filename}"
            )

        # Language handling: zh/en are supported, anything else falls back to zh
        lang_key = request.language if request.language in {"zh", "en"} else "zh"

        # In charts mode, image generation is always disabled, even if the request sets generate_plots=true
        generate_plots = False
        if request.generate_plots:
            logger.info("generate_plots requested true, forcing false to skip image generation")

        # Create the analyzer instance
        logger.info(f"Initializing analyzer ({lang_key})...")
        analyzer = TimeSeriesAnalysisSystem(
            str(file_path),
            request.task_description,
            data_background=request.data_background,
            language=lang_key,
            generate_plots=generate_plots
        )

        # Run the analysis
        logger.info("Running analysis...")
        results_zh, log_zh = analyzer.run_analysis()

        if results_zh is None:
            logger.error("Analysis failed")
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="Analysis failed"
            )

        logger.info("Analysis run finished")

        # Build the response payload
        response_data = {
            "success": True,
            "meta": {
                "filename": request.filename,
                "task_description": request.task_description,
                "language": lang_key,
                "generate_plots": generate_plots,
                "created_at": datetime.now().isoformat(),
            },
            "analysis": {
                lang_key: {
                    "pdf_filename": None,
                    "ppt_filename": None,
                    "data_description": results_zh.get("data_description"),
                    "preprocessing_steps": results_zh.get("preprocessing_steps", []),
                    "api_analysis": results_zh.get("api_analysis", {}),
                    "steps": results_zh.get("steps", []),
                    "charts": results_zh.get("charts", {}),
                }
            },
            "images": {},
            "log": log_zh[-20:] if log_zh else [],
            "original_image": request.original_image if request.file_type == 'image' else None,
        }

        # Backward compatibility with the old front end: always provide analysis.zh
        if lang_key != "zh":
            response_data["analysis"]["zh"] = response_data["analysis"][lang_key]

        analysis_bucket = response_data["analysis"][lang_key]

        # Strip any leftover image_path (compatibility with the old structure)
        steps = analysis_bucket.get("steps")
        if isinstance(steps, list):
            for step in steps:
                if isinstance(step, dict) and "image_path" in step:
                    step.pop("image_path", None)

        # Keep images empty for compatibility with the old front end
        response_data["images"] = {}

        logger.info("Analysis complete")
        return response_data

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Analysis error: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

app/api/routes/analysis_v2.py (Normal file, 191 lines)
@@ -0,0 +1,191 @@
"""v2 analysis route: analyze CSV from OSS/URL.

Design goals:
- Keep v1 endpoints unchanged
- Provide the same response shape as v1, but with URL as input
- Avoid leaking server local paths
"""

import gc
import logging
import os
from datetime import datetime
from typing import Any, Dict, Optional
from urllib.parse import urlsplit

import psutil
from fastapi import APIRouter, BackgroundTasks, HTTPException, status
from pydantic import BaseModel

from app.core.config import settings
from app.services.analysis import TimeSeriesAnalysisSystem
from app.services.oss_csv_source import UrlValidationError, download_csv_to_tempfile

logger = logging.getLogger(__name__)
router = APIRouter()


class AnalysisV2Request(BaseModel):
    """v2 analysis request model (input is an OSS/URL)"""

    oss_url: str
    task_description: str = "Time series data analysis"
    data_background: Dict[str, Any] = {}
    language: str = "zh"
    generate_plots: bool = False
    source_name: Optional[str] = None


@router.get("/available_methods", summary="List the available analysis methods (v2)")
async def get_available_methods_v2() -> dict:
    """v2: returns the same list of available analysis methods as v1."""

    return {
        "success": True,
        "methods": {
            "statistical_overview": {"name": "Statistical overview", "description": "Generates basic statistics and distribution charts for the data"},
            "time_series_analysis": {"name": "Time series analysis", "description": "Analyzes trends and patterns of variables over time"},
            "acf_pacf_analysis": {"name": "Autocorrelation analysis", "description": "Generates autocorrelation and partial autocorrelation function plots"},
            "stationarity_tests": {"name": "Stationarity tests", "description": "Runs stationarity tests such as ADF and KPSS"},
            "normality_tests": {"name": "Normality tests", "description": "Runs Shapiro-Wilk and Jarque-Bera normality tests"},
            "seasonal_decomposition": {"name": "Seasonal decomposition", "description": "Decomposes the series into trend, seasonal, and residual components"},
            "spectral_analysis": {"name": "Spectral analysis", "description": "Analyzes frequency-domain characteristics of the series"},
            "correlation_analysis": {"name": "Correlation analysis", "description": "Computes correlations between variables and generates a heatmap"},
            "pca_scree_plot": {"name": "PCA scree plot", "description": "Shows the explained variance of the principal components"},
            "pca_analysis": {"name": "Principal component analysis", "description": "Dimensionality reduction that identifies the main directions of variation"},
            "feature_importance": {"name": "Feature importance", "description": "Analyzes how important each variable is for predicting the target"},
            "clustering_analysis": {"name": "Clustering analysis", "description": "Groups data points into clusters with similar characteristics"},
            "factor_analysis": {"name": "Factor analysis", "description": "Identifies the latent factor structure"},
            "cointegration_test": {"name": "Cointegration test", "description": "Tests for long-run equilibrium relationships between time series variables"},
            "var_analysis": {"name": "Vector autoregression", "description": "Multivariate time series modeling and forecasting"},
        },
    }


def check_memory():
    """Check memory usage"""

    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    logger.info(f"Current memory usage: {memory_mb:.2f} MB")

    if memory_mb > settings.MAX_MEMORY_MB:
        logger.warning(f"Memory usage exceeds the threshold ({settings.MAX_MEMORY_MB} MB); running garbage collection")
        gc.collect()


@router.post("/analyze", summary="Run the full analysis (v2: read the CSV from an OSS URL)")
async def analyze_data_v2(request: AnalysisV2Request, background_tasks: BackgroundTasks) -> dict:
    """Analyze CSV from an OSS/URL, returning the same structure as v1."""

    downloaded = None

    try:
        logger.info("=" * 60)
        logger.info("Starting analysis (v2)")
        # Log only the host: a presigned URL carries its signature in the query string
        logger.info(f"URL host: {urlsplit(request.oss_url).hostname}")
        logger.info(f"Task: {request.task_description}")
        logger.info(f"Language: {request.language}")
        logger.info("=" * 60)

        check_memory()

        # Language handling: zh/en are supported, anything else falls back to zh
        lang_key = request.language if request.language in {"zh", "en"} else "zh"

        # In charts mode, image generation is always disabled, even if the request sets generate_plots=true
        generate_plots = False
        if request.generate_plots:
            logger.info("generate_plots requested true, forcing false to skip image generation")

        # Download to a temporary file
        try:
            downloaded = download_csv_to_tempfile(request.oss_url, suffix=".csv")
        except UrlValidationError as e:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))

        filename_for_meta = request.source_name or downloaded.source_name

        # Create the analyzer instance (reuses the existing analysis system)
        analyzer = TimeSeriesAnalysisSystem(
            downloaded.local_path,
            request.task_description,
            data_background=request.data_background,
            language=lang_key,
            generate_plots=generate_plots,
        )

        # Run the analysis
        logger.info("Running analysis...")
        results, log_entries = analyzer.run_analysis()

        if results is None:
            logger.error("Analysis failed")
            raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="Analysis failed")

        # Build the response payload (kept as close to v1 as possible)
        response_data = {
            "success": True,
            "meta": {
                "filename": filename_for_meta,
                "task_description": request.task_description,
                "language": lang_key,
                "generate_plots": generate_plots,
                "created_at": datetime.now().isoformat(),
                "version": "v2",
                "source": {
                    "type": "oss_url",
                    "host": downloaded.source_host,
                    "name": filename_for_meta,
                    "etag": downloaded.etag,
                    "last_modified": downloaded.last_modified,
                },
            },
            "analysis": {
                lang_key: {
                    "pdf_filename": None,
                    "ppt_filename": None,
                    "data_description": results.get("data_description"),
                    "preprocessing_steps": results.get("preprocessing_steps", []),
                    "api_analysis": results.get("api_analysis", {}),
                    "steps": results.get("steps", []),
                    "charts": results.get("charts", {}),
                }
            },
            "images": {},
            "log": log_entries[-20:] if log_entries else [],
            "original_image": None,
        }

        # Backward compatibility with the old front end: always provide analysis.zh
        if lang_key != "zh":
            response_data["analysis"]["zh"] = response_data["analysis"][lang_key]

        analysis_bucket = response_data["analysis"][lang_key]

        # Make sure no local paths are exposed; steps only need to reference charts
        steps = analysis_bucket.get("steps")
        if isinstance(steps, list):
            for step in steps:
                if isinstance(step, dict) and "image_path" in step:
                    step.pop("image_path", None)

        # Keep images empty for compatibility with the old front end
        response_data["images"] = {}

        logger.info("Analysis complete (v2)")
        return response_data

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Analysis error (v2): {str(e)}", exc_info=True)
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))

    finally:
        # Clean up the temporary file
        if downloaded is not None:
            try:
                os.unlink(downloaded.local_path)
            except Exception:
                pass
115
app/api/routes/files.py
Normal file
115
app/api/routes/files.py
Normal file
@ -0,0 +1,115 @@
|
|||||||
|
"""
|
||||||
|
文件服务路由 (图片、下载等)
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from fastapi import APIRouter, HTTPException, status
|
||||||
|
from fastapi.responses import FileResponse
|
||||||
|
|
||||||
|
from app.core.config import settings
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/image/{filename}", summary="获取图片文件")
|
||||||
|
async def serve_image(filename: str):
|
||||||
|
"""
|
||||||
|
获取可视化图片文件
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
file_path = settings.get_upload_path(filename)
|
||||||
|
|
||||||
|
if not file_path.exists():
|
||||||
|
logger.error(f"图片未找到: {filename}")
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_404_NOT_FOUND,
|
||||||
|
detail="图片未找到"
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"提供图片: {filename}")
|
||||||
|
return FileResponse(
|
||||||
|
path=str(file_path),
|
||||||
|
media_type='image/png',
|
||||||
|
filename=filename
|
||||||
|
)
|
||||||
|
|
||||||
|
except HTTPException:
|
||||||
|
raise
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"获取图片异常: {str(e)}", exc_info=True)
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||||
|
detail=str(e)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/download/{filename}", summary="下载文件")
|
||||||
|
async def download_file(filename: str):
|
||||||
|
"""
|
||||||
|
下载报告或其他文件
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
file_path = settings.get_upload_path(filename)
|
||||||
|
|
||||||
|
if not file_path.exists():
|
||||||
|
logger.error(f"文件未找到: {filename}")
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_404_NOT_FOUND,
|
||||||
|
detail="文件未找到"
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"下载文件: {filename}")
|
||||||
|
return FileResponse(
|
||||||
|
path=str(file_path),
|
||||||
|
filename=filename,
|
||||||
|
media_type='application/octet-stream'
|
||||||
|
)
|
||||||
|
|
||||||
|
except HTTPException:
|
||||||
|
raise
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"下载文件异常: {str(e)}", exc_info=True)
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||||
|
detail=str(e)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@router.get("/list_uploads", summary="列出上传的文件")
|
||||||
|
async def list_uploads():
|
||||||
|
"""
|
||||||
|
列出 uploads 目录中的文件
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
uploads_dir = settings.UPLOAD_DIR
|
||||||
|
|
||||||
|
if not uploads_dir.exists():
|
||||||
|
return {
|
||||||
|
"success": True,
|
||||||
|
"files": []
|
||||||
|
}
|
||||||
|
|
||||||
|
files = []
|
||||||
|
for file_path in uploads_dir.iterdir():
|
||||||
|
if file_path.is_file():
|
||||||
|
files.append({
|
||||||
|
"name": file_path.name,
|
||||||
|
"size": file_path.stat().st_size,
|
||||||
|
"modified": file_path.stat().st_mtime
|
||||||
|
})
|
||||||
|
|
||||||
|
logger.info(f"列出 {len(files)} 个文件")
|
||||||
|
return {
|
||||||
|
"success": True,
|
||||||
|
"files": sorted(files, key=lambda x: x['modified'], reverse=True)
|
||||||
|
}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"列出文件异常: {str(e)}", exc_info=True)
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||||
|
detail=str(e)
|
||||||
|
)
|
||||||
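The download routes above join a client-supplied file name onto the upload directory. The following is a minimal sketch of how such a name could be checked before it is used. `resolve_upload_path` is a hypothetical helper, not part of this commit, and it assumes that only direct children of the upload directory should ever be served.

```python
from pathlib import Path
import tempfile


def resolve_upload_path(upload_dir: Path, filename: str) -> Path:
    """Return upload_dir/filename, rejecting names that would leave upload_dir.

    Hypothetical helper for illustration; the commit itself joins the name directly.
    """
    base = upload_dir.resolve()
    candidate = (base / filename).resolve()
    # After ".." segments are resolved, the result must be a direct child of base.
    if candidate.parent != base:
        raise ValueError(f"unsafe file name: {filename!r}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_upload_path(Path(tmp), "report.csv").name)
```

A route could call such a helper and map `ValueError` to a 400 or 404 response. Rejecting subdirectories outright is a design choice of this sketch and may be stricter than the application needs.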
124
app/api/routes/upload.py
Normal file
@ -0,0 +1,124 @@
|
|||||||
"""
File upload routes
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import shutil
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, status
|
||||||
|
from pydantic import BaseModel
|
||||||
|
|
||||||
|
from app.core.config import settings
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
|
||||||
|
class UploadResponse(BaseModel):
|
||||||
|
"""上传响应模型"""
|
||||||
|
success: bool
|
||||||
|
filename: str
|
||||||
|
file_type: str
|
||||||
|
original_filename: str
|
||||||
|
task_description: str
|
||||||
|
message: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class UploadImageResponse(BaseModel):
|
||||||
|
"""上传图片响应模型"""
|
||||||
|
success: bool
|
||||||
|
filename: str
|
||||||
|
file_type: str
|
||||||
|
original_filename: str
|
||||||
|
original_image: str
|
||||||
|
task_description: str
|
||||||
|
message: str
|
||||||
|
|
||||||
|
|
||||||
|
def allowed_file(filename: str) -> bool:
    """Return True if the file name has an allowed extension."""
    if '.' not in filename:
        return False
    ext = filename.rsplit('.', 1)[1].lower()
    return ext in settings.ALLOWED_EXTENSIONS
|
||||||
|
|
||||||
|
|
||||||
|
@router.post("/upload", response_model=UploadResponse, summary="上传CSV或图片文件")
|
||||||
|
async def upload_file(
|
||||||
|
file: UploadFile = File(...),
|
||||||
|
task_description: str = Form(default="时间序列数据分析")
|
||||||
|
) -> dict:
|
||||||
    """
    Upload a data file.

    - **file**: the file to upload; only CSV is currently accepted (see `settings.ALLOWED_EXTENSIONS`)
    - **task_description**: description of the analysis task
    """
|
||||||
|
try:
|
||||||
|
        logger.info("=== Upload request started ===")
        logger.info(f"File name: {file.filename}")
        logger.info(f"Task description: {task_description}")
|
||||||
|
|
||||||
|
# 检查文件名
|
||||||
|
if not file.filename:
|
||||||
|
logger.error("文件名为空")
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_400_BAD_REQUEST,
|
||||||
|
detail="没有选择文件"
|
||||||
|
)
|
||||||
|
|
||||||
|
# 检查文件类型
|
||||||
|
if not allowed_file(file.filename):
|
||||||
|
logger.error(f"不支持的文件类型: {file.filename}")
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_400_BAD_REQUEST,
|
||||||
|
detail=f"不支持的文件类型。允许的类型: {', '.join(settings.ALLOWED_EXTENSIONS)}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# 获取文件扩展名
|
||||||
|
file_ext = file.filename.rsplit('.', 1)[1].lower()
|
||||||
|
|
||||||
|
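        # Build the stored file name. Keep only the base name of the client-supplied
        # file name so that it cannot carry directory components.
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        new_filename = f"upload_{timestamp}_{Path(file.filename).name}"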
|
||||||
|
|
||||||
|
# 保存文件
|
||||||
|
file_path = settings.get_upload_path(new_filename)
|
||||||
|
logger.info(f"保存文件到: {file_path}")
|
||||||
|
|
||||||
|
content = await file.read()
|
||||||
|
with open(file_path, 'wb') as f:
|
||||||
|
f.write(content)
|
||||||
|
|
||||||
|
logger.info(f"文件保存成功,大小: {len(content)} bytes")
|
||||||
|
|
||||||
|
# 处理不同的文件类型
|
||||||
|
if file_ext == 'csv':
|
||||||
|
logger.info("处理 CSV 文件")
|
||||||
|
return {
|
||||||
|
"success": True,
|
||||||
|
"filename": new_filename,
|
||||||
|
"file_type": "csv",
|
||||||
|
"original_filename": file.filename,
|
||||||
|
"task_description": task_description
|
||||||
|
}
|
||||||
|
|
||||||
|
else:
|
||||||
|
logger.warning(f"不支持的文件类型: {file_ext}")
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_400_BAD_REQUEST,
|
||||||
|
detail=f"目前只支持 CSV 文件。您上传的是: {file_ext}"
|
||||||
|
)
|
||||||
|
|
||||||
|
except HTTPException:
|
||||||
|
raise
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"上传处理异常: {str(e)}", exc_info=True)
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||||
|
detail=str(e)
|
||||||
|
)
|
||||||
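The upload handler above reads the whole request body into memory and does not consult `settings.MAX_UPLOAD_SIZE`. The following is a minimal sketch of a chunked copy that enforces a byte limit. `save_with_limit` is a hypothetical helper, and it works on any binary file-like object rather than on FastAPI's `UploadFile`, so that the example runs on its own.

```python
import io
from pathlib import Path
from typing import BinaryIO


def save_with_limit(stream: BinaryIO, dest: Path, max_bytes: int,
                    chunk_size: int = 64 * 1024) -> int:
    """Copy stream to dest in chunks; abort and delete dest if max_bytes is exceeded.

    Hypothetical helper for illustration only.
    """
    written = 0
    try:
        with open(dest, "wb") as out:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError(f"upload exceeds {max_bytes} bytes")
                out.write(chunk)
    except ValueError:
        # Do not leave a partial file behind.
        dest.unlink(missing_ok=True)
        raise
    return written


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        size = save_with_limit(io.BytesIO(b"a,b\n1,2\n"), Path(tmp) / "demo.csv", max_bytes=1024)
        print(size)
```

In a route, the `ValueError` could be translated into an HTTP 413 response. Whether the limit should be enforced here or by a reverse proxy is a deployment decision that the source does not settle.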
0
app/core/__init__.py
Normal file
122
app/core/config.py
Normal file
@ -0,0 +1,122 @@
|
|||||||
"""
Configuration management for the FastAPI application.
Settings can be overridden through environment variables.
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
import logging
|
||||||
|
|
||||||
|
try:
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
except Exception: # pragma: no cover
|
||||||
|
load_dotenv = None
|
||||||
|
|
||||||
|
# 项目根目录
|
||||||
|
BASE_DIR = Path(__file__).resolve().parent.parent.parent
|
||||||
|
|
||||||
|
# 加载 .env(不覆盖已存在的系统环境变量)
|
||||||
|
_dotenv_path = BASE_DIR / ".env"
|
||||||
|
if load_dotenv is not None and _dotenv_path.exists():
|
||||||
|
load_dotenv(dotenv_path=_dotenv_path, override=False)
|
||||||
|
|
||||||
|
# 环境变量
|
||||||
|
ENVIRONMENT = os.getenv('ENV', 'development')
|
||||||
|
DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
|
||||||
|
|
||||||
|
class Settings:
|
||||||
|
"""应用配置类"""
|
||||||
|
|
||||||
|
# FastAPI 基础配置
|
||||||
|
APP_TITLE = "时间序列数据分析系统"
|
||||||
|
APP_DESCRIPTION = "支持多格式数据上传、AI增强分析、多语言报告生成"
|
||||||
|
APP_VERSION = "2.0.0"
|
||||||
|
|
||||||
|
# API 暴露模式
|
||||||
|
# - full: 暴露 v1 + v2(默认)
|
||||||
|
# - v2: 仅暴露 v2 分析接口 + 基础状态接口(禁用 v1 上传/文件/图片接口)
|
||||||
|
API_MODE = os.getenv('API_MODE', 'full').strip().lower()
|
||||||
|
|
||||||
|
# 服务器配置
|
||||||
|
HOST = os.getenv('HOST', '0.0.0.0')
|
||||||
|
PORT = int(os.getenv('PORT', 60201))
|
||||||
|
RELOAD = DEBUG
|
||||||
|
|
||||||
|
# CORS 配置
|
||||||
|
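    # Strip whitespace so that "a.example.com, b.example.com" yields clean origins.
    CORS_ORIGINS = [o.strip() for o in os.getenv('CORS_ORIGINS', '*').split(',') if o.strip()]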
|
||||||
|
CORS_ALLOW_CREDENTIALS = True
|
||||||
|
CORS_ALLOW_METHODS = ['*']
|
||||||
|
CORS_ALLOW_HEADERS = ['*']
|
||||||
|
|
||||||
|
    # File upload settings
    UPLOAD_DIR = Path(os.getenv('UPLOAD_DIR', BASE_DIR / 'uploads'))
    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
MAX_UPLOAD_SIZE = int(os.getenv('MAX_UPLOAD_SIZE', 16 * 1024 * 1024)) # 16MB
|
||||||
|
ALLOWED_EXTENSIONS = {'csv'}
|
||||||
|
|
||||||
|
    # Temporary file settings
    TEMP_DIR = Path(os.getenv('TEMP_DIR', BASE_DIR / 'temp'))
    TEMP_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# 字体配置
|
||||||
|
FONTS_DIR = Path(os.getenv('FONTS_DIR', BASE_DIR / 'resource' / 'fonts'))
|
||||||
|
FONTS_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# API 配置 (阿里云千问)
|
||||||
|
API_KEY = os.getenv('MY_API_KEY', '')
|
||||||
|
API_BASE = os.getenv('MY_API_BASE', 'https://dashscope.aliyuncs.com/compatible-mode/v1')
|
||||||
|
API_MODEL = os.getenv('MY_MODEL', 'qwen-turbo')
|
||||||
|
API_TIMEOUT = int(os.getenv('API_TIMEOUT', 30))
|
||||||
|
|
||||||
|
# 分析配置
|
||||||
|
LANGUAGE_DEFAULT = os.getenv('LANGUAGE_DEFAULT', 'zh')
|
||||||
|
ANALYSIS_TIMEOUT = int(os.getenv('ANALYSIS_TIMEOUT', 300)) # 5分钟
|
||||||
|
|
||||||
|
    # Logging settings
    # Normalise to upper case so that values such as "info" are accepted by logging.
    LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO' if not DEBUG else 'DEBUG').upper()
    LOG_DIR = Path(os.getenv('LOG_DIR', BASE_DIR / 'logs'))
    LOG_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# 内存管理
|
||||||
|
MAX_MEMORY_MB = int(os.getenv('MAX_MEMORY_MB', 500))
|
||||||
|
|
||||||
|
    # v2 (OSS URL) settings
    # Comma-separated host allowlist. When empty, no host allowlist is applied
    # (private-network and loopback IPs are still blocked).
    V2_ALLOWED_HOSTS = [h.strip() for h in os.getenv('V2_ALLOWED_HOSTS', '').split(',') if h.strip()]
    # Whether plain http is allowed (https only by default)
    V2_ALLOW_HTTP = os.getenv('V2_ALLOW_HTTP', 'False').lower() == 'true'
    # Whether private-network and loopback addresses are allowed
    # (local development and smoke tests only; keep False in production)
    V2_ALLOW_PRIVATE_NETWORKS = os.getenv('V2_ALLOW_PRIVATE_NETWORKS', 'False').lower() == 'true'
    # Download timeout in seconds. requests accepts a (connect, read) tuple;
    # this value is used as the read timeout.
    V2_DOWNLOAD_TIMEOUT_SECONDS = float(os.getenv('V2_DOWNLOAD_TIMEOUT_SECONDS', 30))
    V2_CONNECT_TIMEOUT_SECONDS = float(os.getenv('V2_CONNECT_TIMEOUT_SECONDS', 5))
|
||||||
|
|
||||||
|
    @classmethod
    def get_upload_path(cls, filename: str) -> Path:
        """Return the full path of a file in the upload directory."""
        return cls.UPLOAD_DIR / filename

    @classmethod
    def get_temp_path(cls, filename: str) -> Path:
        """Return the full path of a file in the temporary directory."""
        return cls.TEMP_DIR / filename
|
||||||
|
|
||||||
|
# 日志配置
|
||||||
|
def setup_logging():
|
||||||
|
"""设置日志系统"""
|
||||||
|
logging.basicConfig(
|
||||||
|
level=Settings.LOG_LEVEL,
|
||||||
|
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
|
||||||
|
handlers=[
|
||||||
|
logging.FileHandler(Settings.LOG_DIR / 'app.log'),
|
||||||
|
logging.StreamHandler()
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
# 创建全局配置实例
|
||||||
|
settings = Settings()
|
||||||
|
|
||||||
|
# 启用日志
|
||||||
|
setup_logging()
|
||||||
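The settings class above repeats two parsing patterns: boolean flags compared against 'true', and comma-separated lists. The following is a minimal sketch of how these could be factored into helpers. `env_bool` and `env_list` are hypothetical names, and accepting "1", "yes" and "on" as true values is an assumption of this sketch; the commit itself only recognises "true".

```python
from __future__ import annotations

import os


def env_bool(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Assumption: these spellings count as true; everything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: str = "") -> list[str]:
    """Read a comma-separated list, dropping empty items and surrounding whitespace."""
    return [item.strip() for item in os.getenv(name, default).split(",") if item.strip()]


if __name__ == "__main__":
    os.environ["DEMO_ALLOWED_HOSTS"] = "oss.example.com, oss-cn-hangzhou.aliyuncs.com"
    print(env_list("DEMO_ALLOWED_HOSTS"))
    print(env_bool("DEMO_ALLOW_HTTP"))
```

With helpers like these, a line such as the `V2_ALLOW_HTTP` assignment could shrink to a single call, at the cost of one more indirection when reading the settings.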
124
app/main.py
Normal file
@ -0,0 +1,124 @@
|
|||||||
"""
Main entry point of the FastAPI application.
FastAPI version of the time series data analysis system.
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
from contextlib import asynccontextmanager
|
||||||
|
from fastapi import FastAPI
|
||||||
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
from fastapi.middleware.gzip import GZipMiddleware
|
||||||
|
|
||||||
|
from app.core.config import settings, setup_logging, ENVIRONMENT, DEBUG
|
||||||
|
from app.services.font_manager import setup_fonts_for_app
|
||||||
|
from app.services.linux_adapter import init_linux_environment
|
||||||
|
|
||||||
|
# 设置日志
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# 应用生命周期
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app: FastAPI):
|
||||||
|
"""应用生命周期管理"""
|
||||||
|
# 启动时
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info(f"应用启动: {settings.APP_TITLE}")
|
||||||
|
logger.info(f"版本: {settings.APP_VERSION}")
|
||||||
|
logger.info(f"环境: {ENVIRONMENT}")
|
||||||
|
logger.info(f"调试: {DEBUG}")
|
||||||
|
logger.info(f"监听: {settings.HOST}:{settings.PORT}")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
|
||||||
|
# 初始化 Linux 环境
|
||||||
|
try:
|
||||||
|
init_linux_environment()
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Linux 环境初始化失败: {e}")
|
||||||
|
|
||||||
|
# 初始化字体
|
||||||
|
try:
|
||||||
|
fonts_config = setup_fonts_for_app(['zh', 'en'])
|
||||||
|
logger.info(f"字体配置完成: {fonts_config}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"字体配置失败: {e}")
|
||||||
|
|
||||||
|
yield
|
||||||
|
|
||||||
|
# 关闭时
|
||||||
|
logger.info("应用关闭")
|
||||||
|
|
||||||
|
|
||||||
|
# 创建 FastAPI 应用
|
||||||
|
app = FastAPI(
|
||||||
|
title=settings.APP_TITLE,
|
||||||
|
description=settings.APP_DESCRIPTION,
|
||||||
|
version=settings.APP_VERSION,
|
||||||
|
lifespan=lifespan
|
||||||
|
)
|
||||||
|
|
||||||
|
# 添加中间件
|
||||||
|
# CORS 中间件
|
||||||
|
app.add_middleware(
|
||||||
|
CORSMiddleware,
|
||||||
|
allow_origins=settings.CORS_ORIGINS,
|
||||||
|
allow_credentials=settings.CORS_ALLOW_CREDENTIALS,
|
||||||
|
allow_methods=settings.CORS_ALLOW_METHODS,
|
||||||
|
allow_headers=settings.CORS_ALLOW_HEADERS,
|
||||||
|
)
|
||||||
|
|
||||||
|
# 压缩中间件
|
||||||
|
app.add_middleware(GZipMiddleware, minimum_size=1000)
|
||||||
|
|
||||||
|
# 导入和包含路由
|
||||||
|
from app.api.routes import upload, analysis, analysis_v2, files
|
||||||
|
|
||||||
|
# v2 模式:仅暴露 v2 分析接口 + 基础状态接口
|
||||||
|
if settings.API_MODE == "v2":
|
||||||
|
logger.info("API_MODE=v2: 禁用 v1 上传/文件接口,仅启用 /api/v2")
|
||||||
|
app.include_router(analysis_v2.router, prefix="/api/v2", tags=["analysis-v2"])
|
||||||
|
else:
|
||||||
|
app.include_router(upload.router, prefix="/api", tags=["upload"])
|
||||||
|
app.include_router(analysis.router, prefix="/api", tags=["analysis"])
|
||||||
|
app.include_router(analysis_v2.router, prefix="/api/v2", tags=["analysis-v2"])
|
||||||
|
app.include_router(files.router, prefix="/api", tags=["files"])
|
||||||
|
|
||||||
|
# 根路由
|
||||||
|
@app.get("/")
|
||||||
|
async def root():
|
||||||
|
"""根路径"""
|
||||||
|
return {
|
||||||
|
"message": "Lazy Stat Backend API",
|
||||||
|
"version": settings.APP_VERSION,
|
||||||
|
"docs": "/docs"
|
||||||
|
}
|
||||||
|
|
||||||
|
@app.get("/health")
|
||||||
|
async def health():
|
||||||
|
"""健康检查"""
|
||||||
|
return {
|
||||||
|
"status": "healthy",
|
||||||
|
"app": settings.APP_TITLE,
|
||||||
|
"version": settings.APP_VERSION
|
||||||
|
}
|
||||||
|
|
||||||
|
@app.get("/api/config")
|
||||||
|
async def get_config():
|
||||||
|
"""获取应用配置"""
|
||||||
|
return {
|
||||||
|
"title": settings.APP_TITLE,
|
||||||
|
"version": settings.APP_VERSION,
|
||||||
|
"max_upload_size": settings.MAX_UPLOAD_SIZE,
|
||||||
|
"allowed_extensions": list(settings.ALLOWED_EXTENSIONS),
|
||||||
|
"language_default": settings.LANGUAGE_DEFAULT
|
||||||
|
}
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import uvicorn
|
||||||
|
|
||||||
|
uvicorn.run(
|
||||||
|
"app.main:app",
|
||||||
|
host=settings.HOST,
|
||||||
|
port=settings.PORT,
|
||||||
|
reload=settings.RELOAD,
|
||||||
|
log_level=settings.LOG_LEVEL.lower()
|
||||||
|
)
|
||||||
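The `API_MODE` branch in this file decides which routers are mounted. The following is a minimal sketch of that decision as a pure function, which makes the two modes easy to compare and to test without starting the application. `routers_for_mode` is a hypothetical name, and the tuples only stand in for the real router objects.

```python
from __future__ import annotations


def routers_for_mode(api_mode: str) -> list[tuple[str, str]]:
    """Return (router name, URL prefix) pairs to mount for the given API mode.

    Hypothetical helper; names mirror the modules imported in app/main.py.
    """
    v2_only = [("analysis_v2", "/api/v2")]
    if api_mode.strip().lower() == "v2":
        # v2 mode: only the v2 analysis routes, no v1 upload/file routes.
        return v2_only
    # Any other value behaves like "full": v1 and v2 together.
    return [("upload", "/api"), ("analysis", "/api"), *v2_only, ("files", "/api")]


if __name__ == "__main__":
    for mode in ("full", "v2"):
        print(mode, routers_for_mode(mode))
```

Note that, as in the source, any value other than "v2" falls through to the full set of routes, so a typo in `API_MODE` would expose the v1 endpoints.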
0
app/services/__init__.py
Normal file
32
app/services/analysis/__init__.py
Normal file
@ -0,0 +1,32 @@
|
|||||||
|
"""Analysis package.
|
||||||
|
|
||||||
|
This package contains the refactored analysis modules.
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
- The legacy entrypoint remains `app.services.analysis_system.TimeSeriesAnalysisSystem`.
|
||||||
|
- Importing `app.services.analysis_system` eagerly here would create a circular import because
|
||||||
|
`analysis_system` imports `app.services.analysis.modules.*`.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from typing import Any, TYPE_CHECKING
|
||||||
|
|
||||||
|
__all__ = ["TimeSeriesAnalysisSystem"]
|
||||||
|
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from app.services.analysis_system import TimeSeriesAnalysisSystem as TimeSeriesAnalysisSystem
|
||||||
|
|
||||||
|
|
||||||
|
def __getattr__(name: str) -> Any:
|
||||||
|
if name == "TimeSeriesAnalysisSystem":
|
||||||
|
from app.services.analysis_system import TimeSeriesAnalysisSystem
|
||||||
|
|
||||||
|
return TimeSeriesAnalysisSystem
|
||||||
|
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
|
||||||
|
|
||||||
|
|
||||||
|
def __dir__() -> list[str]:
|
||||||
|
return sorted(list(globals().keys()) + __all__)
|
||||||
|
|
||||||
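The package above avoids a circular import by resolving `TimeSeriesAnalysisSystem` lazily through a module-level `__getattr__` (PEP 562). The following is a minimal, self-contained sketch of the same mechanism. The module name `lazy_demo` and the export table are made up for the example, and `decimal.Decimal` merely stands in for the real class.

```python
import importlib
import sys
import types

# Hypothetical export table: public name -> (module path, attribute name).
_LAZY_EXPORTS = {"Decimal": ("decimal", "Decimal")}


def _lazy_getattr(name: str):
    """Resolve an exported name on first access instead of at import time."""
    try:
        module_path, attr = _LAZY_EXPORTS[name]
    except KeyError:
        raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}") from None
    return getattr(importlib.import_module(module_path), attr)


# Build a throwaway module object so the example runs as a single file.
lazy_demo = types.ModuleType("lazy_demo")
lazy_demo.__getattr__ = _lazy_getattr  # consulted only when normal lookup fails
sys.modules["lazy_demo"] = lazy_demo

if __name__ == "__main__":
    print(lazy_demo.Decimal("1.5") + 1)
```

Because the import happens inside the function, the target module is loaded only when the attribute is first requested, which is what breaks the import cycle described in the docstring above.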
4
app/services/analysis/modules/__init__.py
Normal file
@ -0,0 +1,4 @@
|
|||||||
|
"""Implementation modules for analysis methods.
|
||||||
|
|
||||||
|
Each file contains one or a small group of closely-related analysis methods.
|
||||||
|
"""
|
||||||
180
app/services/analysis/modules/basic.py
Normal file
@ -0,0 +1,180 @@
|
|||||||
|
import gc
|
||||||
|
import os
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import pandas as pd
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from scipy import stats
|
||||||
|
|
||||||
|
|
||||||
|
def generate_statistical_overview(self):
|
||||||
|
"""生成统计概览 - 优化内存版本"""
|
||||||
|
fig = None
|
||||||
|
try:
|
||||||
|
self._log_step("Generating statistical overview...")
|
||||||
|
|
||||||
|
# 检查数据
|
||||||
|
if not hasattr(self, 'data') or self.data is None or len(self.data) == 0:
|
||||||
|
self._log_step("No data available for statistical overview", "warning")
|
||||||
|
return None, "No data available", None
|
||||||
|
|
||||||
|
# 计算统计数据
|
||||||
|
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
|
||||||
|
stats_df = self.data[numeric_cols].describe().T.reset_index().rename(columns={'index': 'variable'})
|
||||||
|
summary = f"Generated statistical overview for {len(numeric_cols)} variables"
|
||||||
|
|
||||||
|
if not self.generate_plots:
|
||||||
|
self._log_step("Statistical overview generated (data only)", "success")
|
||||||
|
return None, summary, stats_df
|
||||||
|
|
||||||
|
# 使用更小的图形尺寸和DPI来节省内存
|
||||||
|
fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=100)
|
||||||
|
fig.suptitle('Statistical Overview', fontsize=14)
|
||||||
|
|
||||||
|
        # Plot only the first four numeric variables to save memory
        # (the statistics above are also computed on numeric columns only).
        num_vars = min(4, len(numeric_cols))

        for i in range(num_vars):
            row = i // 2
            col = i % 2
            col_name = numeric_cols[i]
|
||||||
|
|
||||||
|
try:
|
||||||
|
# 时间序列图
|
||||||
|
axes[row, col].plot(self.data.index, self.data[col_name], linewidth=1)
|
||||||
|
axes[row, col].set_title(f'{col_name}')
|
||||||
|
axes[row, col].tick_params(axis='x', rotation=45)
|
||||||
|
axes[row, col].grid(True, alpha=0.3)
|
||||||
|
except Exception as e:
|
||||||
|
self._log_step(f"Plotting {col_name} failed: {e}", "warning")
|
||||||
|
axes[row, col].text(
|
||||||
|
0.5,
|
||||||
|
0.5,
|
||||||
|
f'Error: {str(e)[:30]}',
|
||||||
|
ha='center',
|
||||||
|
va='center',
|
||||||
|
transform=axes[row, col].transAxes,
|
||||||
|
)
|
||||||
|
|
||||||
|
plt.tight_layout()
|
||||||
|
|
||||||
|
# 保存图片(使用更低的DPI)
|
||||||
|
img_path = os.path.join(self.temp_dir.name, 'stats_overview.png')
|
||||||
|
try:
|
||||||
|
plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
|
||||||
|
if not os.path.exists(img_path):
|
||||||
|
self._log_step("Failed to save statistical overview image", "error")
|
||||||
|
return None, "Failed to save image", stats_df
|
||||||
|
except Exception as save_error:
|
||||||
|
self._log_step(f"Failed to save figure: {save_error}", "error")
|
||||||
|
return None, f"Save error: {str(save_error)[:100]}", stats_df
|
||||||
|
finally:
|
||||||
|
plt.close(fig) # 明确关闭图形释放内存
|
||||||
|
gc.collect()
|
||||||
|
|
||||||
|
self._log_step("Statistical overview generated", "success")
|
||||||
|
|
||||||
|
return img_path, summary, stats_df
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self._log_step(f"Statistical overview failed: {str(e)[:100]}", "error")
|
||||||
|
if fig is not None:
|
||||||
|
try:
|
||||||
|
plt.close(fig)
|
||||||
|
gc.collect()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
return None, f"Statistical overview failed: {str(e)[:100]}", None
|
||||||
|
|
||||||
|
|
||||||
|
def perform_normality_tests(self):
|
||||||
|
"""执行正态性检验"""
|
||||||
|
try:
|
||||||
|
self._log_step("Performing normality tests...")
|
||||||
|
|
||||||
|
if hasattr(self, 'data') and self.data is not None:
|
||||||
|
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
|
||||||
|
results = {}
|
||||||
|
|
||||||
|
for col in numeric_cols[:3]: # 只测试前3个变量
|
||||||
|
series = self.data[col].dropna()
|
||||||
|
col_results = {}
|
||||||
|
|
||||||
|
# 直方图分箱(后端负责 binning)
|
||||||
|
hist_counts, bin_edges = np.histogram(series, bins=20)
|
||||||
|
histogram = []
|
||||||
|
for i in range(len(hist_counts)):
|
||||||
|
histogram.append({
|
||||||
|
'range_start': float(bin_edges[i]),
|
||||||
|
'range_end': float(bin_edges[i + 1]),
|
||||||
|
'count': int(hist_counts[i])
|
||||||
|
})
|
||||||
|
col_results['histogram'] = histogram
|
||||||
|
|
||||||
|
# Shapiro-Wilk检验
|
||||||
|
if len(series) >= 3 and len(series) <= 5000:
|
||||||
|
shapiro_result = stats.shapiro(series)
|
||||||
|
col_results['Shapiro-Wilk'] = {
|
||||||
|
'statistic': float(shapiro_result[0]),
|
||||||
|
'p_value': float(shapiro_result[1]),
|
||||||
|
'normal': bool(shapiro_result[1] > 0.05),
|
||||||
|
}
|
||||||
|
|
||||||
|
# Jarque-Bera检验
|
||||||
|
jb_result = stats.jarque_bera(series)
|
||||||
|
# SciPy result typing varies by version; keep runtime behavior and silence stub mismatch.
|
||||||
|
jb_stat = float(jb_result[0]) # type: ignore[index,arg-type]
|
||||||
|
jb_p = float(jb_result[1]) # type: ignore[index,arg-type]
|
||||||
|
col_results['Jarque-Bera'] = {
|
||||||
|
'statistic': jb_stat,
|
||||||
|
'p_value': jb_p,
|
||||||
|
'normal': bool(jb_p > 0.05),
|
||||||
|
}
|
||||||
|
|
||||||
|
results[col] = col_results
|
||||||
|
|
||||||
|
summary = f"正态性检验完成,测试了 {len(results)} 个变量"
|
||||||
|
|
||||||
|
if not self.generate_plots:
|
||||||
|
self._log_step("Normality tests completed (data only)", "success")
|
||||||
|
return None, summary, results
|
||||||
|
|
||||||
|
# 创建正态性检验可视化
|
||||||
|
n_cols = min(3, len(numeric_cols))
|
||||||
|
fig, axes = plt.subplots(n_cols, 2, figsize=(12, 4 * n_cols))
|
||||||
|
fig.suptitle('正态性检验结果', fontsize=16)
|
||||||
|
|
||||||
|
if n_cols == 1:
|
||||||
|
axes = axes.reshape(1, -1)
|
||||||
|
|
||||||
|
for i, col in enumerate(numeric_cols[:n_cols]):
|
||||||
|
series = self.data[col].dropna()
|
||||||
|
|
||||||
|
# 直方图与正态曲线
|
||||||
|
axes[i, 0].hist(series, bins=20, density=True, alpha=0.7, color='skyblue')
|
||||||
|
xmin, xmax = axes[i, 0].get_xlim()
|
||||||
|
x = np.linspace(xmin, xmax, 100)
|
||||||
|
p = stats.norm.pdf(x, series.mean(), series.std())
|
||||||
|
axes[i, 0].plot(x, p, 'k', linewidth=2)
|
||||||
|
axes[i, 0].set_title(f'{col} - 分布直方图')
|
||||||
|
|
||||||
|
# Q-Q图
|
||||||
|
stats.probplot(series, dist="norm", plot=axes[i, 1])
|
||||||
|
axes[i, 1].set_title(f'{col} - Q-Q图')
|
||||||
|
|
||||||
|
plt.tight_layout()
|
||||||
|
img_path = os.path.join(self.temp_dir.name, 'normality_tests.png')
|
||||||
|
plt.savefig(img_path, dpi=150, bbox_inches='tight')
|
||||||
|
plt.close()
|
||||||
|
|
||||||
|
self._log_step("Normality tests completed", "success")
|
||||||
|
|
||||||
|
return img_path, summary, results
|
||||||
|
|
||||||
|
self._log_step("No data available for normality tests", "warning")
|
||||||
|
return None, "数据不足,无法进行正态性检验", None
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self._log_step(f"Normality tests failed: {e}", "error")
|
||||||
|
return None, f"正态性检验失败: {e}", None
|
||||||
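`perform_normality_tests` above relies on `stats.jarque_bera`. As an illustration of what that statistic measures, the following is a minimal sketch of the textbook definition written out by hand: JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the sample kurtosis (both from biased moments), and a p-value from the chi-square distribution with two degrees of freedom, whose survival function is exp(-x/2). `jarque_bera_stat` is a hypothetical name, and the sketch is not a replacement for the SciPy call.

```python
import math


def jarque_bera_stat(values):
    """Return (JB statistic, p-value) using the textbook formula (illustrative sketch)."""
    n = len(values)
    mean = sum(values) / n
    # Biased central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("constant series: skewness and kurtosis are undefined")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


if __name__ == "__main__":
    print(jarque_bera_stat([1, 2, 3, 4, 5]))
```

A large JB value (small p-value) indicates skewness or kurtosis far from those of a normal distribution, which is what the `normal` flag in the results dictionary above summarises with the 0.05 threshold.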
112
app/services/analysis/modules/modeling.py
Normal file
@ -0,0 +1,112 @@
|
|||||||
|
import os
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import pandas as pd
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from sklearn.ensemble import RandomForestRegressor
|
||||||
|
|
||||||
|
|
||||||
|
def analyze_feature_importance(self):
|
||||||
|
"""分析特征重要性"""
|
||||||
|
try:
|
||||||
|
self._log_step("Analyzing feature importance...")
|
||||||
|
|
||||||
|
if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
|
||||||
|
self._log_step("Not enough data for feature importance analysis", "warning")
|
||||||
|
return None, "Not enough data for feature importance analysis", None
|
||||||
|
|
||||||
|
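        # Use the first column as the target and the remaining columns as features,
        # so that the target is not also passed to the model as a feature.
        y = self.data.iloc[:, 0]
        X = self.data.iloc[:, 1:]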
|
||||||
|
|
||||||
|
model = RandomForestRegressor(n_estimators=50, random_state=42) # 减少树的数量
|
||||||
|
model.fit(X, y)
|
||||||
|
|
||||||
|
feature_importance = pd.Series(model.feature_importances_, index=X.columns)
|
||||||
|
feature_importance = feature_importance.sort_values(ascending=False)
|
||||||
|
|
||||||
|
fi_df = feature_importance.reset_index()
|
||||||
|
fi_df.columns = ['feature', 'importance']
|
||||||
|
|
||||||
|
summary = f"Feature importance analysis completed, top feature: {fi_df.iloc[0]['feature']}"
|
||||||
|
|
||||||
|
if not self.generate_plots:
|
||||||
|
self._log_step("Feature importance analysis completed (data only)", "success")
|
||||||
|
return None, summary, fi_df
|
||||||
|
|
||||||
|
plt.figure(figsize=(8, 6))
|
||||||
|
feature_importance.head(10).plot(kind='bar')
|
||||||
|
plt.title('Feature Importance Analysis')
|
||||||
|
plt.ylabel('Importance Score')
|
||||||
|
plt.tight_layout()
|
||||||
|
|
||||||
|
img_path = os.path.join(self.temp_dir.name, 'feature_importance.png')
|
||||||
|
plt.savefig(img_path, dpi=150, bbox_inches='tight')
|
||||||
|
plt.close()
|
||||||
|
|
||||||
|
self._log_step("Feature importance analysis completed", "success")
|
||||||
|
return img_path, summary, fi_df
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self._log_step(f"Feature importance analysis failed: {e}", "error")
|
||||||
|
return None, f"Feature importance analysis failed: {e}", None
|
||||||
|
|
||||||
|
|
||||||
|
def perform_var_analysis(self):
|
||||||
|
"""执行向量自回归分析"""
|
||||||
|
try:
|
||||||
|
self._log_step("Performing VAR analysis...")
|
||||||
|
|
||||||
|
if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
|
||||||
|
self._log_step("Not enough data for VAR analysis", "warning")
|
||||||
|
return None, "数据不足,无法进行VAR分析", None
|
||||||
|
|
||||||
|
from statsmodels.tsa.api import VAR
|
||||||
|
|
||||||
|
numeric_data = self.data.select_dtypes(include=[np.number])
|
||||||
|
if len(numeric_data.columns) < 2:
|
||||||
|
self._log_step("Not enough numeric columns for VAR analysis", "warning")
|
||||||
|
return None, "数值变量不足,无法进行VAR分析", None
|
||||||
|
|
||||||
|
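        # Use at most the first three numeric columns and drop rows with missing values before fitting.
        var_data = numeric_data.iloc[:, : min(3, len(numeric_data.columns))].dropna()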
|
||||||
|
|
||||||
|
model = VAR(var_data)
|
||||||
|
results = model.fit(maxlags=2, ic='aic')
|
||||||
|
|
||||||
|
lag_order = results.k_ar
|
||||||
|
forecast = results.forecast(var_data.values[-lag_order:], steps=10)
|
||||||
|
|
||||||
|
forecast_df = pd.DataFrame(data=forecast, columns=[f"{col}_forecast" for col in var_data.columns])
|
||||||
|
summary = f"VAR分析完成,使用滞后阶数: {results.k_ar},生成了10期预测"
|
||||||
|
|
||||||
|
if not self.generate_plots:
|
||||||
|
self._log_step("VAR analysis completed (data only)", "success")
|
||||||
|
return None, summary, forecast_df
|
||||||
|
|
||||||
|
plt.figure(figsize=(12, 8))
|
||||||
|
for i, col in enumerate(var_data.columns):
|
||||||
|
plt.plot(range(len(var_data)), var_data[col].values, label=f'{col} (actual)', alpha=0.7)
|
||||||
|
plt.plot(
|
||||||
|
range(len(var_data), len(var_data) + 10),
|
||||||
|
forecast[:, i],
|
||||||
|
label=f'{col} (forecast)',
|
||||||
|
linestyle='--',
|
||||||
|
)
|
||||||
|
|
||||||
|
plt.axvline(x=len(var_data), color='red', linestyle=':', alpha=0.7, label='Forecast Start')
|
||||||
|
plt.xlabel('Time')
|
||||||
|
plt.ylabel('Value')
|
||||||
|
plt.title('Vector Autoregression (VAR) Forecast')
|
||||||
|
plt.legend()
|
||||||
|
plt.grid(True, alpha=0.3)
|
||||||
|
plt.tight_layout()
|
||||||
|
|
||||||
|
img_path = os.path.join(self.temp_dir.name, 'var_analysis.png')
|
||||||
|
plt.savefig(img_path, dpi=150, bbox_inches='tight')
|
||||||
|
plt.close()
|
||||||
|
|
||||||
|
self._log_step("VAR analysis completed", "success")
|
||||||
|
return img_path, summary, forecast_df
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self._log_step(f"VAR analysis failed: {e}", "error")
|
||||||
|
return None, f"VAR分析失败: {e}", None
|
||||||
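`perform_var_analysis` above asks the fitted model for a multi-step forecast starting from the last `lag_order` observations. The following is a minimal sketch of the recursion behind such a forecast for the simplest case, a VAR(1) model y[t+1] = c + A y[t]. The coefficients are supplied by hand here; `var1_forecast` is a hypothetical helper and does not reproduce how statsmodels estimates or forecasts.

```python
def var1_forecast(intercept, coef, last_obs, steps):
    """Iterate y[t+1] = c + A @ y[t] for a VAR(1) model (illustrative sketch).

    intercept: list of k constants, coef: k x k nested list, last_obs: last observed vector.
    """
    k = len(last_obs)
    current = list(last_obs)
    forecasts = []
    for _ in range(steps):
        # Each new value depends on the previous value of every variable.
        nxt = [
            intercept[i] + sum(coef[i][j] * current[j] for j in range(k))
            for i in range(k)
        ]
        forecasts.append(nxt)
        current = nxt  # feed the forecast back in for the next step
    return forecasts


if __name__ == "__main__":
    print(var1_forecast([0, 0], [[0.5, 0], [0, 0.5]], [2, 4], steps=3))
```

Because each step feeds on the previous forecast rather than on observed data, errors accumulate with the horizon, which is worth keeping in mind when reading the ten-step forecast plotted above.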
301
app/services/analysis/modules/multivariate.py
Normal file
@ -0,0 +1,301 @@
|
|||||||
|
import gc
|
||||||
|
import os
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import pandas as pd
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import seaborn as sns
|
||||||
|
from sklearn.decomposition import PCA
|
||||||
|
from sklearn.cluster import KMeans
|
||||||
|
|
||||||
|
|
||||||
|
def generate_correlation_heatmap(self):
|
||||||
|
"""生成相关性热力图"""
|
||||||
|
fig = None
|
||||||
|
try:
|
||||||
|
self._log_step("Generating correlation heatmap...")
|
||||||
|
|
||||||
|
if not hasattr(self, 'data') or self.data is None or len(self.data.columns) <= 1:
|
||||||
|
self._log_step("Not enough data for correlation analysis", "warning")
|
||||||
|
return None, "Not enough data", None
|
||||||
|
|
||||||
|
# 计算相关性矩阵
|
||||||
|
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
|
||||||
|
corr_matrix = self.data[numeric_cols].corr()
|
||||||
|
summary = "Correlation matrix calculated"
|
||||||
|
|
||||||
|
if not self.generate_plots:
|
||||||
|
self._log_step("Correlation analysis completed (data only)", "success")
|
||||||
|
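            # Replace NaN with None for JSON compatibility. Cast to object first so that
            # None is not coerced back to NaN in a float DataFrame.
            return None, summary, corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)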
|
||||||
|
|
||||||
|
# 创建热力图
|
||||||
|
fig = plt.figure(figsize=(8, 6), dpi=100)
|
||||||
|
sns.heatmap(
|
||||||
|
corr_matrix,
|
||||||
|
annot=True,
|
||||||
|
fmt=".2f",
|
||||||
|
cmap='coolwarm',
|
||||||
|
center=0,
|
||||||
|
square=True,
|
||||||
|
cbar_kws={"shrink": 0.8},
|
||||||
|
)
|
||||||
|
plt.title('Correlation Heatmap')
|
||||||
|
plt.tight_layout()
|
||||||
|
|
||||||
|
# 保存图片
|
||||||
|
img_path = os.path.join(self.temp_dir.name, 'correlation_heatmap.png')
|
||||||
|
try:
|
||||||
|
plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
|
||||||
|
except Exception as save_err:
|
||||||
|
self._log_step(f"Save error: {save_err}", "error")
|
||||||
|
            return None, f"Save error: {str(save_err)[:100]}", corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
|
||||||
|
finally:
|
||||||
|
plt.close(fig)
|
||||||
|
gc.collect()
|
||||||
|
|
||||||
|
self._log_step("Correlation heatmap generated", "success")
|
||||||
|
|
||||||
|
        return img_path, summary, corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self._log_step(f"Correlation heatmap failed: {str(e)[:100]}", "error")
|
||||||
|
if fig is not None:
|
||||||
|
try:
|
||||||
|
plt.close(fig)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
return None, f"Correlation heatmap failed: {str(e)[:100]}", None
|
||||||
|
|
||||||
|
|
||||||
|
def generate_pca_scree_plot(self):
    """Generate the PCA scree plot."""
    try:
        self._log_step("Generating PCA scree plot...")

        if hasattr(self, 'scaled_data') and self.scaled_data is not None:
            pca = PCA()
            pca.fit(self.scaled_data)

            explained_variance = pca.explained_variance_ratio_
            cumulative_variance = np.cumsum(explained_variance)

            # Prepare the data
            scree_data = pd.DataFrame({
                'component': range(1, len(explained_variance) + 1),
                'explained_variance': explained_variance,
                'cumulative_variance': cumulative_variance,
            })

            # Cumulative variance of the first two components
            # (or of the only component when there is just one)
            summary = (
                "PCA碎石图生成完成,前2个主成分解释 "
                f"{cumulative_variance[min(1, len(cumulative_variance) - 1)]:.2%} 方差"
            )

            if not self.generate_plots:
                self._log_step("PCA scree data generated", "success")
                return None, summary, scree_data

            # Create the scree plot figure
            plt.figure(figsize=(10, 6))

            # Left panel: explained variance per component
            plt.subplot(1, 2, 1)
            plt.plot(range(1, len(explained_variance) + 1), explained_variance, 'bo-')
            plt.title('PCA碎石图')
            plt.xlabel('主成分')
            plt.ylabel('解释方差比例')
            plt.grid(True, alpha=0.3)

            # Right panel: cumulative explained variance
            plt.subplot(1, 2, 2)
            plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, 'ro-')
            plt.title('累积解释方差')
            plt.xlabel('主成分数量')
            plt.ylabel('累积方差比例')
            plt.axhline(y=0.85, color='g', linestyle='--', label='85% 方差')
            plt.legend()
            plt.grid(True, alpha=0.3)

            plt.tight_layout()
            img_path = os.path.join(self.temp_dir.name, 'pca_scree_plot.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()

            self._log_step("PCA scree plot generated", "success")

            return img_path, summary, scree_data

        self._log_step("No scaled data available for PCA scree plot", "warning")
        return None, "没有标准化数据可用于PCA碎石图", None

    except Exception as e:
        self._log_step(f"PCA scree plot failed: {e}", "error")
        return None, f"PCA碎石图生成失败: {e}", None

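The scree plot above draws a reference line at 85% cumulative variance. A minimal sketch of how that threshold could be turned into a component count is shown here; the helper name `components_for_threshold` and the sample ratios are hypothetical and not part of the commit.

```python
from itertools import accumulate


def components_for_threshold(explained_variance_ratio, threshold=0.85):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold`, or None if it never does."""
    running_totals = accumulate(explained_variance_ratio)
    for count, cumulative in enumerate(running_totals, start=1):
        # Small tolerance so float round-off does not hide an exact hit
        if cumulative >= threshold - 1e-12:
            return count
    return None


# Made-up ratios, for illustration only
ratios = [0.5, 0.25, 0.15, 0.1]
print(components_for_threshold(ratios))
```

In the module itself the input would presumably be `pca.explained_variance_ratio_`; that wiring is an assumption.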
def perform_pca_analysis(self):
    """Run principal component analysis (PCA)."""
    try:
        self._log_step("Performing PCA analysis...")

        if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
            pca = PCA(n_components=2)
            principal_components = pca.fit_transform(self.scaled_data)

            summary = (
                "PCA analysis completed, explained variance: "
                f"{pca.explained_variance_ratio_[0]:.2%} + {pca.explained_variance_ratio_[1]:.2%}"
            )

            # Built once; both the data-only path and the plotting path return it
            pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
            pca_df['timestamp'] = self.data.index.astype(str)

            if not self.generate_plots:
                self._log_step("PCA analysis completed (data only)", "success")
                return None, summary, pca_df

            # Create the PCA scatter plot
            plt.figure(figsize=(8, 6))
            plt.scatter(principal_components[:, 0], principal_components[:, 1], alpha=0.7)
            plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%})')
            plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%})')
            plt.title('Principal Component Analysis (PCA)')
            plt.grid(True, alpha=0.3)
            plt.tight_layout()

            # Save the image
            img_path = os.path.join(self.temp_dir.name, 'pca_analysis.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()

            self._log_step("PCA analysis completed", "success")

            return img_path, summary, pca_df

        self._log_step("Not enough data for PCA analysis", "warning")
        return None, "Not enough data for PCA analysis", None

    except Exception as e:
        self._log_step(f"PCA analysis failed: {e}", "error")
        return None, f"PCA analysis failed: {e}", None

def perform_clustering_analysis(self):
    """Run clustering analysis (K-means)."""
    try:
        self._log_step("Performing clustering analysis...")

        if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
            kmeans = KMeans(n_clusters=3, random_state=42)
            clusters = kmeans.fit_predict(self.scaled_data)

            summary = f"Clustering analysis completed, found {len(np.unique(clusters))} clusters"

            # Built once; both the data-only path and the plotting path return it
            cluster_df = pd.DataFrame({'cluster': clusters})
            cluster_df['timestamp'] = self.data.index.astype(str)

            if not self.generate_plots:
                self._log_step("Clustering analysis completed (data only)", "success")
                return None, summary, cluster_df

            # If the data is exactly 2-D, plot the clusters directly.
            # (The original test was `>= 2`, which is always true here and
            # made the PCA branch below unreachable.)
            if len(self.scaled_data.columns) == 2:
                plt.figure(figsize=(8, 6))
                plt.scatter(
                    self.scaled_data.iloc[:, 0],
                    self.scaled_data.iloc[:, 1],
                    c=clusters,
                    cmap='viridis',
                    alpha=0.7,
                )
                plt.xlabel(self.scaled_data.columns[0])
                plt.ylabel(self.scaled_data.columns[1])
                plt.title('Clustering Analysis')
                plt.colorbar(label='Cluster')
                plt.tight_layout()
            else:
                # For higher-dimensional data, reduce with PCA before plotting
                pca = PCA(n_components=2)
                reduced_data = pca.fit_transform(self.scaled_data)

                plt.figure(figsize=(8, 6))
                plt.scatter(reduced_data[:, 0], reduced_data[:, 1], c=clusters, cmap='viridis', alpha=0.7)
                plt.xlabel('PC1')
                plt.ylabel('PC2')
                plt.title('Clustering Analysis (PCA Reduced)')
                plt.colorbar(label='Cluster')
                plt.tight_layout()

            # Save the image
            img_path = os.path.join(self.temp_dir.name, 'clustering_analysis.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()

            self._log_step("Clustering analysis completed", "success")

            return img_path, summary, cluster_df

        self._log_step("Not enough data for clustering analysis", "warning")
        return None, "Not enough data for clustering analysis", None

    except Exception as e:
        self._log_step(f"Clustering analysis failed: {e}", "error")
        return None, f"Clustering analysis failed: {e}", None

def perform_factor_analysis(self):
    """Run factor analysis."""
    try:
        self._log_step("Performing factor analysis...")

        if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
            from sklearn.decomposition import FactorAnalysis

            fa = FactorAnalysis(n_components=2, random_state=42)
            factors = fa.fit_transform(self.scaled_data)

            summary = "因子分析完成,提取了2个主要因子"

            # Built once; both the data-only path and the plotting path return it
            factor_df = pd.DataFrame(data=factors, columns=['Factor1', 'Factor2'])
            factor_df['timestamp'] = self.data.index.astype(str)

            if not self.generate_plots:
                self._log_step("Factor analysis completed (data only)", "success")
                return None, summary, factor_df

            # Create the factor score scatter plot
            plt.figure(figsize=(10, 8))
            plt.scatter(factors[:, 0], factors[:, 1], alpha=0.7)
            plt.xlabel('Factor 1')
            plt.ylabel('Factor 2')
            plt.title('Factor Analysis')
            plt.grid(True, alpha=0.3)

            # Label the first 10 points with their row position
            # (these are factor scores, not factor loadings)
            for i, (x, y) in enumerate(factors[:10]):
                plt.annotate(str(i), (x, y), xytext=(5, 5), textcoords='offset points', fontsize=8)

            plt.tight_layout()

            # Save the image
            img_path = os.path.join(self.temp_dir.name, 'factor_analysis.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()

            self._log_step("Factor analysis completed", "success")

            return img_path, summary, factor_df

        self._log_step("Not enough data for factor analysis", "warning")
        return None, "数据不足,无法进行因子分析", None

    except Exception as e:
        self._log_step(f"Factor analysis failed: {e}", "error")
        return None, f"因子分析失败: {e}", None
169
app/services/analysis/modules/stationarity.py
Normal file
@ -0,0 +1,169 @@
import os

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss


def perform_stationarity_tests(self):
    """Run stationarity tests (ADF and KPSS; no Phillips-Perron test is run here)."""
    try:
        self._log_step("Performing stationarity tests...")

        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            results = {}

            for col in numeric_cols[:3]:  # only test the first 3 variables
                series = self.data[col].dropna()
                col_results = {}

                # ADF test (null hypothesis: the series has a unit root)
                adf_result = adfuller(series)
                adf_crit = adf_result[4]  # type: ignore[index]
                if isinstance(adf_crit, dict):
                    adf_crit = {str(k): float(v) for k, v in adf_crit.items()}
                col_results['ADF'] = {
                    'statistic': float(adf_result[0]),
                    'p_value': float(adf_result[1]),
                    'critical_values': adf_crit,
                    'stationary': bool(adf_result[1] < 0.05),
                }

                # KPSS test (null hypothesis: the series is stationary)
                try:
                    kpss_result = kpss(series, regression='c')
                    kpss_crit = kpss_result[3]
                    if isinstance(kpss_crit, dict):
                        kpss_crit = {str(k): float(v) for k, v in kpss_crit.items()}
                    col_results['KPSS'] = {
                        'statistic': float(kpss_result[0]),
                        'p_value': float(kpss_result[1]),
                        'critical_values': kpss_crit,
                        'stationary': bool(kpss_result[1] > 0.05),
                    }
                except Exception:
                    col_results['KPSS'] = '检验失败'

                results[col] = col_results

            summary = f"平稳性检验完成,测试了 {len(results)} 个变量"

            if not self.generate_plots:
                self._log_step("Stationarity tests completed (data only)", "success")
                return None, summary, results

            # Create the stationarity test figure
            fig, axes = plt.subplots(2, 2, figsize=(15, 10))
            fig.suptitle('平稳性检验结果', fontsize=16)

            # Top row: the raw time series
            for i, col in enumerate(numeric_cols[:2]):
                axes[0, i].plot(self.data.index, self.data[col])
                axes[0, i].set_title(f'{col} - 时间序列')
                axes[0, i].tick_params(axis='x', rotation=45)
                axes[0, i].grid(True, alpha=0.3)

            # Bottom left: ADF test statistics and p-values
            test_stats = [results[col]['ADF']['statistic'] for col in list(results.keys())[:2]]
            p_values = [results[col]['ADF']['p_value'] for col in list(results.keys())[:2]]

            x_pos = np.arange(len(test_stats))
            axes[1, 0].bar(x_pos - 0.2, test_stats, 0.4, label='检验统计量', alpha=0.7)
            axes[1, 0].bar(x_pos + 0.2, p_values, 0.4, label='p值', alpha=0.7)
            axes[1, 0].set_title('ADF检验结果')
            axes[1, 0].set_xticks(x_pos)
            axes[1, 0].set_xticklabels(list(results.keys())[:2])
            # Draw the threshold line before legend() so its label is included
            axes[1, 0].axhline(y=0.05, color='r', linestyle='--', label='显著性水平 (0.05)')
            axes[1, 0].legend()

            # Bottom right: the ADF-based conclusion per variable
            stationary_status = [
                '平稳' if results[col]['ADF']['stationary'] else '非平稳' for col in list(results.keys())[:2]
            ]
            colors = ['green' if status == '平稳' else 'red' for status in stationary_status]
            axes[1, 1].bar(x_pos, [1] * len(stationary_status), color=colors, alpha=0.7)
            axes[1, 1].set_title('平稳性结论')
            axes[1, 1].set_xticks(x_pos)
            axes[1, 1].set_xticklabels(list(results.keys())[:2])
            for i, status in enumerate(stationary_status):
                axes[1, 1].text(i, 0.5, status, ha='center', va='center', fontweight='bold')

            plt.tight_layout()
            img_path = os.path.join(self.temp_dir.name, 'stationarity_tests.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()

            self._log_step("Stationarity tests completed", "success")

            return img_path, summary, results

        self._log_step("No data available for stationarity tests", "warning")
        return None, "数据不足,无法进行平稳性检验", None

    except Exception as e:
        self._log_step(f"Stationarity tests failed: {e}", "error")
        return None, f"平稳性检验失败: {e}", None

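ADF and KPSS have opposite null hypotheses (unit root versus stationarity), so their two `stationary` flags can disagree. The sketch below shows one way the two p-values might be combined into a single verdict; the function `joint_stationarity_verdict` and its labels are hypothetical and do not appear in the commit.

```python
def joint_stationarity_verdict(adf_p_value, kpss_p_value, alpha=0.05):
    """Combine ADF and KPSS p-values into one label.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_says_stationary = adf_p_value < alpha      # ADF rejects the unit root
    kpss_says_stationary = kpss_p_value > alpha    # KPSS fails to reject stationarity

    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    # The tests point in different directions; leave the call to the analyst
    return "conflicting"


print(joint_stationarity_verdict(0.01, 0.10))
```

The thresholds mirror the `< 0.05` and `> 0.05` comparisons used in `perform_stationarity_tests`; how a conflicting result should be handled is left open here.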
def perform_cointegration_test(self):
    """Run the Johansen cointegration test."""
    try:
        self._log_step("Performing cointegration test...")

        if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
            self._log_step("Not enough data for cointegration test", "warning")
            return None, "数据不足,无法进行协整检验", None

        from statsmodels.tsa.vector_ar.vecm import coint_johansen

        # Drop rows with missing values; the Johansen procedure needs complete rows
        numeric_data = self.data.select_dtypes(include=[np.number]).dropna()
        if len(numeric_data.columns) < 2:
            self._log_step("Not enough numeric columns for cointegration test", "warning")
            return None, "数值变量不足,无法进行协整检验", None

        result = coint_johansen(numeric_data, det_order=0, k_ar_diff=1)

        # Column 1 of trace_stat_crit_vals holds the 95% critical values
        summary = (
            f"协整检验完成,轨迹统计量: {result.trace_stat[0]:.3f}, "
            f"临界值(95%): {result.trace_stat_crit_vals[0, 1]:.3f}"
        )

        coint_data = {
            'trace_stat': result.trace_stat.tolist(),
            'trace_stat_crit_vals': result.trace_stat_crit_vals.tolist(),
            'eigen_vals': result.eig.tolist(),
        }

        if not self.generate_plots:
            self._log_step("Cointegration test completed (data only)", "success")
            return None, summary, coint_data

        plt.figure(figsize=(10, 6))
        positions = np.arange(len(result.trace_stat))
        plt.bar(positions - 0.2, result.trace_stat, width=0.4, label='Trace Statistic', alpha=0.7)
        plt.bar(
            positions + 0.2,
            result.trace_stat_crit_vals[:, 1],
            width=0.4,
            label='Critical Value (95%)',
            alpha=0.7,
        )

        plt.xlabel('Number of Cointegrating Relations')
        plt.ylabel('Test Statistic')
        plt.title('Johansen Cointegration Test Results')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.tight_layout()

        img_path = os.path.join(self.temp_dir.name, 'cointegration_test.png')
        plt.savefig(img_path, dpi=150, bbox_inches='tight')
        plt.close()

        self._log_step("Cointegration test completed", "success")
        return img_path, summary, coint_data

    except Exception as e:
        self._log_step(f"Cointegration test failed: {e}", "error")
        return None, f"协整检验失败: {e}", None
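The function above returns the trace statistics and critical values but does not derive a cointegration rank from them. A minimal sketch of the usual sequential reading of the trace test follows; `johansen_rank` is a hypothetical helper and the numbers are illustrative, not output from this code.

```python
def johansen_rank(trace_stats, critical_values):
    """Estimate the cointegration rank from Johansen trace statistics.

    Hypotheses r <= 0, r <= 1, ... are tested in order; the rank is the
    number of leading hypotheses rejected before the first non-rejection.
    """
    rank = 0
    for statistic, critical in zip(trace_stats, critical_values):
        if statistic > critical:
            rank += 1      # reject "at most `rank` relations", keep going
        else:
            break          # first non-rejection ends the sequence
    return rank


# Illustrative numbers only (three variables, 95% critical values)
trace_stats = [35.2, 12.1, 2.3]
critical_95 = [29.80, 15.49, 3.84]
print(johansen_rank(trace_stats, critical_95))
```

With the returned `coint_data`, the second argument would presumably be column 1 of `trace_stat_crit_vals`; that mapping is an assumption based on the 95% label used in the summary string.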
242
app/services/analysis/modules/time_series.py
Normal file
@ -0,0 +1,242 @@
import gc
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from scipy.signal import spectrogram, periodogram


def generate_time_series_plots(self):
    """Generate time series chart data."""
    try:
        self._log_step("Generating time series plots...")

        if not hasattr(self, 'data') or self.data is None or len(self.data.columns) == 0:
            self._log_step("No data available for time series plots", "warning")
            return None, "No data available", None

        # Prepare the data
        n_plots = min(4, len(self.data.columns))
        plot_data = self.data.iloc[:, :n_plots].reset_index()
        # Convert timestamp to string so the result is JSON-serializable
        if 'timestamp' in plot_data.columns:
            plot_data['timestamp'] = plot_data['timestamp'].astype(str)
        summary = f"Generated {n_plots} time series charts"

        # Charts mode: return data only, no image; the plotting version is kept below as comments
        self._log_step("Time series data prepared", "success")
        return None, summary, plot_data

        # --- Plotting version kept for reference ---
        # fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=100)
        # fig.suptitle('Time Series Analysis', fontsize=14)
        # axes = axes.flatten()
        # for i in range(n_plots):
        #     try:
        #         col = self.data.columns[i]
        #         axes[i].plot(self.data.index, self.data[col], linewidth=1)
        #         axes[i].set_title(f'{col}')
        #         axes[i].tick_params(axis='x', rotation=45)
        #         axes[i].grid(True, alpha=0.3)
        #     except Exception as plot_err:
        #         self._log_step(f"Plot {col} error: {plot_err}", "warning")
        # for i in range(n_plots, len(axes)):
        #     fig.delaxes(axes[i])
        # plt.tight_layout()
        # img_path = os.path.join(self.temp_dir.name, 'time_series.png')
        # plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
        # plt.close(fig)
        # self._log_step("Time series plots generated", "success")
        # return img_path, summary, plot_data

    except Exception as e:
        self._log_step(f"Time series plots failed: {str(e)[:100]}", "error")
        return None, f"Error: {e}", None

def generate_acf_pacf_plots(self):
    """Generate autocorrelation (ACF) and partial autocorrelation (PACF) data."""
    try:
        self._log_step("Generating ACF and PACF plots...")

        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            n_cols = min(3, len(numeric_cols))

            # Compute the ACF and PACF values
            acf_pacf_results = {}
            for col in numeric_cols[:n_cols]:
                series = self.data[col].dropna()
                try:
                    acf_vals = np.asarray(acf(series, nlags=min(40, len(series) // 4)))
                    pacf_vals = np.asarray(pacf(series, nlags=min(20, len(series) // 5)))
                    acf_pacf_results[col] = {
                        'acf': acf_vals.tolist(),
                        'pacf': pacf_vals.tolist(),
                    }
                except Exception as e:
                    self._log_step(f"Error calculating ACF/PACF for {col}: {e}", "warning")

            # Count only the variables that were actually computed
            summary = f"生成 {len(acf_pacf_results)} 个变量的ACF和PACF数据"
            self._log_step("ACF and PACF data generated", "success")
            return None, summary, acf_pacf_results

            # --- Plotting version kept for reference ---
            # fig, axes = plt.subplots(n_cols, 2, figsize=(12, 4 * n_cols))
            # fig.suptitle('自相关和偏自相关分析', fontsize=16)
            # if n_cols == 1:
            #     axes = axes.reshape(1, -1)
            # for i, col in enumerate(numeric_cols[:n_cols]):
            #     series = self.data[col].dropna()
            #     plot_acf(series, ax=axes[i, 0], lags=min(40, len(series) // 4))
            #     axes[i, 0].set_title(f'{col} - 自相关函数 (ACF)')
            #     plot_pacf(series, ax=axes[i, 1], lags=min(20, len(series) // 5))
            #     axes[i, 1].set_title(f'{col} - 偏自相关函数 (PACF)')
            # plt.tight_layout()
            # img_path = os.path.join(self.temp_dir.name, 'acf_pacf_plots.png')
            # plt.savefig(img_path, dpi=150, bbox_inches='tight')
            # plt.close()
            # self._log_step("ACF and PACF plots generated", "success")
            # return img_path, f"生成 {n_cols} 个变量的ACF和PACF图", acf_pacf_results

        self._log_step("No data available for ACF/PACF plots", "warning")
        return None, "数据不足,无法生成ACF/PACF图", None

    except Exception as e:
        self._log_step(f"ACF/PACF plots failed: {e}", "error")
        return None, f"ACF/PACF图生成失败: {e}", None

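For readers unfamiliar with what the `acf` values represent, the following is a small pure-Python sketch of the sample autocorrelation using the common biased estimator (lag-k autocovariance divided by lag-0 autocovariance). The function `sample_acf` is hypothetical and only illustrates the quantity; the module itself relies on statsmodels.

```python
def sample_acf(values, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # Lag-0 sum of squares; shared denominator for every lag
    denominator = sum(c * c for c in centered)
    if denominator == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denominator
        for lag in range(nlags + 1)
    ]


print(sample_acf([1, 2, 3, 4, 5], nlags=2))
```

Lag 0 is always 1 by construction, which is why the first entry of each `acf` list carries no information about the series.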
def perform_seasonal_decomposition(self):
    """Run seasonal decomposition."""
    try:
        self._log_step("Performing seasonal decomposition...")

        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns

            # Decompose the first numeric column
            if len(numeric_cols) > 0:
                col = numeric_cols[0]
                series = self.data[col].dropna()

                # Additive decomposition with a period of at most 24 observations
                result = seasonal_decompose(series, model='additive', period=min(24, len(series) // 2))

                decomposition_data = pd.DataFrame({
                    'observed': result.observed,
                    'trend': result.trend,
                    'seasonal': result.seasonal,
                    'resid': result.resid,
                })
                # Replace NaN with None so the result is JSON-serializable
                decomposition_data = decomposition_data.astype(object).where(
                    pd.notnull(decomposition_data),
                    None,  # type: ignore[arg-type]
                )
                summary = f"季节性分解完成,变量: {col}"

                self._log_step("Seasonal decomposition completed (data only)", "success")
                return None, summary, decomposition_data

                # --- Plotting version kept for reference ---
                # fig, axes = plt.subplots(4, 1, figsize=(12, 10))
                # fig.suptitle(f'{col} - 季节性分解', fontsize=16)
                # result.observed.plot(ax=axes[0], title='原始序列')
                # result.trend.plot(ax=axes[1], title='趋势成分')
                # result.seasonal.plot(ax=axes[2], title='季节成分')
                # result.resid.plot(ax=axes[3], title='残差成分')
                # for ax in axes:
                #     ax.tick_params(axis='x', rotation=45)
                #     ax.grid(True, alpha=0.3)
                # plt.tight_layout()
                # img_path = os.path.join(self.temp_dir.name, 'seasonal_decomposition.png')
                # plt.savefig(img_path, dpi=150, bbox_inches='tight')
                # plt.close()
                # self._log_step("Seasonal decomposition completed", "success")
                # return img_path, summary, decomposition_data

            self._log_step("No numeric columns for decomposition", "warning")
            return None, "没有数值列可用于季节性分解", None

        self._log_step("No data available for seasonal decomposition", "warning")
        return None, "数据不足,无法进行季节性分解", None

    except Exception as e:
        self._log_step(f"Seasonal decomposition failed: {e}", "error")
        return None, f"季节性分解失败: {e}", None

def perform_spectral_analysis(self):
    """Run spectral analysis."""
    try:
        self._log_step("Performing spectral analysis...")

        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns

            # Compute spectral data (reduced output to keep the payload small)
            spectral_results = {}
            for col in numeric_cols[:2]:
                try:
                    series = self.data[col].dropna().values
                    f, t, Sxx = spectrogram(series, fs=1.0, nperseg=min(256, len(series) // 4))
                    f_p, Pxx_den = periodogram(series, fs=1.0)

                    # Keep only the mean and shape of the spectrogram instead of the full matrix
                    Sxx_log = 10 * np.log10(Sxx + 1e-12)

                    spectral_results[col] = {
                        'spectrogram': {
                            'f': f.tolist(),
                            't': t.tolist(),
                            'Sxx_log10_mean': float(np.mean(Sxx_log)),
                            'Sxx_shape': list(Sxx.shape),
                        },
                        'periodogram': {
                            # Only the first 20 frequency bins are returned
                            'f': f_p.tolist()[:20],
                            'Pxx_den': Pxx_den.tolist()[:20],
                        },
                    }
                except Exception as e:
                    self._log_step(f"Spectral calc failed for {col}: {e}", "warning")

            summary = "Spectral analysis completed"
            self._log_step("Spectral analysis completed (data only)", "success")
            return None, summary, spectral_results

            # --- Plotting version kept for reference ---
            # n_cols = min(2, len(numeric_cols))
            # fig, axes = plt.subplots(n_cols, 2, figsize=(15, 5 * n_cols))
            # fig.suptitle('频谱分析', fontsize=16)
            # if n_cols == 1:
            #     axes = axes.reshape(1, -1)
            # for i, col in enumerate(numeric_cols[:n_cols]):
            #     series = self.data[col].dropna().values
            #     f, t, Sxx = spectrogram(series, fs=1.0, nperseg=min(256, len(series) // 4))
            #     axes[i, 0].pcolormesh(t, f, 10 * np.log10(Sxx), shading='gouraud')
            #     axes[i, 0].set_title(f'{col} - 频谱图')
            #     axes[i, 0].set_ylabel('频率 [Hz]')
            #     axes[i, 0].set_xlabel('时间')
            #     f, Pxx_den = periodogram(series, fs=1.0)
            #     axes[i, 1].semilogy(f, Pxx_den)
            #     axes[i, 1].set_title(f'{col} - 周期图')
            #     axes[i, 1].set_xlabel('频率 [Hz]')
            #     axes[i, 1].set_ylabel('PSD [V**2/Hz]')
            #     axes[i, 1].grid(True, alpha=0.3)
            # plt.tight_layout()
            # img_path = os.path.join(self.temp_dir.name, 'spectral_analysis.png')
            # plt.savefig(img_path, dpi=150, bbox_inches='tight')
            # plt.close()
            # self._log_step("Spectral analysis completed", "success")
            # return img_path, f"频谱分析完成,分析了 {n_cols} 个变量", spectral_results

        self._log_step("No data available for spectral analysis", "warning")
        return None, "数据不足,无法进行频谱分析", None

    except Exception as e:
        self._log_step(f"Spectral analysis failed: {e}", "error")
        return None, f"频谱分析失败: {e}", None
1062
app/services/analysis_system.py
Normal file
File diff suppressed because it is too large
249
app/services/font_manager.py
Normal file
@ -0,0 +1,249 @@
"""
Font management module: cross-platform font detection and configuration.
Supports Linux, macOS, and Windows.
"""

import os
import sys
import logging
from pathlib import Path
from typing import Optional, List, Dict
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
from app.core.config import settings

logger = logging.getLogger(__name__)


class FontManager:
    """Font manager: handles cross-platform font detection and configuration."""

    # Candidate font paths per language and platform (in priority order).
    # Note: the trailing Liberation, DejaVu, and Arial entries in the 'zh'
    # lists are Latin fallbacks and do not contain CJK glyphs.
    FONT_PATHS = {
        'zh': {  # Chinese fonts
            'linux': [
                '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc',
                '/usr/share/fonts/truetype/wqy/wqy-microhei.ttc',
                '/usr/share/fonts/truetype/liberation/LiberationSerif-Regular.ttf',
                '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
            ],
            'darwin': [  # macOS
                '/Library/Fonts/SimHei.ttf',
                '/System/Library/Fonts/STHeiti Light.ttc',
                '/Applications/Microsoft Office/Library/Fonts/SimSun.ttf',
                '/Library/Fonts/Arial.ttf',
            ],
            'win32': [
                'C:\\Windows\\Fonts\\simhei.ttf',
                'C:\\Windows\\Fonts\\simsun.ttc',
                'C:\\Windows\\Fonts\\msyh.ttc',
                'C:\\Windows\\Fonts\\arial.ttf',
            ]
        },
        'en': {  # English fonts
            'linux': [
                '/usr/share/fonts/truetype/liberation/LiberationSerif-Regular.ttf',
                '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
                '/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf',
            ],
            'darwin': [
                '/Library/Fonts/Times New Roman.ttf',
                '/Library/Fonts/Arial.ttf',
                '/System/Library/Fonts/Helvetica.ttc',
            ],
            'win32': [
                'C:\\Windows\\Fonts\\times.ttf',
                'C:\\Windows\\Fonts\\arial.ttf',
                'C:\\Windows\\Fonts\\georgia.ttf',
            ]
        }
    }

    # Fonts bundled with the project (paths relative to the fonts directory)
    PROJECT_FONTS = {
        'zh_regular': 'SubsetOTF/CN/SourceHanSansCN-Regular.otf',
        'zh_bold': 'SubsetOTF/CN/SourceHanSansCN-Bold.otf',
        'en_regular': None,  # English uses a system font
    }

    def __init__(self, fonts_dir: Optional[Path] = None):
        """
        Initialize the font manager.

        Args:
            fonts_dir: path to the project fonts directory
        """
        # Wrap in Path so a plain string (e.g. from settings) also works with the `/` operator
        self.fonts_dir = Path(fonts_dir or settings.FONTS_DIR)
        self.platform = sys.platform
        self.available_fonts = {}
        self._init_fonts()

    def _init_fonts(self):
        """Initialize the font system."""
        logger.info(f"初始化字体系统 (平台: {self.platform})")

        # Scan system fonts, then project fonts (project fonts take precedence)
        self._scan_system_fonts()
        self._register_project_fonts()

    def _scan_system_fonts(self):
        """Scan for available system fonts."""
        logger.info("扫描系统字体...")

        for lang, fonts in self.FONT_PATHS.items():
            paths = fonts.get(self.platform, [])
            for font_path in paths:
                if os.path.exists(font_path):
                    # Keep the first existing candidate for this language
                    self.available_fonts[lang] = font_path
                    logger.info(f"找到{lang}字体: {font_path}")
                    break

            if lang not in self.available_fonts:
                logger.warning(f"未找到系统{lang}字体")

    def _register_project_fonts(self):
        """Register the fonts bundled with the project."""
        logger.info(f"扫描项目字体目录: {self.fonts_dir}")

        # Register the Chinese font; this overrides any system font found earlier
        zh_font_path = self.fonts_dir / self.PROJECT_FONTS['zh_regular']
        if zh_font_path.exists():
            try:
                self.available_fonts['zh'] = str(zh_font_path)
                logger.info(f"注册项目中文字体: {zh_font_path}")
            except Exception as e:
                logger.warning(f"注册项目中文字体失败: {e}")

    def get_font(self, language: str = 'zh') -> str:
        """
        Get an available font.

        Args:
            language: language code ('zh' or 'en')

        Returns:
            Path to a font file, or a font family name when no file was found
        """
        if language in self.available_fonts:
            return self.available_fonts[language]

        logger.warning(f"未找到{language}字体,使用默认字体")
        # 'DejaVu Sans' (with a space) is the family name Matplotlib recognizes
        return 'DejaVu Sans' if language == 'en' else 'Arial'

    def setup_matplotlib_font(self, language: str = 'zh'):
        """
        Configure the font used by Matplotlib.

        Args:
            language: language code ('zh' or 'en')
        """
        try:
            font_path = self.get_font(language)

            if os.path.isfile(font_path):
                # Register the font file with Matplotlib
                fm.fontManager.addfont(font_path)

                # Resolve the family name from the font file
                prop = fm.FontProperties(fname=font_path)
                plt.rcParams['font.sans-serif'] = [prop.get_name()]
            else:
                # Not a file: treat the value as a font family name
                plt.rcParams['font.sans-serif'] = [font_path]

            # Avoid broken minus signs when a non-default font is active
            plt.rcParams['axes.unicode_minus'] = False
            logger.info(f"Matplotlib 字体配置为: {font_path}")
        except Exception as e:
            logger.error(f"配置 Matplotlib 字体失败: {e}")

    def get_font_installation_command(self) -> str:
        """
        Get the recommended font installation command for the current system.

        Returns:
            The installation command as a string
        """
        if self.platform == 'linux':
            return "apt-get install fonts-wqy-microhei fonts-noto-cjk-extra -y"
        elif self.platform == 'darwin':
            return "brew install --cask font-noto-sans-cjk"
        else:
            return "请从 https://www.noto-fonts.cn 下载并安装 Noto Sans CJK 字体"

    def suggest_font_installation(self) -> bool:
        """
        Check for a Chinese font and suggest installing one if missing.

        Returns:
            True if a font installation is suggested, False otherwise
        """
        if 'zh' not in self.available_fonts:
            logger.warning("=" * 60)
            logger.warning("⚠️ 警告: 未找到中文字体!")
            logger.warning("推荐的安装命令:")
            logger.warning(self.get_font_installation_command())
            logger.warning("=" * 60)
            return True
        return False

    @staticmethod
    def check_font_available(font_name: str) -> bool:
        """
        Check whether the given font is available.

        Args:
            font_name: font family name

        Returns:
            True if the font is available, False otherwise
        """
        try:
            # Without fallback_to_default=False, findfont silently returns the
            # default font and this check would always succeed
            fm.findfont(fm.FontProperties(family=font_name), fallback_to_default=False)
            return True
        except Exception:
            return False


# 全局字体管理器实例
|
||||||
|
_font_manager: Optional[FontManager] = None
|
||||||
|
|
||||||
|
|
||||||
|
def get_font_manager(fonts_dir: Optional[Path] = None) -> FontManager:
|
||||||
|
"""获取全局字体管理器实例"""
|
||||||
|
global _font_manager
|
||||||
|
if _font_manager is None:
|
||||||
|
_font_manager = FontManager(fonts_dir)
|
||||||
|
return _font_manager
|
||||||
|
|
||||||
|
|
||||||
|
def setup_fonts_for_app(languages: Optional[List[str]] = None) -> Dict[str, str]:
    """
    Set up fonts for the application (one-time initialization).

    Args:
        languages: Languages to support; defaults to ['zh', 'en'].

    Returns:
        Mapping from language code to the font configured for it.
    """
    if languages is None:
        languages = ['zh', 'en']

    font_manager = get_font_manager()

    # Warn the user if a font needs to be installed.
    font_manager.suggest_font_installation()

    # Configure Matplotlib for each language.
    # Each call overwrites plt.rcParams, so the last language wins.
    fonts_config = {}
    for lang in languages:
        try:
            font_manager.setup_matplotlib_font(lang)
            # Record the font chosen for this language.
            fonts_config[lang] = font_manager.get_font(lang)
            logger.info(f"✓ Font configured for language: {lang}")
        except Exception as e:
            logger.error(f"Failed to configure font for language {lang}: {e}")

    return fonts_config
|
||||||
210
app/services/linux_adapter.py
Normal file
@ -0,0 +1,210 @@
|
|||||||
|
"""
|
||||||
|
Linux 系统适配模块
|
||||||
|
处理 Linux 特有的路径、权限、环境变量等问题
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import logging
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class LinuxAdapter:
|
||||||
|
"""Linux 系统适配器"""
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def is_linux() -> bool:
|
||||||
|
"""检查是否运行在 Linux 系统上"""
|
||||||
|
return sys.platform.startswith('linux')
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def normalize_path(path: str) -> Path:
|
||||||
|
"""
|
||||||
|
规范化路径 - 适配不同操作系统
|
||||||
|
|
||||||
|
Args:
|
||||||
|
path: 路径字符串(可能混合了不同分隔符)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
规范化后的 Path 对象
|
||||||
|
"""
|
||||||
|
# 替换反斜杠为正斜杠
|
||||||
|
path = path.replace('\\', '/')
|
||||||
|
# 创建 Path 对象,会根据系统自动转换
|
||||||
|
return Path(path).resolve()
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def ensure_directory_writable(dir_path: Path) -> bool:
|
||||||
|
"""
|
||||||
|
确保目录可写
|
||||||
|
|
||||||
|
Args:
|
||||||
|
dir_path: 目录路径
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
是否成功
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
dir_path = Path(dir_path)
|
||||||
|
dir_path.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# 检查写入权限
|
||||||
|
test_file = dir_path / '.test_write'
|
||||||
|
test_file.touch()
|
||||||
|
test_file.unlink()
|
||||||
|
|
||||||
|
logger.info(f"✓ 目录可写: {dir_path}")
|
||||||
|
return True
|
||||||
|
except PermissionError:
|
||||||
|
logger.error(f"✗ 没有写入权限: {dir_path}")
|
||||||
|
logger.error(f" 建议: sudo chmod 755 {dir_path}")
|
||||||
|
return False
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"✗ 目录检查失败: {dir_path} - {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def get_recommended_upload_dir() -> Path:
|
||||||
|
"""
|
||||||
|
获取 Linux 上推荐的上传目录
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
推荐的上传目录路径
|
||||||
|
"""
|
||||||
|
# 优先级:
|
||||||
|
# 1. 环境变量指定的目录
|
||||||
|
# 2. 项目相对路径
|
||||||
|
# 3. /tmp (临时目录)
|
||||||
|
|
||||||
|
if upload_dir := os.getenv('UPLOAD_DIR'):
|
||||||
|
return Path(upload_dir)
|
||||||
|
|
||||||
|
project_upload = Path(__file__).parent.parent.parent / 'uploads'
|
||||||
|
if project_upload.exists() and os.access(project_upload, os.W_OK):
|
||||||
|
return project_upload
|
||||||
|
|
||||||
|
logger.warning("使用系统临时目录进行上传存储")
|
||||||
|
return Path('/tmp/lazy_fjh_uploads')
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def setup_signal_handlers():
|
||||||
|
"""
|
||||||
|
设置 Linux 信号处理器
|
||||||
|
确保优雅关闭
|
||||||
|
"""
|
||||||
|
import signal
|
||||||
|
|
||||||
|
def signal_handler(sig, frame):
|
||||||
|
logger.info(f"收到信号 {sig},开始优雅关闭...")
|
||||||
|
sys.exit(0)
|
||||||
|
|
||||||
|
if LinuxAdapter.is_linux():
|
||||||
|
signal.signal(signal.SIGTERM, signal_handler)
|
||||||
|
signal.signal(signal.SIGINT, signal_handler)
|
||||||
|
logger.info("✓ Linux 信号处理器已注册")
|
||||||
|
|
||||||
|
    @staticmethod
    def get_process_info() -> dict:
        """
        Get information about the current process.

        Returns:
            Dictionary of process information.
        """
        import getpass

        import psutil

        process = psutil.Process(os.getpid())

        # os.getlogin() raises OSError without a controlling terminal
        # (containers, systemd services), so use getpass.getuser() instead.
        try:
            user = getpass.getuser()
        except Exception:
            user = 'unknown'

        return {
            'pid': os.getpid(),
            'user': user,
            'memory_mb': process.memory_info().rss / 1024 / 1024,
            'cpu_percent': process.cpu_percent(interval=1),
            'num_threads': process.num_threads()
        }
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def check_system_resources() -> dict:
|
||||||
|
"""
|
||||||
|
检查系统资源
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
系统资源信息
|
||||||
|
"""
|
||||||
|
import psutil
|
||||||
|
|
||||||
|
return {
|
||||||
|
'cpu_count': psutil.cpu_count(),
|
||||||
|
'total_memory_gb': psutil.virtual_memory().total / (1024**3),
|
||||||
|
'available_memory_gb': psutil.virtual_memory().available / (1024**3),
|
||||||
|
'disk_usage_percent': psutil.disk_usage('/').percent
|
||||||
|
}
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def optimize_for_linux():
|
||||||
|
"""
|
||||||
|
针对 Linux 系统进行优化
|
||||||
|
"""
|
||||||
|
if not LinuxAdapter.is_linux():
|
||||||
|
return
|
||||||
|
|
||||||
|
logger.info("应用 Linux 系统优化...")
|
||||||
|
|
||||||
|
# 1. 增加文件描述符限制
|
||||||
|
try:
|
||||||
|
import resource
|
||||||
|
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
|
||||||
|
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
|
||||||
|
logger.info(f"✓ 设置文件描述符限制: {hard}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"⚠ 无法设置文件描述符: {e}")
|
||||||
|
|
||||||
|
# 2. 可选内存限制(默认跳过,避免在 WSL/容器中因 RLIMIT_AS 过低触发 MemoryError)
|
||||||
|
mem_limit_env = os.getenv('LINUX_RLIMIT_AS_MB') or os.getenv('LINUX_MEMORY_LIMIT_MB')
|
||||||
|
if mem_limit_env:
|
||||||
|
try:
|
||||||
|
import resource
|
||||||
|
limit_mb = int(mem_limit_env)
|
||||||
|
limit_bytes = limit_mb * 1024**2
|
||||||
|
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
|
||||||
|
logger.info(f"✓ 设置虚拟内存限制: {limit_mb}MB")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"⚠ 无法设置虚拟内存限制: {e}")
|
||||||
|
else:
|
||||||
|
logger.info("跳过虚拟内存限制 (未设置 LINUX_RLIMIT_AS_MB / LINUX_MEMORY_LIMIT_MB)")
|
||||||
|
|
||||||
|
# 3. 注册信号处理
|
||||||
|
LinuxAdapter.setup_signal_handlers()
|
||||||
|
|
||||||
|
logger.info("Linux 系统优化完成")
|
||||||
|
|
||||||
|
|
||||||
|
def init_linux_environment():
|
||||||
|
"""
|
||||||
|
初始化 Linux 环境
|
||||||
|
在应用启动时调用
|
||||||
|
"""
|
||||||
|
if not LinuxAdapter.is_linux():
|
||||||
|
logger.info("非 Linux 系统,跳过 Linux 特定初始化")
|
||||||
|
return
|
||||||
|
|
||||||
|
logger.info("=" * 60)
|
||||||
|
logger.info("初始化 Linux 环境...")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
|
||||||
|
# 应用优化
|
||||||
|
LinuxAdapter.optimize_for_linux()
|
||||||
|
|
||||||
|
# 检查系统资源
|
||||||
|
resources = LinuxAdapter.check_system_resources()
|
||||||
|
logger.info(f"系统资源: {resources}")
|
||||||
|
|
||||||
|
# 检查上传目录
|
||||||
|
upload_dir = LinuxAdapter.get_recommended_upload_dir()
|
||||||
|
if not LinuxAdapter.ensure_directory_writable(upload_dir):
|
||||||
|
logger.warning(f"上传目录 {upload_dir} 可能不可写")
|
||||||
|
|
||||||
|
logger.info("Linux 环境初始化完成")
|
||||||
159
app/services/oss_csv_source.py
Normal file
@ -0,0 +1,159 @@
|
|||||||
|
"""OSS/URL CSV source (v2).
|
||||||
|
|
||||||
|
- Validates incoming URL to reduce SSRF risk (allowlist + IP checks)
|
||||||
|
- Downloads CSV to a local temporary file for analysis
|
||||||
|
|
||||||
|
This module is intentionally small and dependency-light.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import ipaddress
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import socket
|
||||||
|
import tempfile
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlsplit
|
||||||
|
|
||||||
|
import requests
|
||||||
|
|
||||||
|
from app.core.config import settings
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class DownloadedCsv:
|
||||||
|
local_path: str
|
||||||
|
source_host: str
|
||||||
|
source_name: str
|
||||||
|
etag: Optional[str] = None
|
||||||
|
last_modified: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
class UrlValidationError(ValueError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
def _is_ip_allowed(ip_str: str) -> bool:
|
||||||
|
ip = ipaddress.ip_address(ip_str)
|
||||||
|
|
||||||
|
if settings.V2_ALLOW_PRIVATE_NETWORKS:
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Block loopback/link-local/private/multicast/unspecified/reserved
|
||||||
|
if (
|
||||||
|
ip.is_loopback
|
||||||
|
or ip.is_private
|
||||||
|
or ip.is_link_local
|
||||||
|
or ip.is_multicast
|
||||||
|
or ip.is_unspecified
|
||||||
|
or ip.is_reserved
|
||||||
|
):
|
||||||
|
return False
|
||||||
|
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def validate_source_url(source_url: str) -> tuple[str, str]:
|
||||||
|
"""Validate URL and return (host, source_name)."""
|
||||||
|
|
||||||
|
if not source_url or not isinstance(source_url, str):
|
||||||
|
raise UrlValidationError("source_url 不能为空")
|
||||||
|
|
||||||
|
parts = urlsplit(source_url)
|
||||||
|
|
||||||
|
if parts.scheme not in {"https", "http"}:
|
||||||
|
raise UrlValidationError("仅支持 http/https URL")
|
||||||
|
|
||||||
|
if parts.scheme == "http" and not settings.V2_ALLOW_HTTP:
|
||||||
|
raise UrlValidationError("不允许 http;请使用 https 或开启 V2_ALLOW_HTTP")
|
||||||
|
|
||||||
|
if not parts.netloc:
|
||||||
|
raise UrlValidationError("URL 缺少 host")
|
||||||
|
|
||||||
|
# Disallow URLs with userinfo
|
||||||
|
if "@" in parts.netloc:
|
||||||
|
raise UrlValidationError("URL 不允许包含用户名/密码")
|
||||||
|
|
||||||
|
host = parts.hostname
|
||||||
|
if not host:
|
||||||
|
raise UrlValidationError("无法解析 URL host")
|
||||||
|
|
||||||
|
# Optional allowlist
|
||||||
|
if settings.V2_ALLOWED_HOSTS:
|
||||||
|
allowed = {h.lower() for h in settings.V2_ALLOWED_HOSTS}
|
||||||
|
if host.lower() not in allowed:
|
||||||
|
raise UrlValidationError(f"host 不在白名单: {host}")
|
||||||
|
|
||||||
|
# Resolve host -> IP and block private/loopback, unless explicitly allowed.
|
||||||
|
try:
|
||||||
|
addr_info = socket.getaddrinfo(host, None)
|
||||||
|
except socket.gaierror as e:
|
||||||
|
raise UrlValidationError(f"DNS 解析失败: {host} ({e})") from e
|
||||||
|
|
||||||
|
for family, _type, _proto, _canonname, sockaddr in addr_info:
|
||||||
|
ip_str = None
|
||||||
|
if family == socket.AF_INET:
|
||||||
|
ip_str = str(sockaddr[0])
|
||||||
|
elif family == socket.AF_INET6:
|
||||||
|
ip_str = str(sockaddr[0])
|
||||||
|
if ip_str and not _is_ip_allowed(ip_str):
|
||||||
|
raise UrlValidationError(f"host 解析到不允许的 IP: {ip_str}")
|
||||||
|
|
||||||
|
source_name = os.path.basename(parts.path) or "data.csv"
|
||||||
|
return host, source_name
|
||||||
|
|
||||||
|
|
||||||
|
def download_csv_to_tempfile(source_url: str, *, suffix: str = ".csv") -> DownloadedCsv:
    """Download URL content to a temp file and return local path + meta."""

    host, source_name = validate_source_url(source_url)

    # Create temp file inside configured TEMP_DIR for easier ops/observability
    settings.TEMP_DIR.mkdir(parents=True, exist_ok=True)
    tmp = tempfile.NamedTemporaryFile(
        mode="wb",
        suffix=suffix,
        dir=str(settings.TEMP_DIR),
        delete=False,
    )

    try:
        timeout = (settings.V2_CONNECT_TIMEOUT_SECONDS, settings.V2_DOWNLOAD_TIMEOUT_SECONDS)
        # Do not follow redirects: the redirect target would not have passed
        # validate_source_url() and could point at an internal address.
        with requests.get(
            source_url, stream=True, timeout=timeout, allow_redirects=False
        ) as resp:
            if resp.is_redirect:
                raise UrlValidationError("Redirects are not allowed")
            resp.raise_for_status()
            etag = resp.headers.get("ETag")
            last_modified = resp.headers.get("Last-Modified")

            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                if not chunk:
                    continue
                tmp.write(chunk)

        tmp.flush()
        tmp.close()

        if os.path.getsize(tmp.name) <= 0:
            raise UrlValidationError("Downloaded content is empty")

        return DownloadedCsv(
            local_path=tmp.name,
            source_host=host,
            source_name=source_name,
            etag=etag,
            last_modified=last_modified,
        )

    except Exception:
        # Best-effort cleanup of the partial temp file before re-raising.
        try:
            tmp.close()
        except Exception:
            pass
        try:
            os.unlink(tmp.name)
        except Exception:
            pass
        raise
|
||||||
201
complex_test.csv
Normal file
@ -0,0 +1,201 @@
|
|||||||
|
date,sales,ad_cost,temperature
|
||||||
|
2023-01-01,100.99342830602247,52.28565095475265,25.216717023616898
|
||||||
|
2023-01-02,107.81824957442487,56.71304741905361,28.151623674857277
|
||||||
|
2023-01-03,111.52881013862651,61.17966128518964,29.915228586591738
|
||||||
|
2023-01-04,108.02879336255101,59.283406941450025,29.990188012450005
|
||||||
|
2023-01-05,96.05238491579978,41.13784561811443,28.44879856043664
|
||||||
|
2023-01-06,90.99563724567052,40.80869342325965,31.617293515635463
|
||||||
|
2023-01-07,97.00157358481341,51.075963128450006,26.495631174163773
|
||||||
|
2023-01-08,103.5773501810333,54.357604845077695,29.22110275096627
|
||||||
|
2023-01-09,109.08744194176155,57.118959402411015,29.95887684488444
|
||||||
|
2023-01-10,113.00806452823355,75.76768971739038,31.091055195643584
|
||||||
|
2023-01-11,105.55591950042415,55.63241230367791,31.632332071452602
|
||||||
|
2023-01-12,97.09626710914998,54.22596175547798,26.073309905390914
|
||||||
|
2023-01-13,93.65306223164116,51.59653993328659,24.79464241241625
|
||||||
|
2023-01-14,91.96068299744252,49.237297755250246,33.179764134037235
|
||||||
|
2023-01-15,100.63489742855835,48.74110249107745,30.293424447999076
|
||||||
|
2023-01-16,110.82697548909167,59.20833384701217,27.000771546109235
|
||||||
|
2023-01-17,111.57901650425853,51.92538217944141,33.849435826065054
|
||||||
|
2023-01-18,108.60807500637348,53.1199444694867,29.492752546094657
|
||||||
|
2023-01-19,97.7225257041962,46.43444511295258,32.63336893912616
|
||||||
|
2023-01-20,92.05768224050016,46.43821181718169,29.247781574883593
|
||||||
|
2023-01-21,100.66865680262455,61.90762123467982,35.17721864901782
|
||||||
|
2023-01-22,105.67497634353768,43.5011622088101,34.21074614542007
|
||||||
|
2023-01-23,114.00518017281185,60.43389103827849,28.14757991637182
|
||||||
|
2023-01-24,112.42920163028982,48.151021459196656,31.758933958390703
|
||||||
|
2023-01-25,108.38465543599729,51.83266838905148,30.730097698001675
|
||||||
|
2023-01-26,101.27527814467474,56.082392057174204,32.84961326556187
|
||||||
|
2023-01-27,94.30141660796275,47.47210839945869,25.798696954943104
|
||||||
|
2023-01-28,100.44473732127179,44.833644770989366,30.701370460397328
|
||||||
|
2023-01-29,106.96636309508162,49.90666300124097,31.768238284669366
|
||||||
|
2023-01-30,115.19859424833119,60.99728586883897,23.266495108569856
|
||||||
|
2023-01-31,115.74165722884301,54.218995455835824,24.94268677356046
|
||||||
|
2023-02-01,114.6690461592577,58.41681602753873,22.32451452199608
|
||||||
|
2023-02-02,102.54550585069765,51.50061212486789,27.583739295661303
|
||||||
|
2023-02-03,96.21708851966602,44.85054252180392,30.494335310101455
|
||||||
|
2023-02-04,103.30012064507876,62.36978076916601,32.79852844272025
|
||||||
|
2023-02-05,107.76615313512,57.05267167915006,28.463490371410078
|
||||||
|
2023-02-06,118.1047284831108,48.92665130826737,33.07680141058322
|
||||||
|
2023-02-07,114.68453024304476,58.27453669536952,24.000399142943266
|
||||||
|
2023-02-08,109.79663046257247,51.589382907444296,22.980304943241066
|
||||||
|
2023-02-09,104.48969129818198,56.50701232307211,27.87355790833527
|
||||||
|
2023-02-10,101.54656654412297,46.81067957989798,29.142146095561642
|
||||||
|
2023-02-11,103.96500111652492,51.408818350927966,27.841614248180033
|
||||||
|
2023-02-12,112.01560574417775,58.53273926699116,21.687120936061277
|
||||||
|
2023-02-13,118.9828573044494,63.820204623075306,27.57183586136113
|
||||||
|
2023-02-14,117.29813462397733,52.64758527670978,23.87553622210353
|
||||||
|
2023-02-15,112.4994883398013,54.577237990695906,29.7477111138268
|
||||||
|
2023-02-16,104.70275042588428,49.97664865713736,28.78823694934582
|
||||||
|
2023-02-17,103.92903467881405,48.69787117653846,24.818551595791803
|
||||||
|
2023-02-18,106.28211128125639,61.968326842033676,26.046338946482383
|
||||||
|
2023-02-19,110.75852018598077,57.40416864779516,24.3600478765442
|
||||||
|
2023-02-20,122.12422779023196,54.75769412344075,27.299399894110135
|
||||||
|
2023-02-21,121.12891675425766,65.15376811240272,30.302612891151956
|
||||||
|
2023-02-22,114.06938782186724,67.64547489599678,24.4297565343602
|
||||||
|
2023-02-23,108.38021603048709,59.35243431799928,28.848822963638963
|
||||||
|
2023-02-24,105.62999092642217,45.21814563344102,25.695659305686696
|
||||||
|
2023-02-25,109.43524939303815,52.29645433218782,24.85756240773558
|
||||||
|
2023-02-26,114.6422754496737,63.65569347076997,26.864838568377532
|
||||||
|
2023-02-27,122.74145559432745,57.83238046906982,24.02995142470168
|
||||||
|
2023-02-28,124.1981979657437,64.31819612360299,25.424479219636847
|
||||||
|
2023-03-01,118.85647916163501,63.30140984796419,23.44154220163044
|
||||||
|
2023-03-02,107.73630868962128,49.233501986920224,32.879100021864744
|
||||||
|
2023-03-03,104.9579248974653,52.18133566842365,27.040464022749358
|
||||||
|
2023-03-04,107.34286166502932,37.4650941321693,24.785245586575005
|
||||||
|
2023-03-05,115.96259583523435,52.859359710945725,27.47611058647402
|
||||||
|
2023-03-06,126.86147718486289,62.167897835465645,26.44693544891746
|
||||||
|
2023-03-07,127.87752639288169,57.699847286616595,26.070759543108874
|
||||||
|
2023-03-08,118.24185044528706,67.2829817423017,28.525917185557415
|
||||||
|
2023-03-09,112.24465279387695,48.971619507135316,28.905688959287644
|
||||||
|
2023-03-10,107.82181320681322,51.71068416992169,24.991411130032738
|
||||||
|
2023-03-11,110.2529822211921,55.78019399702651,24.805208594648875
|
||||||
|
2023-03-12,121.11006570384127,67.76139929725122,25.657256968846575
|
||||||
|
2023-03-13,130.1816737833995,57.91152613580255,19.526397309813348
|
||||||
|
2023-03-14,126.71565911008061,69.1736483158151,21.836336361143037
|
||||||
|
2023-03-15,122.99418850292842,61.548259556562144,30.432281093790863
|
||||||
|
2023-03-16,106.54633650145081,48.365624995485646,31.21631017567973
|
||||||
|
2023-03-17,110.51968335255634,57.57035904759452,25.484047660225336
|
||||||
|
2023-03-18,113.70966939484823,57.85013317529147,27.910575411780364
|
||||||
|
2023-03-19,121.81927189435518,57.90855156138362,27.06440372996227
|
||||||
|
2023-03-20,129.15083837967413,64.92442961478716,35.317044435415966
|
||||||
|
2023-03-21,124.42743772967184,60.287150880527115,29.388875488072575
|
||||||
|
2023-03-22,120.90336183411937,61.01926764331592,25.596146723045138
|
||||||
|
2023-03-23,114.05377092727564,60.33753883624305,23.063026919404756
|
||||||
|
2023-03-24,113.61702757529353,64.73859786837353,21.060058024151907
|
||||||
|
2023-03-25,114.49586375810298,51.05885438491725,26.439536636244885
|
||||||
|
2023-03-26,122.827839869664,72.07908680811333,23.5098422365089
|
||||||
|
2023-03-27,129.81797713472835,55.148549569751665,21.461882087287382
|
||||||
|
2023-03-28,131.84175921611333,65.16195413287875,23.738673307071416
|
||||||
|
2023-03-29,123.47701234402419,64.68009220443497,22.383496692674402
|
||||||
|
2023-03-30,113.83938888817369,58.324653782762006,30.639314352453876
|
||||||
|
2023-03-31,113.48113806388008,53.62707143283707,28.172557461803127
|
||||||
|
2023-04-01,117.72767203318188,57.823224764804564,25.453469010723516
|
||||||
|
2023-04-02,128.40696958856444,61.73848012098806,29.866968095062035
|
||||||
|
2023-04-03,131.26394087272118,62.685146651639535,25.60898934505341
|
||||||
|
2023-04-04,130.95724623585764,69.72663360303395,22.742780561844356
|
||||||
|
2023-04-05,123.51132541540858,63.540740137529525,29.845754141356707
|
||||||
|
2023-04-06,113.5370478780282,53.30397596271083,26.842860784320308
|
||||||
|
2023-04-07,114.84818220676686,61.922090480549684,22.064140934005557
|
||||||
|
2023-04-08,120.06082896926581,61.56691208901595,24.554612106452694
|
||||||
|
2023-04-09,128.50185694254756,68.31523906546857,22.44852212426784
|
||||||
|
2023-04-10,134.03773943222305,70.16701392572959,20.876726435247697
|
||||||
|
2023-04-11,130.37680723958826,61.04342856518377,27.753407014454222
|
||||||
|
2023-04-12,124.92973736296452,59.66396348049741,30.652873036988282
|
||||||
|
2023-04-13,117.34977804457614,62.41135704790438,20.67866913783906
|
||||||
|
2023-04-14,114.46066606640413,60.28218436036939,26.51302831308679
|
||||||
|
2023-04-15,121.2252388983924,60.508111479375465,22.821941639368188
|
||||||
|
2023-04-16,131.31856822823775,66.24592103066279,23.262241939158173
|
||||||
|
2023-04-17,140.11039848581893,76.44352372185159,22.89618506145425
|
||||||
|
2023-04-18,135.1451770527603,64.61473158220099,22.03114326885
|
||||||
|
2023-04-19,127.76129781335801,66.6161358125292,24.718429205442522
|
||||||
|
2023-04-20,119.46355643368209,58.720814954671575,22.029762716093522
|
||||||
|
2023-04-21,114.0448569861147,55.93402247692124,25.28373228638474
|
||||||
|
2023-04-22,123.50756339829785,67.24766595908487,24.271396224416407
|
||||||
|
2023-04-23,132.64644016382874,70.45030182685451,23.655015155883184
|
||||||
|
2023-04-24,143.0878120259834,75.61145419299488,21.598917054076214
|
||||||
|
2023-04-25,135.9934038087475,74.5240959401454,22.5410427922146
|
||||||
|
2023-04-26,129.32437177871165,64.76720509751962,26.48727920511546
|
||||||
|
2023-04-27,121.12652393740836,63.973026825179005,25.673605834229928
|
||||||
|
2023-04-28,117.37007501648283,57.13370372527413,21.187937280679726
|
||||||
|
2023-04-29,127.86250209122531,65.55208280805486,24.368348675081645
|
||||||
|
2023-04-30,136.04182913809416,67.37019929720866,26.27426187262793
|
||||||
|
2023-05-01,141.55882902128073,71.26439433560395,18.96163340286704
|
||||||
|
2023-05-02,136.13522897297034,71.04339961366973,25.549678567089554
|
||||||
|
2023-05-03,133.0020851904919,62.409939179078584,21.881475456830803
|
||||||
|
2023-05-04,119.98214368927341,70.453008223064,25.530891483166414
|
||||||
|
2023-05-05,122.71397649985045,56.326901342426716,21.479066751477976
|
||||||
|
2023-05-06,131.97731026240027,59.917712067261476,18.303946662830565
|
||||||
|
2023-05-07,134.56514004748436,73.07312439124252,18.785714394893226
|
||||||
|
2023-05-08,140.65169760802368,74.28416227382652,23.762345292245453
|
||||||
|
2023-05-09,139.72310582696744,72.9821519987445,24.347006701144345
|
||||||
|
2023-05-10,130.66513240891535,68.47429375077908,20.80463806438527
|
||||||
|
2023-05-11,121.28095236867787,60.57924232010436,25.38311405974921
|
||||||
|
2023-05-12,123.51795943208879,57.272707858615234,18.4325252403288
|
||||||
|
2023-05-13,127.494401479568,64.12622353075263,23.16859477491232
|
||||||
|
2023-05-14,139.49771268252908,66.36304778370399,19.683534315285495
|
||||||
|
2023-05-15,141.74502392454568,75.74811062936159,21.31082333488498
|
||||||
|
2023-05-16,144.1875443211837,71.35848525308116,23.358276415959292
|
||||||
|
2023-05-17,131.58175992050872,61.6633939762918,20.584589049876787
|
||||||
|
2023-05-18,125.34125538337241,61.06369848342124,21.961911256757762
|
||||||
|
2023-05-19,126.85611321109774,65.49271387692698,26.084205060809154
|
||||||
|
2023-05-20,129.18274516107348,61.77274981651687,21.284399768314977
|
||||||
|
2023-05-21,141.00563068175677,66.39171336304622,25.47190045679844
|
||||||
|
2023-05-22,147.98975578319556,75.21331394905734,19.525452300348753
|
||||||
|
2023-05-23,139.4308115573891,70.94023863423816,24.453734141786047
|
||||||
|
2023-05-24,134.99454013588115,64.96255419108493,27.138776213732495
|
||||||
|
2023-05-25,128.11503428381528,61.70232561381602,15.34888559509552
|
||||||
|
2023-05-26,128.6485719341678,65.48453565387207,20.32288207278455
|
||||||
|
2023-05-27,131.19867759470725,58.358917089867006,24.394532964456197
|
||||||
|
2023-05-28,139.90565414086922,62.915508198551834,22.003929168504186
|
||||||
|
2023-05-29,148.20294343801243,70.50925061274404,23.676251690465687
|
||||||
|
2023-05-30,144.79224153467328,71.3288850087774,20.700607253922893
|
||||||
|
2023-05-31,136.6043139822718,69.85669481912592,22.72208092020764
|
||||||
|
2023-06-01,129.90496634749854,72.32926425849703,21.945028595331298
|
||||||
|
2023-06-02,127.58824815952354,68.08242219577187,25.865155230205552
|
||||||
|
2023-06-03,136.16761518850672,67.28411494443623,23.074820318848364
|
||||||
|
2023-06-04,145.1240516092851,72.4669447651291,23.274114518888926
|
||||||
|
2023-06-05,147.5059196939697,68.7403130237958,20.97542437801451
|
||||||
|
2023-06-06,149.47687307825956,74.64587085916783,20.697985347883023
|
||||||
|
2023-06-07,138.53032591367202,67.82186976223531,20.812878200360235
|
||||||
|
2023-06-08,128.45328941709883,65.84023751023986,23.243657934672576
|
||||||
|
2023-06-09,132.13221605087253,61.92995330767465,20.747096808795494
|
||||||
|
2023-06-10,135.78647805542306,70.48997159891739,22.829123565664112
|
||||||
|
2023-06-11,148.0987107210833,81.71304992555454,28.135750134629784
|
||||||
|
2023-06-12,153.0193346048766,75.96586656015401,24.47267059270714
|
||||||
|
2023-06-13,145.6457394335731,74.83142832728126,20.83097462962713
|
||||||
|
2023-06-14,140.9902449994371,73.94584245827411,25.36243573634108
|
||||||
|
2023-06-15,133.2924186394212,64.64010696028141,20.484316594503184
|
||||||
|
2023-06-16,134.34139003578426,68.29115742694421,15.54391785175287
|
||||||
|
2023-06-17,143.56414528086364,71.8450346443408,18.583781268252814
|
||||||
|
2023-06-18,148.01551228867737,74.49613663708284,15.945413181646051
|
||||||
|
2023-06-19,150.95414370388056,71.61202293266295,20.452997236318286
|
||||||
|
2023-06-20,147.0447580558963,73.64492989924287,21.51254156972946
|
||||||
|
2023-06-21,138.91442946215307,71.94720618730378,26.436347112705246
|
||||||
|
2023-06-22,133.9508505306588,74.23114330430461,22.337566040890476
|
||||||
|
2023-06-23,135.26498811523834,72.42884818804521,20.64923107688999
|
||||||
|
2023-06-24,142.36042104862796,81.94612281187176,23.744498150585642
|
||||||
|
2023-06-25,152.13733471736649,72.231929544243,14.572624223730113
|
||||||
|
2023-06-26,154.23904352473193,81.48112494596936,21.86262256879806
|
||||||
|
2023-06-27,153.26261953185252,77.54801979461801,23.418123219851857
|
||||||
|
2023-06-28,141.501240633759,81.69963498296786,16.619517644570024
|
||||||
|
2023-06-29,141.19092331014159,66.55397022829503,24.436287255248928
|
||||||
|
2023-06-30,137.72658495620402,64.66468326719813,21.970263091829977
|
||||||
|
2023-07-01,142.13074316454697,68.06840835505338,19.65865887136292
|
||||||
|
2023-07-02,150.31262022756084,64.53683149223139,22.752616955102773
|
||||||
|
2023-07-03,156.92136533513548,75.83190755916394,27.6160986739157
|
||||||
|
2023-07-04,151.43565463277307,71.92216400861804,21.29936760939659
|
||||||
|
2023-07-05,144.94522732135877,73.22458259306042,21.448179346840707
|
||||||
|
2023-07-06,138.3500162874156,70.88378802259359,19.27518363303756
|
||||||
|
2023-07-07,138.2292024966561,78.49545544440747,18.05348196698251
|
||||||
|
2023-07-08,144.19080360135797,76.84752099160923,23.04377126872821
|
||||||
|
2023-07-09,151.39073391880174,72.81084868108886,17.934261085087467
|
||||||
|
2023-07-10,156.7987408262441,73.90729705638027,20.66696001819084
|
||||||
|
2023-07-11,155.11785753935226,80.01852462720866,18.969037709955906
|
||||||
|
2023-07-12,145.4344726620079,66.11607029590074,21.788698271209025
|
||||||
|
2023-07-13,136.57252976954146,77.4435587140425,21.30249385354929
|
||||||
|
2023-07-14,140.6277628521662,76.21108202968954,23.363876114180734
|
||||||
|
2023-07-15,148.69544667183354,72.00184507539325,18.670955828561386
|
||||||
|
2023-07-16,154.61315597254045,68.74090534081584,19.34112896296411
|
||||||
|
2023-07-17,159.72655950094892,86.63264162130153,17.16421136521589
|
||||||
|
2023-07-18,155.0395982759213,76.94709991169756,18.71737147605307
|
||||||
|
2023-07-19,144.212007347891,78.2950852338128,21.131901479134555
|
||||||
|
38
docker-compose.yml
Normal file
@ -0,0 +1,38 @@
|
|||||||
|
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: lazy-fjh-app
    ports:
      - "60201:60201"
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
      - ./temp:/app/temp
    env_file:
      - .env
    environment:
      - ENV=production
      - DEBUG=False
      - HOST=0.0.0.0
      - PORT=60201
      - LOG_LEVEL=info
      - LANGUAGE_DEFAULT=zh
      - ANALYSIS_TIMEOUT=300
      - MAX_MEMORY_MB=500
    restart: always
    networks:
      - app-network
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:60201/health" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

networks:
  app-network:
    driver: bridge
|
||||||
26
docs/V1.1.md
Normal file
@ -0,0 +1,26 @@
|
|||||||
|
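# Session Summary (V1.1)

## Recent issues and fixes
- **Memory limit**: uvicorn raised `MemoryError` during startup because `RLIMIT_AS` was set to 2 GB. The limit is now optional and controlled by an environment variable (`LINUX_RLIMIT_AS_MB` or `LINUX_MEMORY_LIMIT_MB`); no limit is applied by default.
- **pandas frequency case**: pandas 3.x requires lowercase frequency aliases, so `freq='H'/'S'` was changed to `freq='h'/'s'`, which resolves the `Invalid frequency` error.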
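
The frequency fix above amounts to replacing legacy uppercase aliases with their lowercase forms. A minimal sketch of a guard for user-supplied frequency strings, assuming a hypothetical helper `normalize_freq` (not part of this project) and covering only the legacy aliases that newer pandas releases replaced (`H`, `BH`, `CBH`, `T`, `S`, `L`, `U`, `N`):

```python
import re

# Legacy pandas offset aliases and their current replacements.
# Assumption: only these aliases need rewriting; everything else passes through.
_LEGACY_ALIASES = {
    "H": "h",
    "BH": "bh",
    "CBH": "cbh",
    "T": "min",
    "S": "s",
    "L": "ms",
    "U": "us",
    "N": "ns",
}

# Matches an optional integer multiplier followed by a plain alias, e.g. "30S".
_FREQ_RE = re.compile(r"^(\d*)([A-Za-z]+)$")


def normalize_freq(freq: str) -> str:
    """Hypothetical helper: map a legacy alias such as '30S' to '30s'.

    Strings that are not a simple '<count><alias>' pattern, and aliases
    not listed above (e.g. 'D', 'ME'), are returned unchanged.
    """
    match = _FREQ_RE.match(freq.strip())
    if match is None:
        return freq
    count, alias = match.groups()
    return count + _LEGACY_ALIASES.get(alias, alias)
```

Calling `normalize_freq` on a value before passing it as `freq=` keeps older configuration files working without touching the analysis code.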

## Current API status (brief)
- v1 `/api`: `upload`, `analyze`, `available_methods`, `image/{filename}`, `download/{filename}`, and `list_uploads` are all implemented.
- v2 `/api/v2`: `analyze` (OSS/URL input) and `available_methods` are implemented; `API_MODE=v2` disables the v1 upload/image endpoints.
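
How `API_MODE` gates the two route groups can be pictured as a small lookup. A minimal sketch, assuming a hypothetical helper `enabled_prefixes` and assuming that unknown values fall back to `full`; the project's actual router wiring is not shown in this summary:

```python
from typing import List

# Route prefixes as stated in this document: v1 under /api, v2 under /api/v2.
V1_PREFIX = "/api"
V2_PREFIX = "/api/v2"


def enabled_prefixes(api_mode: str) -> List[str]:
    """Hypothetical helper: route prefixes to mount for an API_MODE value."""
    mode = (api_mode or "full").strip().lower()
    if mode == "v2":
        # v2-only mode: the v1 upload/file/image endpoints are not exposed.
        return [V2_PREFIX]
    # "full" (the default) exposes both v1 and v2. Treating unknown values
    # as "full" is an assumption of this sketch.
    return [V1_PREFIX, V2_PREFIX]
```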
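
## Design decision: front-end rendering, data mode
- PNGs are no longer sent. The backend returns structured data only and the front end renders it with ECharts. See `docs/charts-data-mode-plan.md` for the full plan.
- Unified cleaning: `to_echarts_safe` handles NaN/Inf/pd.NA, converts Timestamp to ISO 8601, converts numpy/Decimal values to native types, and guards against circular references.
- Data format conventions:
  - Time series / multiple series: prefer `dataset`; flatten matrices (correlation, etc.) in advance to `[i, j, value]`.
  - Histograms: the backend bins with `np.histogram` and returns `[range_start, range_end, count]`.
  - Style decoupling: the backend does not return colors or line styles.
- Algorithms stay unchanged; only result wrapping and cleaning change. Confidence intervals (CI) or anomaly annotations, if needed, are extra wrapping and do not touch the core algorithms.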
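
The two data conventions above (flattened matrices and pre-binned histograms) can be sketched without any dependencies. This is an illustration only: `flatten_matrix` and `histogram_bins` are hypothetical names, and the equal-width binning is a pure-Python stand-in for `np.histogram`, which the backend is described as using.

```python
from typing import List, Optional, Sequence


def flatten_matrix(matrix: Sequence[Sequence[Optional[float]]]) -> List[list]:
    """Flatten a 2-D matrix into [i, j, value] triples (heatmap-style data)."""
    return [
        [i, j, value]
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]


def histogram_bins(values: Sequence[float], bins: int = 10) -> List[list]:
    """Bin values into equal-width bins as [range_start, range_end, count].

    Every bin is half-open [start, end) except the last, which also
    includes the maximum value.
    """
    if bins < 1:
        raise ValueError("bins must be >= 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: widen it so the single value falls in one bin.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [[lo + k * width, lo + (k + 1) * width, counts[k]] for k in range(bins)]
```

Both functions return plain nested lists, so the result can go straight into a JSON response once missing values have been cleaned.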

## To do (not yet implemented)
- Implement the charts data mode: implement `to_echarts_safe`, add the `charts` field, and define the return path when image saving is disabled.
- Return histogram bin data, anomaly annotations (if needed), and forecast upper/lower bounds (if needed) from the wrapping layer.

## References
- Detailed plan: `docs/charts-data-mode-plan.md`
- Endpoint list: `docs/api-endpoints-status.md`
|
||||||
21
docs/api-endpoints-status.md
Normal file
@ -0,0 +1,21 @@
|
|||||||
|
# API Endpoint List and Completion Status

## v1 routes (prefix `/api`)
| Endpoint | Method | Description | Status | Notes |
| --- | --- | --- | --- | --- |
| `/api/upload` | POST | Upload a CSV file | ✅ Implemented | Accepts only `settings.ALLOWED_EXTENSIONS` (csv by default). Returns the saved file name. |
| `/api/analyze` | POST | Run the full analysis on an uploaded CSV | ✅ Implemented | Returns the analysis result; already switched to charts data mode (`analysis.<lang>.charts`), with `images` kept empty for compatibility with the old front end. |
| `/api/available_methods` | GET | List supported analysis methods | ✅ Implemented | Static list. |
| `/api/image/{filename}` | GET | Fetch an image file | ✅ Implemented | Read from `uploads/`. |
| `/api/download/{filename}` | GET | Download a file | ✅ Implemented | Read from `uploads/`. |
| `/api/list_uploads` | GET | List uploaded files | ✅ Implemented | Returns file name, size, and modification time. |

## v2 Routes (prefix `/api/v2`)
| Endpoint | Method | Description | Status | Notes |
| --- | --- | --- | --- | --- |
| `/api/v2/analyze` | POST | Download a CSV from OSS/URL and analyze it | ✅ Implemented | Reuses the v1 analyzer; already returns the charts data mode with `images` empty. Images remain disabled under `API_MODE=v2`. |
| `/api/v2/available_methods` | GET | List the supported analysis methods | ✅ Implemented | Same as v1. |
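
## Known Gaps / To Do (Not Yet Implemented)
- **Forecast confidence intervals**: VAR currently returns point forecasts only. If CIs are needed, switch to `forecast_interval` (no algorithm change; only the lower and upper bounds are extracted).
- **Anomaly annotations**: there is no annotation output yet. If needed, it has to be computed additionally in the wrapping layer (max/min or anomaly detection).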
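
An annotation step in the wrapping layer, as mentioned in the last item, might look like the sketch below. It is only an assumption about what "max/min or anomaly detection" could mean here: the function name, the output keys, and the z-score rule are all hypothetical and not taken from the project.

```python
from statistics import mean, pstdev


def annotate_points(values, z_threshold=3.0):
    """Return max/min markers and simple z-score outliers for one series.

    None entries (already-sanitized missing values) are skipped.
    """
    finite = [(i, v) for i, v in enumerate(values) if v is not None]
    if not finite:
        return {"max": None, "min": None, "outliers": []}
    max_i, max_v = max(finite, key=lambda p: p[1])
    min_i, min_v = min(finite, key=lambda p: p[1])
    nums = [v for _, v in finite]
    mu, sigma = mean(nums), pstdev(nums)
    outliers = [
        {"index": i, "value": v}
        for i, v in finite
        if sigma > 0 and abs(v - mu) / sigma > z_threshold
    ]
    return {
        "max": {"index": max_i, "value": max_v},
        "min": {"index": min_i, "value": min_v},
        "outliers": outliers,
    }
```

Because the function only reads the already-computed series, it leaves the analysis algorithms untouched.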
71
docs/charts-data-mode-plan.md
Normal file
@ -0,0 +1,71 @@
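# charts Data Mode (Current Version)

> The previous version of this document is archived as `docs/旧的charts-data-mode-plan.md`.

## Goals
- The back end returns structured chart data and the front end renders it with ECharts; images are no longer generated or transferred.
- Sanitize uniformly so that NaN/Inf/non-serializable objects cannot crash the API.
- `analysis.<lang>.charts` is the authoritative location in the response; `images` stays empty only for compatibility with the old front end.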

## Serialization Rules (`to_echarts_safe`, implemented)
- NaN/Inf/pd.NA → null; numpy scalars are converted to native types; Decimal is converted to float.
- Timestamp/datetime → ISO 8601 string.
- ndarray/DataFrame/Series → list or records; sanitized recursively, with protection against circular references.

## Response Skeleton (as actually served)
```json
{
  "success": true,
  "meta": { ... },
  "analysis": {
    "zh": {
      "data_description": "...",
      "preprocessing_steps": [...],
      "api_analysis": { ... },
      "steps": [
        { "key": "ts_img", "title": "Time Series Analysis", "chart": "ts", "summary": "..." },
        ...
      ],
      "charts": {
        "ts": { ... },
        "acf_pacf": { ... },
        ...
      }
    }
  },
  "images": {},
  "log": [...]
}
```
- `charts` is no longer returned separately at the top level; the front end should read `analysis.<lang>.charts`. If a top-level alias is needed, add a mapping in the routing layer.
- `steps[].chart` points to a key in `charts` and drives the display order on the front end.
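
How a client might follow `steps[].chart` into `analysis.<lang>.charts` can be sketched as below. The helper name `charts_in_step_order` is hypothetical; only the field names come from the skeleton above.

```python
def charts_in_step_order(response, lang="zh"):
    """Return (title, chart) pairs in the order given by steps[].chart.

    Steps whose chart key is missing from charts are skipped.
    """
    section = response.get("analysis", {}).get(lang, {})
    charts = section.get("charts", {})
    ordered = []
    for step in section.get("steps", []):
        key = step.get("chart")
        if key in charts:
            ordered.append((step.get("title", key), charts[key]))
    return ordered
```

Skipping unknown keys rather than raising keeps a client working when a step has no chart attached.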

## Chart Data Formats (as currently implemented)
- **Time series ts**: `type: line`, `dataset = [[col...], [...]]`, with timestamps as strings.
- **ACF/PACF acf_pacf**: `series: [{name: col, acf:[{lag,value}], pacf:[{lag,value}]}]`; each column is packed into a single item.
- **Stationarity stationarity**: `records: [{column, ADF:{...}, KPSS:{...}}]`.
- **Normality normality**: `records: [{column, histogram:[{range_start,range_end,count},...], Shapiro-Wilk:{...}, Jarque-Bera:{...}}]`.
- **Seasonal decomposition seasonal**: `type: line`; `dataset` contains observed/trend/seasonal/resid, with missing values as null.
- **Spectral spectral (summarized to limit payload size)**:
  - `spectrogram`: `f`, `t`, `Sxx_log10_mean`, `Sxx_shape`; the full matrix is not returned.
  - `periodogram`: only the first 20 points of `f` and `Pxx_den`.
- **Correlation heatmap**: `type: heatmap`; `data` is a flat `[i,j,value]` list, with xLabels/yLabels.
- **PCA scree pca_scree**: `type: bar`; `dataset` holds component / explained variance / cumulative value.
- **PCA scatter pca_scatter**: `type: scatter`; `records` contain PC1/PC2/timestamp.
- **Feature importance feature_importance**: `type: bar`; `records` contain feature/importance.
- **Clustering cluster**: `type: scatter`; `records` contain cluster and timestamp.
- **Factor analysis factor**: `type: scatter`; `records` contain Factor1/Factor2 and timestamp.
- **Cointegration cointegration**: `type: table`; `meta` directly carries trace_stat/crit_vals/eigen_vals.
- **VAR forecast var_forecast**: `type: line`; `dataset` contains step and each forecast column.
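
Two of the shapes listed above, the flat `[i,j,value]` heatmap and the binned histogram records, can be produced as in the following sketch. It uses plain Python as a stand-in for the project's numpy code: `histogram_records` only imitates the equal-width binning of `np.histogram`, and both function names are assumptions.

```python
def flatten_matrix(matrix, labels):
    """Flatten a square matrix into [i, j, value] triples plus axis labels."""
    data = [
        [i, j, matrix[i][j]]
        for i in range(len(matrix))
        for j in range(len(matrix[i]))
    ]
    return {"type": "heatmap", "data": data,
            "xLabels": list(labels), "yLabels": list(labels)}


def histogram_records(values, bins=10):
    """Equal-width binning; the last bin includes its right edge."""
    lo, hi = min(values), max(values)
    if lo == hi:  # widen a degenerate range so the bin width is non-zero
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [
        {"range_start": lo + k * width,
         "range_end": lo + (k + 1) * width,
         "count": counts[k]}
        for k in range(bins)
    ]
```

Doing both conversions on the server means the front end can pass `data` and the histogram records to ECharts without reshaping them.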
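
## Compatibility and Notes
- `images` is an empty object; every leftover `image_path` has been removed.
- The spectral output is currently a "summary" version; restoring the full matrix requires changing `perform_spectral_analysis`.
- The ACF/PACF structure differs from the old document, so the front end must decode the current shape; to split the series, adjust `_build_chart_payload` on the back end.
- Normality histograms are already binned by the back end; the front end does not need to bin again.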
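
For a front end that still expects the older per-column shape, the current ACF/PACF payload can be converted on the client side. The sketch below assumes the two shapes exactly as written in these documents; the function name `split_acf_pacf` is hypothetical.

```python
def split_acf_pacf(chart):
    """Convert the packed per-column items into one {series, meta} object per column.

    Input item:  {"name": col, "acf": [...], "pacf": [...]}
    Output item: {"series": [{"name": "acf", "data": [...]},
                             {"name": "pacf", "data": [...]}],
                  "meta": {"column": col}}
    """
    result = []
    for item in chart.get("series", []):
        result.append({
            "series": [
                {"name": kind, "data": list(item.get(kind, []))}
                for kind in ("acf", "pacf")
            ],
            "meta": {"column": item.get("name")},
        })
    return result
```

The same transformation could equally live in `_build_chart_payload`; doing it in an adapter avoids changing the served format.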
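
## Known Optional Improvements
1) Add a top-level `charts` alias in the routing layer to ease front-end migration.
2) Split ACF/PACF into two series per column (acf/pacf), matching the old example.
3) Add a `mode=full|summary` switch for spectral so the front end can choose full or summarized output.
4) Add an optional sampling/truncation strategy for large datasets.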
32
docs/关于charts模式的实现.md
Normal file
@ -0,0 +1,32 @@
# charts Mode Implementation Notes (Current Version)

> The previous implementation notes are in `docs/旧的关于charts模式的实现.md`.

## Current Status Overview
- The back end forces `generate_plots=False`; every step produces data only, collected under `analysis.<lang>.charts`.
- `images` is an empty object kept for compatibility; `steps[].chart` binds each step to its chart key.
- The sanitizer `to_echarts_safe` recursively handles NaN/Inf/Timestamp/numpy/Decimal to guarantee JSON-safe output.
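
## Key Structures
- Response: `analysis.<lang>.charts` (`charts` is not exposed at the top level).
- Time series, seasonal decomposition, VAR, etc. use dataset; correlation uses a flat heatmap; PCA/clustering/factor use records.
- ACF/PACF: each column carries both an `acf` and a `pacf` sequence (unlike the old document, which split them into two series).
- Normality: each column includes histogram bins (back-end `np.histogram`) plus the Shapiro/JB results.
- Spectral: currently a summary (the spectrogram gives only the mean and shape; the periodogram only the first 20 points).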

## File and Code Mapping
- Sanitization and aggregation: `app/services/analysis_system.py` (`to_echarts_safe`, `_build_chart_payload`, `run_analysis`).
- Time-series data: `app/services/analysis/modules/time_series.py` (data only, summarized spectrum).
- Normality binning: `app/services/analysis/modules/basic.py`.
- Route responses: `app/api/routes/analysis.py`, `app/api/routes/analysis_v2.py` (`charts` lives under `analysis.<lang>`).
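
## Differences from the Old Version
- Images are no longer generated; the top level has no `charts` field.
- The ACF/PACF structure changed; spectral output switched from the full matrix to a summary.
- Normality histogram entries are dictionaries with named fields rather than a 2-D array.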
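
## Optional Follow-up Improvements
1) Add a top-level `charts` alias in the routing layer so the front end can migrate transparently.
2) Optionally split the ACF/PACF output into separate series (matching the old example).
3) Provide a `full/summary` switch for spectral so that either the full matrix or the summary can be returned.
4) Add a sampling/truncation strategy for large datasets to prevent oversized payloads.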
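
One possible form of the sampling strategy in item 4 is evenly spaced row selection on a `dataset` (header row plus data rows). This is a sketch of one option only, not a decided design; `downsample_dataset` and `max_rows` are names chosen for illustration.

```python
def downsample_dataset(dataset, max_rows=1000):
    """Keep the header row and at most max_rows evenly spaced data rows.

    The first and last data rows are always kept.
    """
    if max_rows < 2:
        raise ValueError("max_rows must be at least 2")
    header, rows = dataset[0], dataset[1:]
    if len(rows) <= max_rows:
        return [header] + rows
    step = (len(rows) - 1) / (max_rows - 1)  # > 1, so picked indices are distinct
    picked = [rows[round(k * step)] for k in range(max_rows)]
    return [header] + picked
```

Even spacing preserves the overall shape of a line chart, but it can drop short spikes, so a payload that must keep extremes would need a different rule.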
125
docs/旧的charts-data-mode-plan.md
Normal file
@ -0,0 +1,125 @@
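# Visualization Data Return Plan for a Decoupled Front End and Back End (ECharts)

## Goals
- The back end no longer generates or transfers images and returns chart data only; the front end renders with ECharts.
- Use one unified data structure to reduce front-end adaptation code, and eliminate API crashes caused by NaN/Infinity/non-serializable objects.
- Stay compatible with the existing API (`images` may be left empty) while migrating gradually to the `charts` data mode.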
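
## Serialization and Sanitization Rules (mandatory)
- **NaN / Infinity / pd.NA**: recursively sanitized to `null` (JSON `null`). The string "NaN" must never be returned.
- **Timestamps**: always ISO 8601 strings (e.g. `2023-01-01T12:00:00`).
- **Arrays/matrices**: all converted to native Python lists before JSON serialization.
- **DataFrame**: prefer `to_dict(orient="records")` or assemble the dataset form (see below).
- **Numeric types**: numpy scalars are converted to native `int/float/bool`; `nan/inf` values are sanitized first.

> Recommendation: implement a generic `to_echarts_safe(obj)` function that applies the sanitization and type conversions above recursively, and pass all response data through it before it leaves the service.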

## Response Skeleton (adds `charts`, keeps the old fields)
```json
{
  "success": true,
  "meta": { ... },
  "analysis": {
    "zh": {
      "data_description": "...",
      "preprocessing_steps": [ ... ],
      "api_analysis": { ... },
      "steps": [
        {"key": "ts", "title": "Time Series", "summary": "...", "chart": "ts"},
        ...
      ]
    }
  },
  "charts": {
    "ts": { "type": "line", "dataset": [...], "meta": {...} },
    "acf_pacf": { "type": "bar", "series": [...], "meta": {...} },
    "heatmap": { "type": "heatmap", "data": [...], "xLabels": [...], "yLabels": [...], "meta": {...} },
    ...
  },
  "images": {},
  "log": [...]
}
```
- `steps[].chart` points to a key in `charts`, so the front end can render in step order.
- `images` is kept but empty, for compatibility with the old front end.

## Recommended Data Format per Chart (ECharts-friendly)
- **Time series (ts)**: `dataset` form
  - 2-D array with a header row, e.g. `[ ["timestamp","sales","ad_cost"], ["2023-01-01T00:00:00", 10, 5], ... ]`
  - Front end: `dataset.source = dataset`, `series: [{type:'line', encode:{x:'timestamp', y:'sales'}}, ...]`

- **ACF / PACF (acf_pacf)**:
  - `{ series: [{name:'acf', data:[{lag:0, value:1.0}, ...]}, {name:'pacf', data:[...] }], meta:{column:'sales'} }`

- **Stationarity tests (stationarity)**:
  - `{ adf: {statistic:..., p_value:..., critical_values:{...}}, kpss:{...}, meta:{column:'sales'} }`
  - The front end can render this as a bar chart or a table.

- **Normality tests (normality)**:
  - `{ columns: [{name:'col', shapiro_p:..., jb_p:..., shapiro_stat:..., jb_stat:...}], meta:{} }`

- **Seasonal decomposition (seasonal)**:
  - `dataset` form: `[["timestamp","observed","trend","seasonal","resid"], [...]]`

- **Spectral analysis (spectral)**:
  - `periodogram`: `{ f: [...], psd: [...] }` (may be truncated to the first N points)
  - `spectrogram`: `{ f: [...], t: [...], values: [[i,j,val], ...] }` (may return only log10 values, then compress)

- **Correlation heatmap (heatmap)**:
  - `{ data: [[i,j,value], ...], xLabels:[...], yLabels:[...], meta:{} }` (the back end flattens the N×N matrix in advance)

- **PCA scree plot (pca_scree)**:
  - `dataset`: `[["component","explained","cumulative"], [1,0.4,0.4], ...]`

- **PCA scatter (pca_scatter)**:
  - `records`: `[{pc1:..., pc2:..., timestamp:"..."}, ...]`

- **Feature importance (feature_importance)**:
  - `records`: `[{feature:"...", importance:0.12}, ...]`

- **Clustering (cluster)**:
  - `records`: `[{timestamp:"...", cluster:0, x:<optional>, y:<optional>}...]`

- **Factor analysis (factor)**:
  - Similar to clustering: `[{timestamp:"...", factor1:..., factor2:...}]`

- **Cointegration test (cointegration)**:
  - `{ trace_stat:[...], crit_95:[...], eigen_vals:[...], meta:{} }`

- **VAR forecast (var_forecast)**:
  - `dataset`: `[["step","var1_forecast","var2_forecast"], [1, ...], ...]`

> Principle: use `dataset` whenever possible and let the front end select the lines through `encode`; flatten anything that needs a matrix in advance; use records for the rest.
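
The `dataset` form recommended above (a header row followed by aligned data rows) can be assembled from per-column lists as in this sketch. `build_dataset` is a hypothetical helper, and the inputs are assumed to be sanitized already.

```python
def build_dataset(index_name, index_values, columns):
    """Build a 2-D array: a header row, then one row per index value.

    columns maps a column name to a list aligned with index_values.
    """
    for name, series in columns.items():
        if len(series) != len(index_values):
            raise ValueError(f"column {name!r} is not aligned with the index")
    header = [index_name, *columns]
    rows = [
        [idx] + [series[i] for series in columns.values()]
        for i, idx in enumerate(index_values)
    ]
    return [header] + rows
```

With this layout the front end sets `dataset.source` to the returned array and picks columns by name through `encode`, as described in the time series item.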
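
## Styles and Themes
- The back end returns no visual styling such as colors or line styles, only semantic fields (series names, metric meanings). The front end chooses colors and style according to its theme.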
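
## Implementation Steps (suggested)
1) Add the `to_echarts_safe` sanitizer to handle NaN/Infinity/Timestamp/DataFrame -> JSON-safe conversion in one place.
2) In each analysis function: keep the computation, assemble chart data (dataset/records/flatten) instead of generating PNGs; the `generate_plots` logic can stay as a switch, but defaults to False.
3) When `run_analysis` aggregates results, put each step's data into `charts` and write a `chart` key (a reference to the chart) into `steps`.
4) The routing layer returns the `charts` field, leaves `images` empty, and still returns `steps`.
5) The front end integrates ECharts against the `charts` contract and drops its dependency on `images`.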

## Compatibility and Fallback
- Old front end: still receives `analysis.steps` and `images` (empty).
- New front end: uses `charts`. If a step fails, return `{error:"..."}` with a short summary to avoid a 500.
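
The per-step fallback described above, where a failing step yields `{error:"..."}` instead of failing the whole request, might be structured as follows. This is a sketch under assumptions: `run_steps`, the `(key, title, func)` tuple layout, and the summary strings are hypothetical.

```python
def run_steps(steps):
    """Run each step in isolation and collect charts plus step summaries.

    steps is a list of (key, title, func) tuples; func returns chart data.
    """
    charts, summaries = {}, []
    for key, title, func in steps:
        try:
            charts[key] = func()
            summary = "ok"
        except Exception as exc:  # isolate the failure to this step
            charts[key] = {"error": f"{type(exc).__name__}: {exc}"}
            summary = f"{title} failed"
        summaries.append({"key": key, "title": title,
                          "chart": key, "summary": summary})
    return {"steps": summaries, "charts": charts}
```

Catching broadly is deliberate here, since the goal is that one failing analysis cannot turn the whole response into a 500.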
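
## Performance Notes
- Because the back end no longer draws plots, CPU and I/O load drop significantly; for further optimization, the front end can send a `methods` list to decide which steps run.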
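
## Do the Algorithms Need to Change?
- The core statistical/time-series algorithms (ADF/KPSS, ACF/PACF, PCA, VAR, seasonal decomposition, correlation matrix, clustering, etc.) stay unchanged; the changes are concentrated in the "result wrapping" layer.
- Only the output wrapping needs to be adjusted:
  - Convert the intermediate results that were used for plotting (DataFrame/ndarray/statsmodels results) into ECharts-friendly JSON structures, all passed through `to_echarts_safe` (NaN/Inf/Timestamp).
  - Flatten matrix-like results (such as correlation) on the back end into `[i,j,value]` lists; prefer the dataset form for multi-series line/bar charts.
  - Truncate or summarize as needed to limit payload size (e.g. take the first N points of the periodogram; take the mean of, or downsample, the spectrogram).
  - Add metadata (column names/units/variable names) so the front end can build legends and tooltips.
- Parts that do not need to change:
  - Preprocessing, the standardization flow, and the mathematical implementation and parameter choices of the algorithms (lag order, decomposition period, number of PCA components, etc.) stay as they are.
  - If data volume or performance becomes a bottleneck later, individual steps can be sampled or truncated without affecting algorithm correctness.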
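
## Additional Conventions (still no algorithm changes, only result wrapping)
- **Histogram binning**: for normality/distribution analysis, the back end does the binning (`np.histogram`) and returns `[["range_start","range_end","count"], ...]`. The front end does no binning.
- **`to_echarts_safe` extensions**: besides NaN/Inf/Timestamp, explicitly handle the numpy numeric types and Decimal, and add a "visited set" where necessary to guard against circular references. The output is always a JSON-safe, ECharts-friendly structure.
- **Matrix/multi-series formats**: matrix-like data (correlation, etc.) continues to be flattened to `[i,j,value]`; multi-series/multi-column data prefers dataset+encode to guarantee alignment.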
45
docs/旧的关于charts模式的实现.md
Normal file
@ -0,0 +1,45 @@
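# About the charts Mode Implementation

## Goals and Scope
- Following `docs/charts-data-mode-plan.md`, change the back end to return structured chart data (ECharts-friendly) and stop generating or returning images.
- Keep the algorithms and the analysis flow unchanged and adjust only the wrapping and the response structure; the old front end stays compatible through the empty `images` field.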

## Core Implementation
- **Unified sanitizer**: `to_echarts_safe` was added to `app/services/analysis_system.py`. It recursively handles NaN/Inf/pd.NA, numpy scalars/arrays, Timestamp/datetime, and Decimal, includes circular-reference protection, and outputs a JSON-safe structure.
- **Analysis flow changes**:
  - `run_analysis` forces `generate_plots=False` and collects each step's result in `charts`; `steps[].chart` points to the corresponding key.
  - Each step gets a new `chart_key` that maps into `charts`:
    - `stats` (statistical overview, dataset table), `ts` (time series dataset), `acf_pacf` (acf/pacf sequences), `stationarity`, `normality` (table), `seasonal`, `spectral`, `heatmap` (flattened correlation matrix), `pca_scree`, `pca_scatter`, `feature_importance`, `cluster`, `factor` (records), `cointegration` (table meta), `var_forecast` (forecast dataset with a step column).
  - `_build_chart_payload` assembles the ECharts-friendly dataset/records/flatten structure for each chart_key and sanitizes it through `to_echarts_safe`.
  - Fallback image generation was removed; only the text fallback analysis remains.
- **Data layer changes**:
  - The normality test in `app/services/analysis/modules/basic.py` now includes histogram binning: `np.histogram` yields a list of `[range_start, range_end, count]` that the front end can render directly.
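
## Route Response Changes
- v1 `POST /api/analyze` and v2 `POST /api/v2/analyze`:
  - `analysis.<lang>.charts` returns the data for each chart; `steps` keeps its order and summaries and carries a `chart` reference.
  - `images` is always an empty object, kept only for compatibility with the old front end; the old image copy/save logic was deleted and the `image_path` leak was removed.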
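
## Compatibility and Notes
- The core algorithms, preprocessing, and API analysis calls are unchanged; only the output wrapping changed.
- A front end still on the old version must switch to reading `analysis.<lang>.charts` and `steps[].chart`. The old field (`images`) is empty and will not cause errors.
- Memory usage still deserves attention for large data; for further compression, truncation/sampling can be added in `_build_chart_payload`.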

## Related Files
- Implementation details: `app/services/analysis_system.py`
- Histogram binning: `app/services/analysis/modules/basic.py`
- Route responses: `app/api/routes/analysis.py`, `app/api/routes/analysis_v2.py`
- Design notes: `docs/charts-data-mode-plan.md`

## 2026-01-29 Supplement
- The ACF/PACF output was changed to `lag/value` records so the front end can map it directly to bar/line charts.
- Spectral output:
  - `spectrogram` adds downsampling and returns `values: [i,j,val]`, with the `f` and `t` lists attached.
  - `periodogram` returns the dataset form `["f","psd"]` (truncated to the first 200 points).
- `docs/api-endpoints-status.md` was updated to mark the charts mode as landed, with `images` empty for compatibility only.
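
## 2026-01-29 Follow-up Adjustment
- Spectral downsampling was removed on request: `spectrogram` now returns the full `f/t` and all `values[i,j,val]`, and `periodogram` returns a dataset with all frequency points. This may enlarge the payload considerably; to limit size again, reintroduce a cap or sampling.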
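
## 2026-01-29 Further Update
- The time_series module is back to "return data only, generate no images": time series, ACF/PACF, seasonal decomposition, and spectral analysis no longer plot and directly return the data that charts needs; spectral output is still not downsampled and returns the full values.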
28
generate_openapi.py
Normal file
@ -0,0 +1,28 @@
import os
import sys
import json
from pathlib import Path

# Add project root to path to ensure imports work
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

try:
    from app.main import app
    print("Successfully imported FastAPI app.")
except ImportError as e:
    print(f"Error importing app: {e}")
    sys.exit(1)

def generate_openapi_json():
    openapi_schema = app.openapi()

    output_path = Path("openapi.json")

    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(openapi_schema, f, indent=2, ensure_ascii=False)

    print(f"OpenAPI documentation generated at: {output_path.absolute()}")

if __name__ == "__main__":
    generate_openapi_json()
29
generate_test_data.py
Normal file
@ -0,0 +1,29 @@
import pandas as pd
import numpy as np

# Set the random seed
np.random.seed(42)

# Generate a 200-day time series
dates = pd.date_range(start='2023-01-01', periods=200, freq='D')

# Build the data
trend = np.linspace(0, 50, 200)
seasonality = 10 * np.sin(np.linspace(0, 2 * np.pi * (200/7), 200))
noise = np.random.normal(0, 2, 200)

sales = 100 + trend + seasonality + noise
ad_cost = sales * 0.5 + np.random.normal(0, 5, 200)
temperature = 30 - trend * 0.2 + np.random.normal(0, 3, 200)

# Create the DataFrame
df = pd.DataFrame({
    'date': dates,
    'sales': sales,
    'ad_cost': ad_cost,
    'temperature': temperature
})

# Save
df.to_csv('complex_test.csv', index=False)
print("✅ Test file generated: complex_test.csv")
552
openapi.json
Normal file
@ -0,0 +1,552 @@
{
  "openapi": "3.1.0",
  "info": {
    "title": "Time Series Data Analysis System",
    "description": "Supports multi-format data upload, AI-enhanced analysis, and multilingual report generation",
    "version": "2.0.0"
  },
  "paths": {
    "/api/upload": {
      "post": {
        "tags": [
          "upload"
        ],
        "summary": "Upload a CSV or image file",
        "description": "Upload a data file (CSV or image)\n\n- **file**: CSV or image file (PNG, JPG, BMP, TIFF)\n- **task_description**: description of the analysis task",
        "operationId": "upload_file_api_upload_post",
        "requestBody": {
          "content": {
            "multipart/form-data": {
              "schema": {
                "$ref": "#/components/schemas/Body_upload_file_api_upload_post"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/UploadResponse"
                }
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/available_methods": {
      "get": {
        "tags": [
          "analysis"
        ],
        "summary": "Get available analysis methods",
        "description": "Get all available analysis methods",
        "operationId": "get_available_methods_api_available_methods_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Get Available Methods Api Available Methods Get"
                }
              }
            }
          }
        }
      }
    },
    "/api/analyze": {
      "post": {
        "tags": [
          "analysis"
        ],
        "summary": "Run the full analysis",
        "description": "Run the full time series analysis\n\nWorkflow:\n1. Load and preprocess the data\n2. Run 15 analysis methods\n3. Call the AI API for in-depth analysis\n4. Generate PDF/PPT/HTML reports",
        "operationId": "analyze_data_api_analyze_post",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/AnalysisRequest"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Analyze Data Api Analyze Post"
                }
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/v2/available_methods": {
      "get": {
        "tags": [
          "analysis-v2"
        ],
        "summary": "Get available analysis methods (v2)",
        "description": "v2: returns the same list of available analysis methods as v1.",
        "operationId": "get_available_methods_v2_api_v2_available_methods_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Get Available Methods V2 Api V2 Available Methods Get"
                }
              }
            }
          }
        }
      }
    },
    "/api/v2/analyze": {
      "post": {
        "tags": [
          "analysis-v2"
        ],
        "summary": "Run the full analysis (v2: read CSV from an OSS URL)",
        "description": "Analyze CSV from an OSS/URL, returning the same structure as v1.",
        "operationId": "analyze_data_v2_api_v2_analyze_post",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/AnalysisV2Request"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Analyze Data V2 Api V2 Analyze Post"
                }
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/image/{filename}": {
      "get": {
        "tags": [
          "files"
        ],
        "summary": "Get an image file",
        "description": "Get a visualization image file",
        "operationId": "serve_image_api_image__filename__get",
        "parameters": [
          {
            "name": "filename",
            "in": "path",
            "required": true,
            "schema": {
              "type": "string",
              "title": "Filename"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/download/{filename}": {
      "get": {
        "tags": [
          "files"
        ],
        "summary": "Download a file",
        "description": "Download a report or other file",
        "operationId": "download_file_api_download__filename__get",
        "parameters": [
          {
            "name": "filename",
            "in": "path",
            "required": true,
            "schema": {
              "type": "string",
              "title": "Filename"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/list_uploads": {
      "get": {
        "tags": [
          "files"
        ],
        "summary": "List uploaded files",
        "description": "List the files in the uploads directory",
        "operationId": "list_uploads_api_list_uploads_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/": {
      "get": {
        "summary": "Root",
        "description": "Root path",
        "operationId": "root__get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/health": {
      "get": {
        "summary": "Health",
        "description": "Health check",
        "operationId": "health_health_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/api/config": {
      "get": {
        "summary": "Get Config",
        "description": "Get the application configuration",
        "operationId": "get_config_api_config_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "AnalysisRequest": {
        "properties": {
          "filename": {
            "type": "string",
            "title": "Filename"
          },
          "file_type": {
            "type": "string",
            "title": "File Type",
            "default": "csv"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description",
            "default": "Time series data analysis"
          },
          "data_background": {
            "additionalProperties": true,
            "type": "object",
            "title": "Data Background",
            "default": {}
          },
          "original_image": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Original Image"
          },
          "language": {
            "type": "string",
            "title": "Language",
            "default": "zh"
          },
          "generate_plots": {
            "type": "boolean",
            "title": "Generate Plots",
            "default": false
          }
        },
        "type": "object",
        "required": [
          "filename"
        ],
        "title": "AnalysisRequest",
        "description": "Analysis request model"
      },
      "AnalysisV2Request": {
        "properties": {
          "oss_url": {
            "type": "string",
            "title": "Oss Url"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description",
            "default": "Time series data analysis"
          },
          "data_background": {
            "additionalProperties": true,
            "type": "object",
            "title": "Data Background",
            "default": {}
          },
          "language": {
            "type": "string",
            "title": "Language",
            "default": "zh"
          },
          "generate_plots": {
            "type": "boolean",
            "title": "Generate Plots",
            "default": false
          },
          "source_name": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Source Name"
          }
        },
        "type": "object",
        "required": [
          "oss_url"
        ],
        "title": "AnalysisV2Request",
        "description": "v2 analysis request model (input is an OSS/URL)"
      },
      "Body_upload_file_api_upload_post": {
        "properties": {
          "file": {
            "type": "string",
            "format": "binary",
            "title": "File"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description",
            "default": "Time series data analysis"
          }
        },
        "type": "object",
        "required": [
          "file"
        ],
        "title": "Body_upload_file_api_upload_post"
      },
      "HTTPValidationError": {
        "properties": {
          "detail": {
            "items": {
              "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
          }
        },
        "type": "object",
        "title": "HTTPValidationError"
      },
      "UploadResponse": {
        "properties": {
          "success": {
            "type": "boolean",
            "title": "Success"
          },
          "filename": {
            "type": "string",
            "title": "Filename"
          },
          "file_type": {
            "type": "string",
            "title": "File Type"
          },
          "original_filename": {
            "type": "string",
            "title": "Original Filename"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description"
          },
          "message": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Message"
          }
        },
        "type": "object",
        "required": [
          "success",
          "filename",
          "file_type",
          "original_filename",
          "task_description"
        ],
        "title": "UploadResponse",
        "description": "Upload response model"
      },
      "ValidationError": {
        "properties": {
          "loc": {
            "items": {
              "anyOf": [
                {
                  "type": "string"
                },
                {
                  "type": "integer"
                }
              ]
            },
            "type": "array",
            "title": "Location"
          },
          "msg": {
            "type": "string",
            "title": "Message"
          },
          "type": {
            "type": "string",
            "title": "Error Type"
          }
        },
        "type": "object",
        "required": [
          "loc",
          "msg",
          "type"
        ],
        "title": "ValidationError"
      }
    }
  }
}
100
pyproject.toml
Normal file
@ -0,0 +1,100 @@
[project]
name = "lazy-fjh"
version = "2.0.0"
description = "Time Series Data Analysis System - FastAPI version"
readme = "README.md"
requires-python = ">=3.10"
authors = [{ name = "Your Name", email = "your.email@example.com" }]
keywords = ["time-series", "data-analysis", "fastapi", "statistical-analysis"]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Operating System :: OS Independent",
]

dependencies = [
    # FastAPI and web framework
    "fastapi>=0.104.1",
    "uvicorn[standard]>=0.24.0",
    "python-multipart>=0.0.6",
    "python-dotenv>=1.0.0",

    # Data processing
    "pandas>=2.2.2",
    "numpy>=1.26.4",

    # Statistics and scientific computing
    "scipy>=1.13.0",
    "scikit-learn>=1.3.0",
    "statsmodels>=0.14.0",

    # Visualization
    "matplotlib>=3.7.2",
    "seaborn>=0.12.2",

    # Report generation
    "reportlab>=4.0.4",
    "python-docx>=0.8.11",
    "python-pptx>=0.6.21",

    # API and data
    "openai>=1.3.0",
    "gradio_client>=0.9.0",
    "beautifulsoup4>=4.12.2",
    "requests>=2.31.0",

    # System and imaging
    "psutil>=5.9.5",
    "Pillow>=10.0.0",
    "opencv-python>=4.8.1.78",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.0.0",
]

prod = ["gunicorn>=21.2.0", "supervisor>=4.2.5"]

[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[tool.uv]
dev-dependencies = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"

[tool.black]
line-length = 100
target-version = ["py310", "py311", "py312"]

[tool.ruff]
line-length = 100
target-version = "py310"
select = ["E", "F", "W", "I"]
ignore = ["E501"]

[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false
|
||||||
29
requirements.txt
Normal file
@ -0,0 +1,29 @@
# FastAPI and web framework
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
python-dotenv==1.0.0

# Data processing
pandas==2.2.2
numpy==1.26.4

# Statistics and scientific computing
scipy==1.13.0
scikit-learn==1.3.0
statsmodels==0.14.0

# Visualization
matplotlib==3.7.2
seaborn==0.12.2

# API and data
openai==1.3.0
gradio_client>=0.9.0
requests==2.31.0

# System and imaging
psutil==5.9.5

# Production deployment
gunicorn==21.2.0
96
resource/fonts/LICENSE.txt
Normal file
@ -0,0 +1,96 @@
Copyright 2014-2021 Adobe (http://www.adobe.com/), with Reserved Font
Name 'Source'. Source is a trademark of Adobe in the United States
and/or other countries.

This Font Software is licensed under the SIL Open Font License,
Version 1.1.

This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL

-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------

PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font
creation efforts of academic and linguistic communities, and to
provide a free and open framework in which fonts may be shared and
improved in partnership with others.

The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply to
any document created using the fonts or their derivatives.

DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.

"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).

"Original Version" refers to the collection of Font Software
components as distributed by the Copyright Holder(s).

"Modified Version" refers to any derivative made by adding to,
deleting, or substituting -- in part or in whole -- any of the
components of the Original Version, by changing formats or by porting
the Font Software to a new environment.

"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.

PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed,
modify, redistribute, and sell modified and unmodified copies of the
Font Software, subject to the following conditions:

1) Neither the Font Software nor any of its individual components, in
Original or Modified Versions, may be sold by itself.

2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.

3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the
corresponding Copyright Holder. This restriction only applies to the
primary font name as presented to the users.

4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.

5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created using
the Font Software.

TERMINATION
This license becomes null and void if any of the above conditions are
not met.

DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.
BIN
resource/fonts/SourceHanSansCN.zip
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-Bold.otf
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-ExtraLight.otf
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-Heavy.otf
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-Light.otf
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-Medium.otf
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-Normal.otf
Normal file
Binary file not shown.
BIN
resource/fonts/SubsetOTF/CN/SourceHanSansCN-Regular.otf
Normal file
Binary file not shown.
126
run.sh
Normal file
@ -0,0 +1,126 @@
#!/bin/bash
# FastAPI application startup script (uses uv for package management)

set -e

echo "========================================"
echo "Starting FastAPI time series analysis system v2.0"
echo "========================================"
echo ""

# Check for uv
if ! command -v /home/syy/.local/bin/uv &> /dev/null; then
    echo "Error: uv not found"
    exit 1
fi

echo "✓ uv is installed"
echo ""

# Check for the virtual environment; create it if it does not exist
if [ ! -d ".venv" ]; then
    echo "Creating virtual environment..."
    /home/syy/.local/bin/uv venv --python 3.10
fi

# Activate the virtual environment
echo "Activating virtual environment..."
source .venv/bin/activate

# Load .env (without overriding environment variables that already exist)
if [ -f ".env" ]; then
    echo "Loading .env..."
    while IFS=$'\t' read -r key quoted_value; do
        [ -z "$key" ] && continue
        # Only allow valid environment variable names
        if [[ ! "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
            continue
        fi
        # Variables already set explicitly in the environment take precedence
        if [ -z "${!key+x}" ]; then
            eval "export ${key}=${quoted_value}"
        fi
    done < <(python - <<'PY'
import os
import shlex

try:
    from dotenv import dotenv_values
except Exception:
    dotenv_values = None

if dotenv_values is None:
    raise SystemExit(0)

values = dotenv_values('.env')
for k, v in values.items():
    if k is None or v is None:
        continue
    # Output: KEY<TAB>shell_quoted_value
    print(f"{k}\t{shlex.quote(str(v))}")
PY
)
    echo "✓ .env loaded"
    echo ""
fi

# Check and install dependencies
echo "Checking dependencies..."
python -c "import fastapi; import uvicorn; import pandas; import numpy" 2>/dev/null || {
    echo "Installing dependencies..."
    /home/syy/.local/bin/uv pip install \
        'fastapi>=0.104.1' \
        'uvicorn[standard]>=0.24.0' \
        'python-multipart>=0.0.6' \
        'python-dotenv>=1.0.0' \
        'pandas>=2.2.2' \
        'numpy>=1.26.4' \
        'scipy>=1.13.0' \
        'scikit-learn>=1.3.0' \
        'statsmodels>=0.14.0' \
        'matplotlib>=3.7.2' \
        'seaborn>=0.12.2' \
        'openai>=1.3.0' \
        'gradio_client>=0.9.0' \
        'requests>=2.31.0' \
        'psutil>=5.9.5'
}

echo "✓ Dependency check complete"
echo ""

# Create the required directories
mkdir -p uploads logs temp resource/fonts

# Set environment variables (if not already set)
export ENV=${ENV:-"development"}
export DEBUG=${DEBUG:-"False"}
export HOST=${HOST:-"0.0.0.0"}
export PORT=${PORT:-"60201"}
export LOG_LEVEL=${LOG_LEVEL:-"INFO"}

echo "Environment configuration:"
echo " ENV=$ENV"
echo " DEBUG=$DEBUG"
echo " HOST=$HOST"
echo " PORT=$PORT"
echo " LOG_LEVEL=$LOG_LEVEL"
echo ""

# Start the application
echo "Starting application..."
echo ""
echo "=================================="
echo "✓ URL: http://localhost:$PORT"
echo "✓ API docs: http://localhost:$PORT/docs"
echo "✓ ReDoc: http://localhost:$PORT/redoc"
echo "=================================="
echo ""
echo "Press Ctrl+C to stop the application"
echo ""

# Run with uvicorn (variables quoted to avoid word splitting)
python -m uvicorn app.main:app \
    --host "$HOST" \
    --port "$PORT" \
    --log-level "$(echo "$LOG_LEVEL" | tr '[:upper:]' '[:lower:]')"
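The `.env` loading loop in `run.sh` exports a key only when its name is a valid shell identifier and the variable is not already set in the environment. The following is a minimal Python sketch of that precedence rule. It assumes the key/value pairs have already been parsed (the script itself delegates parsing to python-dotenv's `dotenv_values`), and the function name `apply_env_defaults` is hypothetical, not something defined in the repository.

```python
import re
from typing import Dict, Mapping

# Same pattern as the [[ "$key" =~ ... ]] check in run.sh.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def apply_env_defaults(
    file_values: Mapping[str, str], environ: Dict[str, str]
) -> Dict[str, str]:
    """Copy values into `environ` without overriding keys that are already set.

    Hypothetical helper for illustration; returns only the pairs it applied.
    """
    applied: Dict[str, str] = {}
    for key, value in file_values.items():
        if not _VALID_NAME.fullmatch(key):
            continue  # skip names that are not valid shell identifiers
        if key in environ:
            continue  # explicitly set variables take precedence, even if empty
        environ[key] = value
        applied[key] = value
    return applied
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to test; calling it with `os.environ` as the second argument would mirror what the shell loop does to the process environment.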
158
run_analysis_on_test_data.py
Normal file
@ -0,0 +1,158 @@
import os
import sys
import shutil
import json
from pathlib import Path
import pandas as pd
import numpy as np

# Add project root to path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from app.services.analysis_system import TimeSeriesAnalysisSystem
from app.core.config import settings

class NpEncoder(json.JSONEncoder):
    """
    JSON encoder that handles NumPy types
    """
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, (np.bool_, bool)):
            return bool(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, pd.Timestamp):
            return str(obj)
        return super(NpEncoder, self).default(obj)

def format_details(details):
    """
    Format detailed results for text output
    """
    if details is None:
        return ""

    # Handle pandas Series/DataFrame
    if isinstance(details, (pd.DataFrame, pd.Series)):
        try:
            return details.to_markdown() if hasattr(details, 'to_markdown') else details.to_string()
        except ImportError:
            return details.to_string()

    # Handle Dict/List (JSON-like)
    if isinstance(details, (dict, list)):
        try:
            return json.dumps(details, cls=NpEncoder, indent=2, ensure_ascii=False)
        except Exception as e:
            return f"JSON Serialization Error: {e}\nRaw: {str(details)}"

    return str(details)

def run_all_analyses():
    # Setup paths
    base_dir = Path(__file__).parent
    test_dir = base_dir / "test"
    csv_filename = "comprehensive_test_data.csv"
    csv_path = test_dir / csv_filename

    if not csv_path.exists():
        print(f"Error: Test file not found at {csv_path}")
        return

    output_dir = test_dir / "results"
    output_dir.mkdir(exist_ok=True)

    print(f"Starting analysis on {csv_path}")
    print(f"Results will be saved to {output_dir}")

    # Initialize System
    # generate_plots=False allows skipping image generation but still returns full data details
    system = TimeSeriesAnalysisSystem(
        str(csv_path),
        task_description="Test Suite Analysis",
        language="zh",
        generate_plots=False
    )

    if not system.load_and_preprocess_data():
        print("Failed to load data")
        return

    # Define methods to run
    methods = [
        ('statistical_overview', system.generate_statistical_overview),
        ('time_series_analysis', system.generate_time_series_plots),
        ('acf_pacf_analysis', system.generate_acf_pacf_plots),
        ('stationarity_tests', system.perform_stationarity_tests),
        ('normality_tests', system.perform_normality_tests),
        ('seasonal_decomposition', system.perform_seasonal_decomposition),
        ('spectral_analysis', system.perform_spectral_analysis),
        ('correlation_analysis', system.generate_correlation_heatmap),
        ('pca_scree_plot', system.generate_pca_scree_plot),
        ('pca_analysis', system.perform_pca_analysis),
        ('feature_importance', system.analyze_feature_importance),
        ('clustering_analysis', system.perform_clustering_analysis),
        ('factor_analysis', system.perform_factor_analysis),
        ('cointegration_test', system.perform_cointegration_test),
        ('var_analysis', system.perform_var_analysis)
    ]

    for name, method in methods:
        print(f"\nrunning {name}...")
        try:
            result = method()

            img_path = None
            summary = ""
            details = None

            # Parse result
            if isinstance(result, tuple):
                if len(result) == 3:
                    img_path, summary, details = result
                elif len(result) == 2:
                    img_path, summary = result
            else:
                summary = str(result)

            # Save Output
            base_output_name = f"{name}_output"

            # 1. Save Summary & Details
            txt_path = output_dir / f"{base_output_name}.txt"
            with open(txt_path, "w", encoding="utf-8") as f:
                f.write(f"Method: {name}\n")
                f.write("-" * 50 + "\n")
                f.write("Summary:\n")
                f.write(str(summary))
                f.write("\n\n")

                if details is not None:
                    f.write("Detailed Results:\n")
                    f.write("-" * 50 + "\n")
                    formatted_details = format_details(details)
                    f.write(formatted_details)
                    f.write("\n")

            print(f" Saved full details to {txt_path.name}")

            # 2. Save Image (if any)
            if img_path and os.path.exists(img_path):
                ext = os.path.splitext(img_path)[1]
                target_img_path = output_dir / f"{base_output_name}{ext}"
                shutil.copy2(img_path, target_img_path)
                print(f" Saved image to {target_img_path.name}")
            else:
                pass # No image expected if generate_plots=False

        except Exception as e:
            print(f" Error running {name}: {e}")
            import traceback
            traceback.print_exc()

if __name__ == "__main__":
    run_all_analyses()
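The loop in `run_all_analyses` accepts three result shapes from an analysis method: a 3-tuple `(img_path, summary, details)`, a 2-tuple `(img_path, summary)`, or any non-tuple value, which is stringified into the summary. A minimal sketch of that normalization as a standalone function follows. The name `unpack_result` is hypothetical and does not exist in the repository; pulling the logic out like this is only one possible way to make it testable on its own.

```python
from typing import Any, Optional, Tuple


def unpack_result(result: Any) -> Tuple[Optional[str], Any, Any]:
    """Normalize an analysis result to (img_path, summary, details).

    Hypothetical helper for illustration; mirrors the parsing in the loop.
    """
    img_path, summary, details = None, "", None
    if isinstance(result, tuple):
        if len(result) == 3:
            img_path, summary, details = result
        elif len(result) == 2:
            img_path, summary = result
        # Tuples of any other length leave the defaults untouched,
        # which matches the behaviour of the original loop.
    else:
        summary = str(result)
    return img_path, summary, details
```

One consequence worth noting is that a tuple of unexpected length is silently reduced to an empty summary, so a caller that wants to catch malformed results would need an explicit check for that case.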