Jsonbackend

iceneon 2026-01-29 18:18:32 +08:00
commit dd1087ad23
53 changed files with 8855 additions and 0 deletions

47
.env.example Normal file

@ -0,0 +1,47 @@
# Environment configuration template
# Copy this file to .env and fill in real values
# Environment
ENV=development
DEBUG=False
# Server
HOST=0.0.0.0
PORT=60201
# CORS origins (comma-separated)
CORS_ORIGINS=*
# API exposure mode
# full: expose v1 + v2 (default)
# v2: expose only the /api/v2 analysis endpoints plus the basic status endpoints (v1 upload/file/image endpoints are disabled)
API_MODE=full
# File upload (MAX_UPLOAD_SIZE is in bytes; 16777216 = 16 MB)
UPLOAD_DIR=uploads
MAX_UPLOAD_SIZE=16777216
TEMP_DIR=temp
# Fonts
FONTS_DIR=resource/fonts
# LLM API (Alibaba Cloud Qwen)
MY_API_KEY=sk-your-api-key-here
MY_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
MY_MODEL=qwen-turbo
# Analysis
LANGUAGE_DEFAULT=zh
ANALYSIS_TIMEOUT=300
MAX_MEMORY_MB=500
# v2 (OSS URL) security settings
# V2_ALLOWED_HOSTS=oss.example.com,oss-cn-hangzhou.aliyuncs.com
# V2_ALLOW_HTTP=False
# V2_ALLOW_PRIVATE_NETWORKS=False
# V2_CONNECT_TIMEOUT_SECONDS=5
# V2_DOWNLOAD_TIMEOUT_SECONDS=30
# Logging
LOG_LEVEL=INFO
LOG_DIR=logs

22
.gitignore vendored Normal file

@ -0,0 +1,22 @@
**/.DS_Store
.venv/
**/__pycache__/
**/*.pyc
**/*.pyo
**/*.pyd
.vscode/
.idea/
**/*.swp
.env
uploads/
logs/
# generated artifacts
test/results/
*.log
temp/
test/

26
1.md Normal file

@ -0,0 +1,26 @@
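Yes, exactly right.
In short, it is a three-step routine:
1. Enter the directory
Open a WSL terminal and change into the project folder:
Bash
cd /mnt/h/vs_code/Python-Server
2. Activate the environment
Switch the terminal into the Python virtual environment (a (.venv) prefix on the prompt means it worked):
Bash
source .venv/bin/activate
3. Run it
Start the service (remember to add --host 0.0.0.0 so that Windows can reach it):
Bash
uvicorn app.main:app --host 0.0.0.0 --port 60201 --reload
Then open http://localhost:60201/docs in a browser. Happy coding!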

271
DEPLOYMENT.md Normal file

@ -0,0 +1,271 @@
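# FastAPI Application Production Deployment Guide
## Quick Start
### 1. Requirements
- Python 3.10+
- Linux / macOS / Windows
- 20 GB of disk space (for fonts and data)
### 2. One-Step Install and Launch
```bash
# On first run this creates the virtual environment and installs dependencies automatically
bash run.sh
```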
### 3. Docker Deployment
```bash
# Build the Docker image
docker build -t lazy-fjh:latest .
# Run the container (the image's WORKDIR is /app, so uploads and logs live under /app)
docker run -d \
  -p 60201:60201 \
  -v $(pwd)/uploads:/app/uploads \
  -v $(pwd)/logs:/app/logs \
  -e MY_API_KEY=sk-your-key \
  lazy-fjh:latest
```
### 4. Systemd Deployment (Linux)
```bash
# Copy the application into the system directory
sudo cp -r /path/to/lazy_fjh /opt/
# Update ownership
sudo chown -R www-data:www-data /opt/lazy_fjh
# Install the systemd unit
sudo cp /opt/lazy_fjh/deploy/systemd/lazy-fjh.service /etc/systemd/system/
# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable lazy-fjh
sudo systemctl start lazy-fjh
# Check the status
sudo systemctl status lazy-fjh
```
### 5. Gunicorn Deployment
```bash
# Activate the virtual environment
source .venv/bin/activate
# Start with gunicorn (the ASGI app is exposed as app.main:app)
gunicorn -c deploy/gunicorn_config.py app.main:app
```
## Font Setup
### Linux
Install the system fonts first:
```bash
bash deploy/install_fonts.sh
```
Or install them manually:
```bash
# Ubuntu/Debian
sudo apt-get install -y fonts-wqy-microhei fonts-noto-cjk-extra
# CentOS/RHEL
sudo yum install -y wqy-microhei
# Arch Linux
sudo pacman -S --noconfirm wqy-microhei ttf-noto-sans-cjk
```
### macOS
```bash
brew install --cask font-noto-sans-cjk
```
### Windows
Download Noto Sans CJK from https://www.noto-fonts.cn and install it.
## Environment Variables
Copy `.env.example` to `.env` and fill in real values:
```bash
cp .env.example .env
```
Edit the `.env` file:
```env
# Environment
ENV=production
DEBUG=False
# Server
HOST=0.0.0.0
PORT=60201
# API key (Alibaba Cloud Qwen)
MY_API_KEY=sk-your-api-key-here
MY_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
MY_MODEL=qwen-turbo
```
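The way such variables become typed settings can be sketched as follows. This is a minimal illustration rather than the project's `app/core/config.py`; the helper names `env_bool` and `env_int` are hypothetical, and the "only the string `true` counts as true" rule is an assumption modeled on how `DEBUG` is compared in the configuration module.

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Return True only when the variable is set to 'true' (case-insensitive)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def env_int(name: str, default: int) -> int:
    """Read an integer variable, falling back to the default when unset or empty."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


if __name__ == "__main__":
    # Hypothetical usage: the names mirror the variables documented above.
    print("DEBUG =", env_bool("DEBUG"))
    print("PORT =", env_int("PORT", 60201))
```

Values such as `1` or `yes` are treated as false under this rule, so writing `DEBUG=True` or `DEBUG=False` explicitly avoids surprises.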
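## File Storage
### Upload Directory
- **Default**: `./uploads/`
- **Configuration**: set the `UPLOAD_DIR` environment variable
### Log Directory
- **Default**: `./logs/`
- **Configuration**: set the `LOG_DIR` environment variable
## API Documentation
Once the application is running, visit:
- **Swagger UI**: http://localhost:60201/docs
- **ReDoc**: http://localhost:60201/redoc
- **OpenAPI**: http://localhost:60201/openapi.json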
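## FAQ
### 1. Glyphs Render as Boxes
**Cause**: no Chinese font is installed on the system
**Fix**:
```bash
bash deploy/install_fonts.sh
```
### 2. High Memory Usage
**Cause**: memory use grows when processing large datasets
**Fix**:
- Adjust the `MAX_MEMORY_MB` environment variable
- Process the data in batches
- Add more memory to the server
### 3. File Upload Times Out
**Cause**: the file is too large, or there is a network problem
**Fix**:
- Check the `MAX_UPLOAD_SIZE` limit
- Increase `ANALYSIS_TIMEOUT`
- Split large files
### 4. API Is Unreachable
**Cause**: a firewall is blocking the port, or the port is already in use
**Fix**:
```bash
# Check what is using the port
sudo lsof -i :60201
# Switch to another port via the PORT environment variable
export PORT=8080
bash run.sh
```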
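Where `lsof` is unavailable, the same port check can be approximated from Python. The sketch below uses only the standard library; the function name `port_in_use` is hypothetical, and it only detects a listener that accepts TCP connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("60201 busy:", port_in_use(60201))
```

A `True` result before starting the service suggests choosing a different `PORT`.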
## Monitoring and Maintenance
### Viewing Logs
```bash
# Live application log
tail -f logs/app.log
# Access log (Gunicorn)
tail -f logs/access.log
```
### System Resource Monitoring
```bash
# Monitor with top/htop
htop
# Or from Python
python -c "from app.services.linux_adapter import LinuxAdapter; print(LinuxAdapter.get_process_info())"
```
### Periodic Cleanup
```bash
# Remove temporary files older than 7 days
find ./temp -type f -mtime +7 -delete
# Remove uploaded files older than 30 days
find ./uploads -type f -mtime +30 -delete
```
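## Performance Tuning
### 1. Gzip Compression
Enabled by default (responses of at least 1000 bytes are compressed), which reduces response size
### 2. Asynchronous Handling
Asynchronous I/O allows more concurrent connections
### 3. Memory Management
Process memory is checked before each analysis, and garbage collection is triggered when it exceeds `MAX_MEMORY_MB`
### 4. Concurrency Settings (Gunicorn)
```
workers = cpu_count * 2 + 1
worker_connections = 1000
```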
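A Gunicorn configuration file expressing these settings might look like the sketch below. It is not the repository's `deploy/gunicorn_config.py`, which is not shown here; it assumes `gunicorn` and `uvicorn` are installed, and the access-log path and timeout mapping are illustrative choices.

```python
import multiprocessing
import os

# Listen address, taken from the same HOST/PORT variables the app uses
bind = f"{os.getenv('HOST', '0.0.0.0')}:{os.getenv('PORT', '60201')}"

# Common rule of thumb: two workers per core plus one
workers = multiprocessing.cpu_count() * 2 + 1

# FastAPI is an ASGI app, so Gunicorn needs an ASGI-capable worker class
worker_class = "uvicorn.workers.UvicornWorker"

# Documented by Gunicorn as affecting only gthread/eventlet/gevent workers;
# kept here to mirror the values listed above
worker_connections = 1000

# Assumption: allow a request to run as long as the analysis timeout
timeout = int(os.getenv("ANALYSIS_TIMEOUT", "300"))

# Assumption: write the access log next to the application log
accesslog = "logs/access.log"
```

Each worker is a separate process with its own memory, so the worker count may need to be lowered on small machines that run memory-heavy analyses.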
## Backup and Restore
### Backing Up Uploaded Files
```bash
tar -czf backup-uploads-$(date +%Y%m%d).tar.gz uploads/
```
### Backing Up the Database (If Used)
```bash
# PostgreSQL
pg_dump -U user db_name > backup.sql
```
## Updating the Application
```bash
# Pull the latest code
git pull origin main
# Reinstall dependencies (if they changed)
uv pip install --upgrade -r requirements.txt
# Restart the service
sudo systemctl restart lazy-fjh
```
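## Security Recommendations
1. **API keys**: never hard-code them; supply them through environment variables
2. **HTTPS**: use HTTPS in production and configure an SSL certificate
3. **CORS**: restrict the allowed origins as needed (the default `CORS_ORIGINS=*` allows every origin)
4. **Rate limiting**: consider adding API rate limiting
5. **Authentication**: add authentication to sensitive endpoints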
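
As an illustration of the rate-limiting recommendation, the following is a minimal in-memory sliding-window limiter. It is a sketch under stated assumptions, not part of this project: the class name `SlidingWindowLimiter` is hypothetical, state is per process (so it does not coordinate across multiple Gunicorn workers), and wiring it into FastAPI as a dependency or middleware is left out.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record a request and report whether it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have left the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
    print([limiter.allow("client-a") for _ in range(3)])
```

A shared store would be needed for the limit to hold across several worker processes.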
## Support and Feedback
For questions or suggestions, please open an issue or contact technical support.

35
Dockerfile Normal file

@ -0,0 +1,35 @@
FROM python:3.11-slim
# Install system dependencies (curl is required by the HEALTHCHECK below; the slim image does not ship it)
RUN apt-get update && apt-get install -y \
    curl \
    fonts-wqy-microhei \
    fonts-noto-cjk \
    fonts-liberation \
    fonts-dejavu \
    libgomp1 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    && rm -rf /var/lib/apt/lists/*
# Set the working directory
WORKDIR /app
# Copy the project files
COPY . .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Create the required directories
RUN mkdir -p uploads logs temp resource/fonts
# Expose the port
EXPOSE 60201
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:60201/health || exit 1
# Start command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "60201"]

127
README.md Normal file

@ -0,0 +1,127 @@
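# Lazy Stat (FastAPI Time Series Analysis Backend)
A FastAPI-based time series analysis service: upload a CSV → run a range of statistical, time series, and multivariate analyses → receive structured results (including per-step `steps[]` details, convenient for frontend rendering and debugging).
## Feature Overview
- 15+ analysis steps: statistical overview, time series analysis, ACF/PACF, stationarity, normality, seasonal decomposition, spectral analysis, correlation, PCA, clustering, factor analysis, cointegration tests, VAR, and more
- Uniform output structure: every step carries a `summary` plus `data/columns` (or a dict result), and is guaranteed to be JSON-serializable
- Optional plotting: the `generate_plots` flag controls whether images are generated; images are served through the file endpoints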
## Quick Start (Local)
One-step launch (uses `uv` to manage the virtual environment and dependencies):
```bash
bash run.sh
```
After startup, visit:
- Swagger: `http://localhost:60201/docs`
- ReDoc: `http://localhost:60201/redoc`
- Health: `http://localhost:60201/health`
The service entry point is `app.main:app` (see [app/main.py](app/main.py)).
## Docker / Compose
Using Compose:
```bash
docker compose up --build
```
For the Compose configuration, see [docker-compose.yml](docker-compose.yml).
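## Environment Variables
See [.env.example](.env.example) for a sample file. Commonly used variables:
- `HOST` / `PORT`: listen address and port (default `0.0.0.0:60201`)
- `ENV` / `DEBUG`: runtime environment
- `MAX_MEMORY_MB`: memory threshold in MB (exceeding it triggers garbage collection)
- `ANALYSIS_TIMEOUT`: analysis timeout in seconds (where enforced)
- `MY_API_KEY`: API key for the external large language model
For development or smoke tests that should not call the external model, set:
```bash
export MY_API_KEY=simulation-mode
```
To expose only the v2 (OSS URL) analysis endpoints and disable the v1 upload/file/image endpoints, set:
```bash
export API_MODE=v2
```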
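## Using the API
All API endpoints are mounted under the `/api` prefix.
### 1) Upload a CSV
`POST /api/upload` (the current implementation accepts CSV only)
```bash
curl -F "file=@test/comprehensive_test_data.csv" \
  -F "task_description=demo" \
  http://localhost:60201/api/upload
```
The response includes `filename` (the name under which the server stored the file); use it in the subsequent analysis request.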
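How a stored name of this kind can be derived is sketched below. The `upload_<timestamp>_<original>` pattern follows the upload route; the function name `stored_name` is hypothetical, and stripping directory components from the client-supplied name is an extra precaution assumed for this sketch.

```python
from datetime import datetime
from pathlib import PurePosixPath, PureWindowsPath


def stored_name(original: str, now: datetime) -> str:
    """Build the server-side name for an uploaded file."""
    # Keep only the final path component, whether the client used / or \
    base = PureWindowsPath(PurePosixPath(original).name).name
    if not base:
        raise ValueError("empty filename")
    return f"upload_{now.strftime('%Y%m%d_%H%M%S')}_{base}"


if __name__ == "__main__":
    print(stored_name("data.csv", datetime.now()))
```

Because the timestamp has one-second resolution, two uploads of the same file within the same second would map to the same name.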
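### 2) Run an Analysis
`POST /api/analyze`
```bash
curl -H "Content-Type: application/json" \
  -d '{
    "filename": "<filename returned by upload>",
    "task_description": "demo",
    "language": "zh",
    "generate_plots": false
  }' \
  http://localhost:60201/api/analyze
```
Key points of the response structure:
- `meta`: filename, language, whether plots were generated, creation time, and so on
- `analysis.<lang>.steps[]`: the structured result of each analysis step (`key/title/summary/data/columns/api_analysis`, etc.)
- `images`: intended to hold image filenames when `generate_plots=true` (retrievable with `GET /api/image/{filename}`); the current `/api/analyze` handler forces `generate_plots` to `false`, so this is always `{}`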
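
A client can walk this structure with a few lines of Python. The sketch below is illustrative only: the function name `summarize_steps` is hypothetical, and the sample payload is made-up data shaped like the fields listed above, not real output of the service.

```python
from typing import Any, Dict, List


def summarize_steps(response: Dict[str, Any], lang: str = "zh") -> List[str]:
    """Return one '[key] title: summary' line per analysis step."""
    bucket = response.get("analysis", {}).get(lang, {})
    lines = []
    for step in bucket.get("steps", []):
        key = step.get("key", "?")
        title = step.get("title", "")
        summary = step.get("summary", "")
        lines.append(f"[{key}] {title}: {summary}")
    return lines


if __name__ == "__main__":
    # Made-up sample shaped like the documented response
    sample = {
        "success": True,
        "meta": {"language": "zh", "generate_plots": False},
        "analysis": {
            "zh": {
                "steps": [
                    {"key": "statistical_overview", "title": "Overview", "summary": "3 variables"},
                ]
            }
        },
        "images": {},
    }
    for line in summarize_steps(sample):
        print(line)
```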
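### 2.1) v2: Analyze from an OSS URL (Recommended)
`POST /api/v2/analyze`: pass an `oss_url`; the backend downloads it to a temporary file, analyzes it, and returns the structured `steps[]`. No images are produced: like v1, the handler currently forces `generate_plots` to `false` even when the request sends `generate_plots=true`.
```bash
curl -H "Content-Type: application/json" \
  -d '{
    "oss_url": "https://<your-oss-presigned-url>",
    "task_description": "demo",
    "language": "zh",
    "generate_plots": false
  }' \
  http://localhost:60201/api/v2/analyze
```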
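The kind of URL checks implied by the `V2_ALLOWED_HOSTS`, `V2_ALLOW_HTTP`, and `V2_ALLOW_PRIVATE_NETWORKS` settings can be sketched as follows. This is an assumption-laden illustration, not the project's `app/services/oss_csv_source` implementation (which is not shown here); the function name `validate_source_url` is hypothetical, and the sketch does not resolve DNS names, so on its own it is not a complete SSRF defense.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlparse


def validate_source_url(
    url: str,
    allowed_hosts: Iterable[str] = (),
    allow_http: bool = False,
    allow_private: bool = False,
) -> str:
    """Validate a download URL and return its host, or raise ValueError."""
    parsed = urlparse(url)
    schemes = {"https", "http"} if allow_http else {"https"}
    if parsed.scheme not in schemes:
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no host")
    allowed = {h.lower() for h in allowed_hosts}
    # An empty allowlist means "no allowlist", mirroring the documented setting
    if allowed and host not in allowed:
        raise ValueError(f"host not in allowlist: {host}")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name; a fuller check would also inspect resolved addresses
    if not allow_private:
        if host == "localhost":
            raise ValueError("loopback host not allowed")
        if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
            raise ValueError(f"private or loopback address not allowed: {host}")
    return host


if __name__ == "__main__":
    print(validate_source_url("https://oss.example.com/data.csv"))
```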
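### 3) Other Endpoints
- `GET /api/available_methods`: list the available analysis methods
- `GET /api/list_uploads`: list the files in uploads
- `GET /api/download/{filename}`: download a file
## Generating Full Text Output (for Debugging and Acceptance)
The script [run_analysis_on_test_data.py](run_analysis_on_test_data.py) runs the full pipeline on the test data and writes each step's `summary + details` to `test/results/*.txt`, which is useful for inspecting p-values, arrays, DataFrames, and other complete details:
```bash
python3 run_analysis_on_test_data.py
```
## Deployment
For production deployment, see [DEPLOYMENT.md](DEPLOYMENT.md).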

0
app/__init__.py Normal file

0
app/api/__init__.py Normal file

View File

@ -0,0 +1,5 @@
"""
Route package initialization.
"""
__all__ = ['upload', 'analysis', 'analysis_v2', 'files']

189
app/api/routes/analysis.py Normal file

@ -0,0 +1,189 @@
"""
Analysis routes.
"""
import gc
import logging
import os
from datetime import datetime
from typing import Any, Dict, Optional

import psutil
from fastapi import APIRouter, BackgroundTasks, HTTPException, status
from pydantic import BaseModel

from app.core.config import settings
from app.services.analysis import TimeSeriesAnalysisSystem

logger = logging.getLogger(__name__)
router = APIRouter()
class AnalysisRequest(BaseModel):
    """Analysis request model."""
    filename: str
    file_type: str = "csv"
    task_description: str = "时间序列数据分析"
    data_background: Dict[str, Any] = {}
    original_image: Optional[str] = None
    language: str = "zh"
    generate_plots: bool = False
@router.get("/available_methods", summary="获取可用的分析方法")
async def get_available_methods() -> dict:
    """Return all available analysis methods."""
    return {
        "success": True,
        "methods": {
            'statistical_overview': {'name': '统计概览', 'description': '生成数据的基本统计信息和分布图表'},
            'time_series_analysis': {'name': '时间序列分析', 'description': '分析变量随时间变化的趋势和模式'},
            'acf_pacf_analysis': {'name': '自相关分析', 'description': '生成自相关和偏自相关函数图'},
            'stationarity_tests': {'name': '平稳性检验', 'description': '执行ADF、KPSS等平稳性检验'},
            'normality_tests': {'name': '正态性检验', 'description': '执行Shapiro-Wilk、Jarque-Bera正态性检验'},
            'seasonal_decomposition': {'name': '季节性分解', 'description': '分解时间序列的趋势、季节和残差成分'},
            'spectral_analysis': {'name': '频谱分析', 'description': '分析时间序列的频域特征'},
            'correlation_analysis': {'name': '相关性分析', 'description': '计算变量间的相关性并生成热力图'},
            'pca_scree_plot': {'name': 'PCA碎石图', 'description': '显示主成分分析的解释方差'},
            'pca_analysis': {'name': '主成分分析', 'description': '降维分析,识别数据的主要变化方向'},
            'feature_importance': {'name': '特征重要性', 'description': '分析各变量对目标预测的重要性'},
            'clustering_analysis': {'name': '聚类分析', 'description': '将数据点分组为具有相似特征的簇'},
            'factor_analysis': {'name': '因子分析', 'description': '识别潜在的因子结构'},
            'cointegration_test': {'name': '协整检验', 'description': '检验时间序列变量间的长期均衡关系'},
            'var_analysis': {'name': '向量自回归', 'description': '多变量时间序列建模和预测'}
        }
    }
def check_memory():
    """Log the process RSS and run garbage collection when it exceeds MAX_MEMORY_MB."""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    logger.info(f"当前内存使用: {memory_mb:.2f} MB")
    if memory_mb > settings.MAX_MEMORY_MB:
        logger.warning(f"内存使用超过阈值 ({settings.MAX_MEMORY_MB} MB),执行垃圾回收")
        gc.collect()
@router.post("/analyze", summary="执行完整分析")
async def analyze_data(request: AnalysisRequest, background_tasks: BackgroundTasks) -> dict:
    """
    Run the full time series analysis.

    Flow:
    1. Load and preprocess the data
    2. Run the 15 analysis methods
    3. Call the AI API for in-depth analysis
    4. Return the structured JSON result (this handler writes no PDF/PPT/HTML
       report; `pdf_filename` and `ppt_filename` are always None)
    """
    try:
        logger.info("=" * 60)
        logger.info(f"开始分析: {request.filename}")
        logger.info(f"任务: {request.task_description}")
        logger.info(f"语言: {request.language}")
        logger.info("=" * 60)
        # Check memory
        check_memory()
        # Reject names with directory components so the lookup cannot leave UPLOAD_DIR
        if os.path.basename(request.filename) != request.filename:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=f"Invalid filename: {request.filename}"
            )
        # Check that the file exists (and is a regular file)
        file_path = settings.get_upload_path(request.filename)
        if not file_path.is_file():
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail=f"文件未找到: {request.filename}"
            )
        # Language handling: zh/en are supported; anything else falls back to zh
        lang_key = request.language if request.language in {"zh", "en"} else "zh"
        # In charts mode image generation is always disabled, even if the request sets generate_plots=true
        generate_plots = False
        if request.generate_plots:
            logger.info("generate_plots requested true, forcing false to skip image generation")
        # Create the analyzer instance
        logger.info(f"初始化分析器 ({lang_key})...")
        analyzer = TimeSeriesAnalysisSystem(
            str(file_path),
            request.task_description,
            data_background=request.data_background,
            language=lang_key,
            generate_plots=generate_plots
        )
        # Run the analysis
        logger.info("执行分析...")
        results_zh, log_zh = analyzer.run_analysis()
        if results_zh is None:
            logger.error("中文分析失败")
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="分析失败"
            )
        logger.info("中文分析完成")
        # Build the response payload
        response_data = {
            "success": True,
            "meta": {
                "filename": request.filename,
                "task_description": request.task_description,
                "language": lang_key,
                "generate_plots": generate_plots,
                "created_at": datetime.now().isoformat(),
            },
            "analysis": {
                lang_key: {
                    "pdf_filename": None,
                    "ppt_filename": None,
                    "data_description": results_zh.get("data_description"),
                    "preprocessing_steps": results_zh.get("preprocessing_steps", []),
                    "api_analysis": results_zh.get("api_analysis", {}),
                    "steps": results_zh.get("steps", []),
                    "charts": results_zh.get("charts", {}),
                }
            },
            "images": {},
            "log": log_zh[-20:] if log_zh else [],
            "original_image": request.original_image if request.file_type == 'image' else None,
        }
        # Backward compatibility with the old frontend: always provide analysis.zh
        if lang_key != "zh":
            response_data["analysis"]["zh"] = response_data["analysis"][lang_key]
        analysis_bucket = response_data["analysis"][lang_key]
        # Remove any leftover image_path (compatibility with the old structure)
        steps = analysis_bucket.get("steps")
        if isinstance(steps, list):
            for step in steps:
                if isinstance(step, dict) and "image_path" in step:
                    step.pop("image_path", None)
        # Keep images empty for compatibility with the old frontend
        response_data["images"] = {}
        logger.info("分析完成")
        return response_data
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"分析异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

View File

@ -0,0 +1,191 @@
"""v2 analysis route: analyze CSV from OSS/URL.
Design goals:
- Keep v1 endpoints unchanged
- Provide the same response shape as v1, but with URL as input
- Avoid leaking server local paths
"""
import gc
import logging
import os
import shutil
from datetime import datetime
from typing import Any, Dict, Optional
import psutil
from fastapi import APIRouter, BackgroundTasks, HTTPException, status
from pydantic import BaseModel
from app.core.config import settings
from app.services.analysis import TimeSeriesAnalysisSystem
from app.services.oss_csv_source import UrlValidationError, download_csv_to_tempfile
logger = logging.getLogger(__name__)
router = APIRouter()
class AnalysisV2Request(BaseModel):
    """v2 analysis request model (the input is an OSS/URL)."""
    oss_url: str
    task_description: str = "时间序列数据分析"
    data_background: Dict[str, Any] = {}
    language: str = "zh"
    generate_plots: bool = False
    source_name: Optional[str] = None
@router.get("/available_methods", summary="获取可用的分析方法v2")
async def get_available_methods_v2() -> dict:
    """v2: return the same list of available analysis methods as v1."""
    return {
        "success": True,
        "methods": {
            "statistical_overview": {"name": "统计概览", "description": "生成数据的基本统计信息和分布图表"},
            "time_series_analysis": {"name": "时间序列分析", "description": "分析变量随时间变化的趋势和模式"},
            "acf_pacf_analysis": {"name": "自相关分析", "description": "生成自相关和偏自相关函数图"},
            "stationarity_tests": {"name": "平稳性检验", "description": "执行ADF、KPSS等平稳性检验"},
            "normality_tests": {"name": "正态性检验", "description": "执行Shapiro-Wilk、Jarque-Bera正态性检验"},
            "seasonal_decomposition": {"name": "季节性分解", "description": "分解时间序列的趋势、季节和残差成分"},
            "spectral_analysis": {"name": "频谱分析", "description": "分析时间序列的频域特征"},
            "correlation_analysis": {"name": "相关性分析", "description": "计算变量间的相关性并生成热力图"},
            "pca_scree_plot": {"name": "PCA碎石图", "description": "显示主成分分析的解释方差"},
            "pca_analysis": {"name": "主成分分析", "description": "降维分析,识别数据的主要变化方向"},
            "feature_importance": {"name": "特征重要性", "description": "分析各变量对目标预测的重要性"},
            "clustering_analysis": {"name": "聚类分析", "description": "将数据点分组为具有相似特征的簇"},
            "factor_analysis": {"name": "因子分析", "description": "识别潜在的因子结构"},
            "cointegration_test": {"name": "协整检验", "description": "检验时间序列变量间的长期均衡关系"},
            "var_analysis": {"name": "向量自回归", "description": "多变量时间序列建模和预测"},
        },
    }
def check_memory():
    """Log the process RSS and run garbage collection when it exceeds MAX_MEMORY_MB."""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    logger.info(f"当前内存使用: {memory_mb:.2f} MB")
    if memory_mb > settings.MAX_MEMORY_MB:
        logger.warning(f"内存使用超过阈值 ({settings.MAX_MEMORY_MB} MB),执行垃圾回收")
        gc.collect()
@router.post("/analyze", summary="执行完整分析v2从 OSS URL 读取 CSV")
async def analyze_data_v2(request: AnalysisV2Request, background_tasks: BackgroundTasks) -> dict:
    """Analyze CSV from an OSS/URL, returning the same structure as v1."""
    # Local import keeps the host-only logging below self-contained
    from urllib.parse import urlparse

    downloaded = None
    try:
        logger.info("=" * 60)
        logger.info("开始分析 (v2)")
        # Log only the host: presigned URLs carry credentials in the query string
        logger.info(f"URL host: {urlparse(request.oss_url).hostname}")
        logger.info(f"任务: {request.task_description}")
        logger.info(f"语言: {request.language}")
        logger.info("=" * 60)
        check_memory()
        # Language handling: zh/en are supported; anything else falls back to zh
        lang_key = request.language if request.language in {"zh", "en"} else "zh"
        # In charts mode image generation is always disabled, even if the request sets generate_plots=true
        generate_plots = False
        if request.generate_plots:
            logger.info("generate_plots requested true, forcing false to skip image generation")
        # Download to a temporary file
        try:
            downloaded = download_csv_to_tempfile(request.oss_url, suffix=".csv")
        except UrlValidationError as e:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
        filename_for_meta = request.source_name or downloaded.source_name
        # Create the analyzer instance (reuses the existing analysis system)
        analyzer = TimeSeriesAnalysisSystem(
            downloaded.local_path,
            request.task_description,
            data_background=request.data_background,
            language=lang_key,
            generate_plots=generate_plots,
        )
        # Run the analysis
        logger.info("执行分析...")
        results, log_entries = analyzer.run_analysis()
        if results is None:
            logger.error("分析失败")
            raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="分析失败")
        # Build the response payload (kept as close to v1 as possible)
        response_data = {
            "success": True,
            "meta": {
                "filename": filename_for_meta,
                "task_description": request.task_description,
                "language": lang_key,
                "generate_plots": generate_plots,
                "created_at": datetime.now().isoformat(),
                "version": "v2",
                "source": {
                    "type": "oss_url",
                    "host": downloaded.source_host,
                    "name": filename_for_meta,
                    "etag": downloaded.etag,
                    "last_modified": downloaded.last_modified,
                },
            },
            "analysis": {
                lang_key: {
                    "pdf_filename": None,
                    "ppt_filename": None,
                    "data_description": results.get("data_description"),
                    "preprocessing_steps": results.get("preprocessing_steps", []),
                    "api_analysis": results.get("api_analysis", {}),
                    "steps": results.get("steps", []),
                    "charts": results.get("charts", {}),
                }
            },
            "images": {},
            "log": log_entries[-20:] if log_entries else [],
            "original_image": None,
        }
        # Backward compatibility with the old frontend: always provide analysis.zh
        if lang_key != "zh":
            response_data["analysis"]["zh"] = response_data["analysis"][lang_key]
        analysis_bucket = response_data["analysis"][lang_key]
        # Never expose local paths; the chart references in steps are sufficient
        steps = analysis_bucket.get("steps")
        if isinstance(steps, list):
            for step in steps:
                if isinstance(step, dict) and "image_path" in step:
                    step.pop("image_path", None)
        # Keep images empty for compatibility with the old frontend
        response_data["images"] = {}
        logger.info("分析完成 (v2)")
        return response_data
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"分析异常 (v2): {str(e)}", exc_info=True)
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))
    finally:
        # Clean up the temporary file
        if downloaded is not None:
            try:
                os.unlink(downloaded.local_path)
            except Exception:
                pass

115
app/api/routes/files.py Normal file

@ -0,0 +1,115 @@
"""
File-serving routes (image download, etc.)
"""
import logging
from pathlib import Path
from fastapi import APIRouter, HTTPException, status
from fastapi.responses import FileResponse
from app.core.config import settings
logger = logging.getLogger(__name__)
router = APIRouter()
@router.get("/image/{filename}", summary="获取图片文件")
async def serve_image(filename: str):
    """
    Serve a visualization image file.
    """
    try:
        file_path = settings.get_upload_path(filename)
        # is_file() also rejects directories such as ".."
        if not file_path.is_file():
            logger.error(f"图片未找到: {filename}")
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="图片未找到"
            )
        logger.info(f"提供图片: {filename}")
        return FileResponse(
            path=str(file_path),
            media_type='image/png',
            filename=filename
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"获取图片异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
@router.get("/download/{filename}", summary="下载文件")
async def download_file(filename: str):
    """
    Download a report or any other file.
    """
    try:
        file_path = settings.get_upload_path(filename)
        # is_file() also rejects directories such as ".."
        if not file_path.is_file():
            logger.error(f"文件未找到: {filename}")
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="文件未找到"
            )
        logger.info(f"下载文件: {filename}")
        return FileResponse(
            path=str(file_path),
            filename=filename,
            media_type='application/octet-stream'
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"下载文件异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
@router.get("/list_uploads", summary="列出上传的文件")
async def list_uploads():
    """
    List the files in the uploads directory.
    """
    try:
        uploads_dir = settings.UPLOAD_DIR
        if not uploads_dir.exists():
            return {
                "success": True,
                "files": []
            }
        files = []
        for file_path in uploads_dir.iterdir():
            if file_path.is_file():
                files.append({
                    "name": file_path.name,
                    "size": file_path.stat().st_size,
                    "modified": file_path.stat().st_mtime
                })
        logger.info(f"列出 {len(files)} 个文件")
        return {
            "success": True,
            "files": sorted(files, key=lambda x: x['modified'], reverse=True)
        }
    except Exception as e:
        logger.error(f"列出文件异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

124
app/api/routes/upload.py Normal file

@ -0,0 +1,124 @@
"""
File upload routes.
"""
import logging
import os
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, status
from pydantic import BaseModel
from app.core.config import settings
logger = logging.getLogger(__name__)
router = APIRouter()
class UploadResponse(BaseModel):
    """Upload response model."""
    success: bool
    filename: str
    file_type: str
    original_filename: str
    task_description: str
    message: Optional[str] = None


class UploadImageResponse(BaseModel):
    """Image upload response model."""
    success: bool
    filename: str
    file_type: str
    original_filename: str
    original_image: str
    task_description: str
    message: str
def allowed_file(filename: str) -> bool:
    """Return True if the file extension is in ALLOWED_EXTENSIONS."""
    if '.' not in filename:
        return False
    ext = filename.rsplit('.', 1)[1].lower()
    return ext in settings.ALLOWED_EXTENSIONS
@router.post("/upload", response_model=UploadResponse, summary="上传CSV或图片文件")
async def upload_file(
    file: UploadFile = File(...),
    task_description: str = Form(default="时间序列数据分析")
) -> dict:
    """
    Upload a data file. Only CSV is accepted at present; other types are rejected.

    - **file**: CSV file
    - **task_description**: description of the analysis task
    """
    try:
        logger.info("=== 上传请求开始 ===")
        logger.info(f"文件名: {file.filename}")
        logger.info(f"任务描述: {task_description}")
        # Check the filename
        if not file.filename:
            logger.error("文件名为空")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="没有选择文件"
            )
        # Check the file type
        if not allowed_file(file.filename):
            logger.error(f"不支持的文件类型: {file.filename}")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=f"不支持的文件类型。允许的类型: {', '.join(settings.ALLOWED_EXTENSIONS)}"
            )
        # Get the file extension
        file_ext = file.filename.rsplit('.', 1)[1].lower()
        # Build the stored name; keep only the base name so the client cannot inject path components
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        safe_name = Path(file.filename).name
        new_filename = f"upload_{timestamp}_{safe_name}"
        # Read the body and enforce MAX_UPLOAD_SIZE before anything is written to disk
        content = await file.read()
        if len(content) > settings.MAX_UPLOAD_SIZE:
            raise HTTPException(
                status_code=413,  # Payload Too Large
                detail=f"File exceeds the {settings.MAX_UPLOAD_SIZE}-byte upload limit"
            )
        # Save the file
        file_path = settings.get_upload_path(new_filename)
        logger.info(f"保存文件到: {file_path}")
        with open(file_path, 'wb') as f:
            f.write(content)
        logger.info(f"文件保存成功,大小: {len(content)} bytes")
        # Handle the different file types
        if file_ext == 'csv':
            logger.info("处理 CSV 文件")
            return {
                "success": True,
                "filename": new_filename,
                "file_type": "csv",
                "original_filename": file.filename,
                "task_description": task_description
            }
        else:
            logger.warning(f"不支持的文件类型: {file_ext}")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=f"目前只支持 CSV 文件。您上传的是: {file_ext}"
            )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"上传处理异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

0
app/core/__init__.py Normal file

122
app/core/config.py Normal file

@ -0,0 +1,122 @@
"""
FastAPI application configuration.
Supports configuration through environment variables for production use.
"""
import os
from pathlib import Path
from typing import Optional
import logging

try:
    from dotenv import load_dotenv
except Exception:  # pragma: no cover
    load_dotenv = None

# Project root directory
BASE_DIR = Path(__file__).resolve().parent.parent.parent
# Load .env without overriding environment variables that are already set
_dotenv_path = BASE_DIR / ".env"
if load_dotenv is not None and _dotenv_path.exists():
    load_dotenv(dotenv_path=_dotenv_path, override=False)
# Environment
ENVIRONMENT = os.getenv('ENV', 'development')
DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
class Settings:
    """Application settings."""
    # FastAPI basics
    APP_TITLE = "时间序列数据分析系统"
    APP_DESCRIPTION = "支持多格式数据上传、AI增强分析、多语言报告生成"
    APP_VERSION = "2.0.0"
    # API exposure mode
    # - full: expose v1 + v2 (default)
    # - v2: expose only the v2 analysis endpoints plus the basic status endpoints
    #       (v1 upload/file/image endpoints are disabled)
    API_MODE = os.getenv('API_MODE', 'full').strip().lower()
    # Server
    HOST = os.getenv('HOST', '0.0.0.0')
    PORT = int(os.getenv('PORT', 60201))
    RELOAD = DEBUG
    # CORS
    CORS_ORIGINS = os.getenv('CORS_ORIGINS', '*').split(',')
    CORS_ALLOW_CREDENTIALS = True
    CORS_ALLOW_METHODS = ['*']
    CORS_ALLOW_HEADERS = ['*']
    # File upload (parents=True so nested custom paths can be created as well)
    UPLOAD_DIR = Path(os.getenv('UPLOAD_DIR', BASE_DIR / 'uploads'))
    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
    MAX_UPLOAD_SIZE = int(os.getenv('MAX_UPLOAD_SIZE', 16 * 1024 * 1024))  # 16 MB
    ALLOWED_EXTENSIONS = {'csv'}
    # Temporary files
    TEMP_DIR = Path(os.getenv('TEMP_DIR', BASE_DIR / 'temp'))
    TEMP_DIR.mkdir(parents=True, exist_ok=True)
    # Fonts
    FONTS_DIR = Path(os.getenv('FONTS_DIR', BASE_DIR / 'resource' / 'fonts'))
    FONTS_DIR.mkdir(parents=True, exist_ok=True)
    # LLM API (Alibaba Cloud Qwen)
    API_KEY = os.getenv('MY_API_KEY', '')
    API_BASE = os.getenv('MY_API_BASE', 'https://dashscope.aliyuncs.com/compatible-mode/v1')
    API_MODEL = os.getenv('MY_MODEL', 'qwen-turbo')
    API_TIMEOUT = int(os.getenv('API_TIMEOUT', 30))
    # Analysis
    LANGUAGE_DEFAULT = os.getenv('LANGUAGE_DEFAULT', 'zh')
    ANALYSIS_TIMEOUT = int(os.getenv('ANALYSIS_TIMEOUT', 300))  # 5 minutes
    # Logging
    LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO' if not DEBUG else 'DEBUG')
    LOG_DIR = Path(os.getenv('LOG_DIR', BASE_DIR / 'logs'))
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    # Memory management
    MAX_MEMORY_MB = int(os.getenv('MAX_MEMORY_MB', 500))
    # v2 (OSS URL) settings
    # Comma-separated host allowlist. When empty, no allowlist is applied
    # (private-network and loopback IPs are still blocked).
    V2_ALLOWED_HOSTS = [h.strip() for h in os.getenv('V2_ALLOWED_HOSTS', '').split(',') if h.strip()]
    # Whether plain http is allowed (https only by default)
    V2_ALLOW_HTTP = os.getenv('V2_ALLOW_HTTP', 'False').lower() == 'true'
    # Whether private-network/loopback addresses are allowed
    # (for local development and smoke tests only; keep False in production)
    V2_ALLOW_PRIVATE_NETWORKS = os.getenv('V2_ALLOW_PRIVATE_NETWORKS', 'False').lower() == 'true'
    # Download timeouts. requests accepts a (connect, read) tuple;
    # V2_DOWNLOAD_TIMEOUT_SECONDS is the read timeout.
    V2_DOWNLOAD_TIMEOUT_SECONDS = float(os.getenv('V2_DOWNLOAD_TIMEOUT_SECONDS', 30))
    V2_CONNECT_TIMEOUT_SECONDS = float(os.getenv('V2_CONNECT_TIMEOUT_SECONDS', 5))

    @classmethod
    def get_upload_path(cls, filename: str) -> Path:
        """Return the full path of an uploaded file."""
        return cls.UPLOAD_DIR / filename

    @classmethod
    def get_temp_path(cls, filename: str) -> Path:
        """Return the full path of a temporary file."""
        return cls.TEMP_DIR / filename
# Logging setup
def setup_logging():
    """Configure the logging system (file and console handlers)."""
    logging.basicConfig(
        level=Settings.LOG_LEVEL,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler(Settings.LOG_DIR / 'app.log'),
            logging.StreamHandler()
        ]
    )


# Create the global settings instance
settings = Settings()
# Enable logging
setup_logging()

124
app/main.py Normal file

@ -0,0 +1,124 @@
"""
Main entry point of the FastAPI application.
FastAPI version of the time series analysis system.
"""
import logging
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.gzip import GZipMiddleware
from app.core.config import settings, setup_logging, ENVIRONMENT, DEBUG
from app.services.font_manager import setup_fonts_for_app
from app.services.linux_adapter import init_linux_environment
# Module logger (logging itself is configured when app.core.config is imported)
logger = logging.getLogger(__name__)
# Application lifespan
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifespan management."""
    # On startup
    logger.info("=" * 60)
    logger.info(f"应用启动: {settings.APP_TITLE}")
    logger.info(f"版本: {settings.APP_VERSION}")
    logger.info(f"环境: {ENVIRONMENT}")
    logger.info(f"调试: {DEBUG}")
    logger.info(f"监听: {settings.HOST}:{settings.PORT}")
    logger.info("=" * 60)
    # Initialize the Linux environment
    try:
        init_linux_environment()
    except Exception as e:
        logger.warning(f"Linux 环境初始化失败: {e}")
    # Initialize fonts
    try:
        fonts_config = setup_fonts_for_app(['zh', 'en'])
        logger.info(f"字体配置完成: {fonts_config}")
    except Exception as e:
        logger.error(f"字体配置失败: {e}")
    yield
    # On shutdown
    logger.info("应用关闭")
# Create the FastAPI application
app = FastAPI(
    title=settings.APP_TITLE,
    description=settings.APP_DESCRIPTION,
    version=settings.APP_VERSION,
    lifespan=lifespan
)
# Add middleware
# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.CORS_ORIGINS,
    allow_credentials=settings.CORS_ALLOW_CREDENTIALS,
    allow_methods=settings.CORS_ALLOW_METHODS,
    allow_headers=settings.CORS_ALLOW_HEADERS,
)
# Compression middleware
app.add_middleware(GZipMiddleware, minimum_size=1000)
# Import and include the routers
from app.api.routes import upload, analysis, analysis_v2, files
# v2 mode: expose only the v2 analysis endpoints plus the basic status endpoints
if settings.API_MODE == "v2":
    logger.info("API_MODE=v2: 禁用 v1 上传/文件接口,仅启用 /api/v2")
    app.include_router(analysis_v2.router, prefix="/api/v2", tags=["analysis-v2"])
else:
    app.include_router(upload.router, prefix="/api", tags=["upload"])
    app.include_router(analysis.router, prefix="/api", tags=["analysis"])
    app.include_router(analysis_v2.router, prefix="/api/v2", tags=["analysis-v2"])
    app.include_router(files.router, prefix="/api", tags=["files"])
# Root route
@app.get("/")
async def root():
    """Root path."""
    return {
        "message": "Lazy Stat Backend API",
        "version": settings.APP_VERSION,
        "docs": "/docs"
    }


@app.get("/health")
async def health():
    """Health check."""
    return {
        "status": "healthy",
        "app": settings.APP_TITLE,
        "version": settings.APP_VERSION
    }


@app.get("/api/config")
async def get_config():
    """Return the application configuration."""
    return {
        "title": settings.APP_TITLE,
        "version": settings.APP_VERSION,
        "max_upload_size": settings.MAX_UPLOAD_SIZE,
        "allowed_extensions": list(settings.ALLOWED_EXTENSIONS),
        "language_default": settings.LANGUAGE_DEFAULT
    }
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        "app.main:app",
        host=settings.HOST,
        port=settings.PORT,
        reload=settings.RELOAD,
        log_level=settings.LOG_LEVEL.lower()
    )

0
app/services/__init__.py Normal file

View File

@ -0,0 +1,32 @@
"""Analysis package.
This package contains the refactored analysis modules.
Notes:
- The legacy entrypoint remains `app.services.analysis_system.TimeSeriesAnalysisSystem`.
- Importing `app.services.analysis_system` eagerly here would create a circular import because
`analysis_system` imports `app.services.analysis.modules.*`.
"""
from __future__ import annotations
from typing import Any, TYPE_CHECKING
__all__ = ["TimeSeriesAnalysisSystem"]
if TYPE_CHECKING:
    from app.services.analysis_system import TimeSeriesAnalysisSystem as TimeSeriesAnalysisSystem


def __getattr__(name: str) -> Any:
    if name == "TimeSeriesAnalysisSystem":
        from app.services.analysis_system import TimeSeriesAnalysisSystem
        return TimeSeriesAnalysisSystem
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


def __dir__() -> list[str]:
    return sorted(list(globals().keys()) + __all__)

View File

@ -0,0 +1,4 @@
"""Implementation modules for analysis methods.
Each file contains one or a small group of closely-related analysis methods.
"""

View File

@ -0,0 +1,180 @@
import gc
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
def generate_statistical_overview(self):
"""生成统计概览 - 优化内存版本"""
fig = None
try:
self._log_step("Generating statistical overview...")
# 检查数据
if not hasattr(self, 'data') or self.data is None or len(self.data) == 0:
self._log_step("No data available for statistical overview", "warning")
return None, "No data available", None
# 计算统计数据
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
stats_df = self.data[numeric_cols].describe().T.reset_index().rename(columns={'index': 'variable'})
summary = f"Generated statistical overview for {len(numeric_cols)} variables"
if not self.generate_plots:
self._log_step("Statistical overview generated (data only)", "success")
return None, summary, stats_df
# 使用更小的图形尺寸和DPI来节省内存
fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=100)
fig.suptitle('Statistical Overview', fontsize=14)
# Plot only the first 4 numeric variables to save memory
# (non-numeric columns cannot be drawn as line plots)
num_vars = min(4, len(numeric_cols))
for i in range(num_vars):
row = i // 2
col = i % 2
col_name = numeric_cols[i]
try:
# Time series line plot
axes[row, col].plot(self.data.index, self.data[col_name], linewidth=1)
axes[row, col].set_title(f'{col_name}')
axes[row, col].tick_params(axis='x', rotation=45)
axes[row, col].grid(True, alpha=0.3)
except Exception as e:
self._log_step(f"Plotting {col_name} failed: {e}", "warning")
axes[row, col].text(
0.5,
0.5,
f'Error: {str(e)[:30]}',
ha='center',
va='center',
transform=axes[row, col].transAxes,
)
plt.tight_layout()
# Save the image at a lower DPI
img_path = os.path.join(self.temp_dir.name, 'stats_overview.png')
try:
plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
if not os.path.exists(img_path):
self._log_step("Failed to save statistical overview image", "error")
return None, "Failed to save image", stats_df
except Exception as save_error:
self._log_step(f"Failed to save figure: {save_error}", "error")
return None, f"Save error: {str(save_error)[:100]}", stats_df
finally:
plt.close(fig)  # explicitly close the figure to free memory
gc.collect()
self._log_step("Statistical overview generated", "success")
return img_path, summary, stats_df
except Exception as e:
self._log_step(f"Statistical overview failed: {str(e)[:100]}", "error")
if fig is not None:
try:
plt.close(fig)
gc.collect()
except Exception:
pass
return None, f"Statistical overview failed: {str(e)[:100]}", None
def perform_normality_tests(self):
"""Run normality tests."""
try:
self._log_step("Performing normality tests...")
if hasattr(self, 'data') and self.data is not None:
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
results = {}
for col in numeric_cols[:3]:  # test only the first 3 variables
series = self.data[col].dropna()
col_results = {}
# Histogram binning (the backend is responsible for binning)
hist_counts, bin_edges = np.histogram(series, bins=20)
histogram = []
for i in range(len(hist_counts)):
histogram.append({
'range_start': float(bin_edges[i]),
'range_end': float(bin_edges[i + 1]),
'count': int(hist_counts[i])
})
col_results['histogram'] = histogram
# Shapiro-Wilk test (needs at least 3 observations; p-values may be inaccurate above 5000)
if len(series) >= 3 and len(series) <= 5000:
shapiro_result = stats.shapiro(series)
col_results['Shapiro-Wilk'] = {
'statistic': float(shapiro_result[0]),
'p_value': float(shapiro_result[1]),
'normal': bool(shapiro_result[1] > 0.05),
}
# Jarque-Bera test
jb_result = stats.jarque_bera(series)
# SciPy result typing varies by version; keep runtime behavior and silence stub mismatch.
jb_stat = float(jb_result[0]) # type: ignore[index,arg-type]
jb_p = float(jb_result[1]) # type: ignore[index,arg-type]
col_results['Jarque-Bera'] = {
'statistic': jb_stat,
'p_value': jb_p,
'normal': bool(jb_p > 0.05),
}
results[col] = col_results
summary = f"正态性检验完成,测试了 {len(results)} 个变量"
if not self.generate_plots:
self._log_step("Normality tests completed (data only)", "success")
return None, summary, results
# Build the normality-test visualization
n_cols = min(3, len(numeric_cols))
fig, axes = plt.subplots(n_cols, 2, figsize=(12, 4 * n_cols))
fig.suptitle('正态性检验结果', fontsize=16)
if n_cols == 1:
axes = axes.reshape(1, -1)
for i, col in enumerate(numeric_cols[:n_cols]):
series = self.data[col].dropna()
# Histogram with a fitted normal curve
axes[i, 0].hist(series, bins=20, density=True, alpha=0.7, color='skyblue')
xmin, xmax = axes[i, 0].get_xlim()
x = np.linspace(xmin, xmax, 100)
p = stats.norm.pdf(x, series.mean(), series.std())
axes[i, 0].plot(x, p, 'k', linewidth=2)
axes[i, 0].set_title(f'{col} - 分布直方图')
# Q-Q plot
stats.probplot(series, dist="norm", plot=axes[i, 1])
axes[i, 1].set_title(f'{col} - Q-Q图')
plt.tight_layout()
img_path = os.path.join(self.temp_dir.name, 'normality_tests.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("Normality tests completed", "success")
return img_path, summary, results
self._log_step("No data available for normality tests", "warning")
return None, "数据不足,无法进行正态性检验", None
except Exception as e:
self._log_step(f"Normality tests failed: {e}", "error")
return None, f"正态性检验失败: {e}", None
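The histogram payload built in `perform_normality_tests` is a list of `range_start` / `range_end` / `count` records. As a rough illustration of what that binning does, here is a dependency-free sketch; `histogram_bins` is a hypothetical helper, not part of the module, and it only approximates `np.histogram` (equal-width bins, last bin closed on the right).

```python
def histogram_bins(values, bins=20):
    """Split values into equal-width bins and return JSON-friendly records."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: widen it so every bin has a positive width
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right, so the maximum lands in it
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [
        {"range_start": lo + i * width, "range_end": lo + (i + 1) * width, "count": counts[i]}
        for i in range(bins)
    ]


records = histogram_bins([0.0, 1.0, 2.0, 3.0, 4.0], bins=4)
```

Returning plain floats and ints, as the module does with `float(...)` and `int(...)`, keeps the records serializable without NumPy scalar types leaking into the response.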


@ -0,0 +1,112 @@
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
def analyze_feature_importance(self):
"""Analyze feature importance."""
try:
self._log_step("Analyzing feature importance...")
if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
self._log_step("Not enough data for feature importance analysis", "warning")
return None, "Not enough data for feature importance analysis", None
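X = self.data.iloc[:, 1:]  # predictors: every column except the target, to avoid target leakage
y = self.data.iloc[:, 0]  # the first column is used as the target variable
model = RandomForestRegressor(n_estimators=50, random_state=42)  # reduced number of trees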
model.fit(X, y)
feature_importance = pd.Series(model.feature_importances_, index=X.columns)
feature_importance = feature_importance.sort_values(ascending=False)
fi_df = feature_importance.reset_index()
fi_df.columns = ['feature', 'importance']
summary = f"Feature importance analysis completed, top feature: {fi_df.iloc[0]['feature']}"
if not self.generate_plots:
self._log_step("Feature importance analysis completed (data only)", "success")
return None, summary, fi_df
plt.figure(figsize=(8, 6))
feature_importance.head(10).plot(kind='bar')
plt.title('Feature Importance Analysis')
plt.ylabel('Importance Score')
plt.tight_layout()
img_path = os.path.join(self.temp_dir.name, 'feature_importance.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("Feature importance analysis completed", "success")
return img_path, summary, fi_df
except Exception as e:
self._log_step(f"Feature importance analysis failed: {e}", "error")
return None, f"Feature importance analysis failed: {e}", None
def perform_var_analysis(self):
"""Run vector autoregression (VAR) analysis."""
try:
self._log_step("Performing VAR analysis...")
if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
self._log_step("Not enough data for VAR analysis", "warning")
return None, "数据不足无法进行VAR分析", None
from statsmodels.tsa.api import VAR
numeric_data = self.data.select_dtypes(include=[np.number])
if len(numeric_data.columns) < 2:
self._log_step("Not enough numeric columns for VAR analysis", "warning")
return None, "数值变量不足无法进行VAR分析", None
var_data = numeric_data.iloc[:, : min(3, len(numeric_data.columns))]
model = VAR(var_data)
results = model.fit(maxlags=2, ic='aic')
lag_order = results.k_ar
forecast = results.forecast(var_data.values[-lag_order:], steps=10)
forecast_df = pd.DataFrame(data=forecast, columns=[f"{col}_forecast" for col in var_data.columns])
summary = f"VAR分析完成使用滞后阶数: {results.k_ar}生成了10期预测"
if not self.generate_plots:
self._log_step("VAR analysis completed (data only)", "success")
return None, summary, forecast_df
plt.figure(figsize=(12, 8))
for i, col in enumerate(var_data.columns):
plt.plot(range(len(var_data)), var_data[col].values, label=f'{col} (actual)', alpha=0.7)
plt.plot(
range(len(var_data), len(var_data) + 10),
forecast[:, i],
label=f'{col} (forecast)',
linestyle='--',
)
plt.axvline(x=len(var_data), color='red', linestyle=':', alpha=0.7, label='Forecast Start')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Vector Autoregression (VAR) Forecast')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
img_path = os.path.join(self.temp_dir.name, 'var_analysis.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("VAR analysis completed", "success")
return img_path, summary, forecast_df
except Exception as e:
self._log_step(f"VAR analysis failed: {e}", "error")
return None, f"VAR分析失败: {e}", None


@ -0,0 +1,301 @@
import gc
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
def generate_correlation_heatmap(self):
"""Generate the correlation heatmap."""
fig = None
try:
self._log_step("Generating correlation heatmap...")
if not hasattr(self, 'data') or self.data is None or len(self.data.columns) <= 1:
self._log_step("Not enough data for correlation analysis", "warning")
return None, "Not enough data", None
# Compute the correlation matrix
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
corr_matrix = self.data[numeric_cols].corr()
summary = "Correlation matrix calculated"
if not self.generate_plots:
self._log_step("Correlation analysis completed (data only)", "success")
# Replace NaN with None for JSON compatibility (cast to object first, otherwise float columns keep NaN)
return None, summary, corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
# Draw the heatmap
fig = plt.figure(figsize=(8, 6), dpi=100)
sns.heatmap(
corr_matrix,
annot=True,
fmt=".2f",
cmap='coolwarm',
center=0,
square=True,
cbar_kws={"shrink": 0.8},
)
plt.title('Correlation Heatmap')
plt.tight_layout()
# Save the image
img_path = os.path.join(self.temp_dir.name, 'correlation_heatmap.png')
try:
plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
except Exception as save_err:
self._log_step(f"Save error: {save_err}", "error")
return None, f"Save error: {str(save_err)[:100]}", corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
finally:
plt.close(fig)
gc.collect()
self._log_step("Correlation heatmap generated", "success")
return img_path, summary, corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
except Exception as e:
self._log_step(f"Correlation heatmap failed: {str(e)[:100]}", "error")
if fig is not None:
try:
plt.close(fig)
except Exception:
pass
return None, f"Correlation heatmap failed: {str(e)[:100]}", None
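The correlation matrix is returned with NaN replaced by None because Python's `json` module emits a bare `NaN` token by default, which is not valid JSON. A minimal sketch of the same sanitizing step on plain Python containers follows; `nan_to_none` is a hypothetical helper, and it also maps infinities to None, which goes beyond what the function above does.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace non-finite floats with None so the result is valid JSON."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


# A 2x2 "correlation matrix" with undefined off-diagonal entries
matrix = {"a": {"a": 1.0, "b": float("nan")}, "b": {"a": float("nan"), "b": 1.0}}
# allow_nan=False makes json.dumps raise if any NaN slipped through
payload = json.dumps(nan_to_none(matrix), allow_nan=False)
```

Passing `allow_nan=False` is a cheap guard: serialization fails loudly instead of producing output that strict JSON parsers reject.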
def generate_pca_scree_plot(self):
"""Generate the PCA scree plot."""
try:
self._log_step("Generating PCA scree plot...")
if hasattr(self, 'scaled_data') and self.scaled_data is not None:
pca = PCA()
pca.fit(self.scaled_data)
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)
# Prepare the data
scree_data = pd.DataFrame({
'component': range(1, len(explained_variance) + 1),
'explained_variance': explained_variance,
'cumulative_variance': cumulative_variance,
})
summary = (
"PCA碎石图生成完成前2个主成分解释 "
f"{cumulative_variance[min(1, len(cumulative_variance) - 1)]:.2%} 方差"
)
if not self.generate_plots:
self._log_step("PCA scree data generated", "success")
return None, summary, scree_data
# Create the scree figure
plt.figure(figsize=(10, 6))
# Left panel: explained variance per component
plt.subplot(1, 2, 1)
plt.plot(range(1, len(explained_variance) + 1), explained_variance, 'bo-')
plt.title('PCA碎石图')
plt.xlabel('主成分')
plt.ylabel('解释方差比例')
plt.grid(True, alpha=0.3)
# Right panel: cumulative explained variance
plt.subplot(1, 2, 2)
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, 'ro-')
plt.title('累积解释方差')
plt.xlabel('主成分数量')
plt.ylabel('累积方差比例')
plt.axhline(y=0.85, color='g', linestyle='--', label='85% 方差')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
img_path = os.path.join(self.temp_dir.name, 'pca_scree_plot.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("PCA scree plot generated", "success")
return img_path, summary, scree_data
self._log_step("No scaled data available for PCA scree plot", "warning")
return None, "没有标准化数据可用于PCA碎石图", None
except Exception as e:
self._log_step(f"PCA scree plot failed: {e}", "error")
return None, f"PCA碎石图生成失败: {e}", None
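The scree plot draws an 85% reference line on the cumulative variance curve. The number of components needed to reach such a threshold can be computed directly from the explained variance ratios. The sketch below uses only the standard library; `components_for_threshold` is a hypothetical helper and the ratios are made-up illustrative values.

```python
from itertools import accumulate


def components_for_threshold(explained_variance_ratio, threshold=0.85):
    """Return (k, cumulative): k is the smallest number of leading components
    whose cumulative ratio reaches the threshold, or None if it is never reached."""
    cumulative = list(accumulate(explained_variance_ratio))
    for k, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return k, cumulative
    return None, cumulative


# Illustrative ratios, already sorted in descending order as PCA returns them
k, cumulative = components_for_threshold([0.5, 0.3, 0.1, 0.1])
```

With these ratios the first two components explain 80%, so the third is needed to cross the 85% line.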
def perform_pca_analysis(self):
"""Run principal component analysis (PCA)."""
try:
self._log_step("Performing PCA analysis...")
if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
pca = PCA(n_components=2)
principal_components = pca.fit_transform(self.scaled_data)
summary = (
"PCA analysis completed, explained variance: "
f"{pca.explained_variance_ratio_[0]:.2%} + {pca.explained_variance_ratio_[1]:.2%}"
)
pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
pca_df['timestamp'] = self.data.index.astype(str)
if not self.generate_plots:
self._log_step("PCA analysis completed (data only)", "success")
return None, summary, pca_df
# Create the PCA scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(principal_components[:, 0], principal_components[:, 1], alpha=0.7)
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%})')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%})')
plt.title('Principal Component Analysis (PCA)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
# Save the image
img_path = os.path.join(self.temp_dir.name, 'pca_analysis.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("PCA analysis completed", "success")
return img_path, summary, pca_df
self._log_step("Not enough data for PCA analysis", "warning")
return None, "Not enough data for PCA analysis", None
except Exception as e:
self._log_step(f"PCA analysis failed: {e}", "error")
return None, f"PCA analysis failed: {e}", None
def perform_clustering_analysis(self):
"""Run clustering analysis."""
try:
self._log_step("Performing clustering analysis...")
if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(self.scaled_data)
summary = f"Clustering analysis completed, found {len(np.unique(clusters))} clusters"
cluster_df = pd.DataFrame({'cluster': clusters})
cluster_df['timestamp'] = self.data.index.astype(str)
if not self.generate_plots:
self._log_step("Clustering analysis completed (data only)", "success")
return None, summary, cluster_df
# If the data has exactly two dimensions, plot the clusters directly
if len(self.scaled_data.columns) == 2:
plt.figure(figsize=(8, 6))
plt.scatter(
self.scaled_data.iloc[:, 0],
self.scaled_data.iloc[:, 1],
c=clusters,
cmap='viridis',
alpha=0.7,
)
plt.xlabel(self.scaled_data.columns[0])
plt.ylabel(self.scaled_data.columns[1])
plt.title('Clustering Analysis')
plt.colorbar(label='Cluster')
plt.tight_layout()
else:
# For higher-dimensional data, reduce with PCA before plotting
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(self.scaled_data)
plt.figure(figsize=(8, 6))
plt.scatter(reduced_data[:, 0], reduced_data[:, 1], c=clusters, cmap='viridis', alpha=0.7)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Clustering Analysis (PCA Reduced)')
plt.colorbar(label='Cluster')
plt.tight_layout()
# Save the image
img_path = os.path.join(self.temp_dir.name, 'clustering_analysis.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("Clustering analysis completed", "success")
return img_path, summary, cluster_df
self._log_step("Not enough data for clustering analysis", "warning")
return None, "Not enough data for clustering analysis", None
except Exception as e:
self._log_step(f"Clustering analysis failed: {e}", "error")
return None, f"Clustering analysis failed: {e}", None
def perform_factor_analysis(self):
"""Run factor analysis."""
try:
self._log_step("Performing factor analysis...")
if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
from sklearn.decomposition import FactorAnalysis
fa = FactorAnalysis(n_components=2, random_state=42)
factors = fa.fit_transform(self.scaled_data)
summary = "因子分析完成提取了2个主要因子"
factor_df = pd.DataFrame(data=factors, columns=['Factor1', 'Factor2'])
factor_df['timestamp'] = self.data.index.astype(str)
if not self.generate_plots:
self._log_step("Factor analysis completed (data only)", "success")
return None, summary, factor_df
# Create the factor analysis scatter plot
plt.figure(figsize=(10, 8))
plt.scatter(factors[:, 0], factors[:, 1], alpha=0.7)
plt.xlabel('Factor 1')
plt.ylabel('Factor 2')
plt.title('Factor Analysis')
plt.grid(True, alpha=0.3)
# Label sample points with their row index (these are factor scores, not loadings)
for i, (x, y) in enumerate(factors[:10]):  # annotate only the first 10 points
plt.annotate(str(i), (x, y), xytext=(5, 5), textcoords='offset points', fontsize=8)
plt.tight_layout()
# Save the image
img_path = os.path.join(self.temp_dir.name, 'factor_analysis.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("Factor analysis completed", "success")
return img_path, summary, factor_df
self._log_step("Not enough data for factor analysis", "warning")
return None, "数据不足,无法进行因子分析", None
except Exception as e:
self._log_step(f"Factor analysis failed: {e}", "error")
return None, f"因子分析失败: {e}", None


@ -0,0 +1,169 @@
import os
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
def perform_stationarity_tests(self):
"""Run stationarity tests (ADF and KPSS)."""
try:
self._log_step("Performing stationarity tests...")
if hasattr(self, 'data') and self.data is not None:
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
results = {}
for col in numeric_cols[:3]:  # test only the first 3 variables
series = self.data[col].dropna()
col_results = {}
# ADF test (null hypothesis: the series has a unit root)
adf_result = adfuller(series)
adf_crit = adf_result[4] # type: ignore[index]
if isinstance(adf_crit, dict):
adf_crit = {str(k): float(v) for k, v in adf_crit.items()}
col_results['ADF'] = {
'statistic': float(adf_result[0]),
'p_value': float(adf_result[1]),
'critical_values': adf_crit,
'stationary': bool(adf_result[1] < 0.05),
}
# KPSS test (null hypothesis: the series is stationary)
try:
kpss_result = kpss(series, regression='c')
kpss_crit = kpss_result[3]
if isinstance(kpss_crit, dict):
kpss_crit = {str(k): float(v) for k, v in kpss_crit.items()}
col_results['KPSS'] = {
'statistic': float(kpss_result[0]),
'p_value': float(kpss_result[1]),
'critical_values': kpss_crit,
'stationary': bool(kpss_result[1] > 0.05),
}
except Exception:
col_results['KPSS'] = '检验失败'
results[col] = col_results
summary = f"平稳性检验完成,测试了 {len(results)} 个变量"
if not self.generate_plots:
self._log_step("Stationarity tests completed (data only)", "success")
return None, summary, results
# Build the stationarity-test visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('平稳性检验结果', fontsize=16)
# Plot the time series
for i, col in enumerate(numeric_cols[:2]):
axes[0, i].plot(self.data.index, self.data[col])
axes[0, i].set_title(f'{col} - 时间序列')
axes[0, i].tick_params(axis='x', rotation=45)
axes[0, i].grid(True, alpha=0.3)
# Plot the ADF test results
test_stats = [results[col]['ADF']['statistic'] for col in list(results.keys())[:2]]
p_values = [results[col]['ADF']['p_value'] for col in list(results.keys())[:2]]
x_pos = np.arange(len(test_stats))
axes[1, 0].bar(x_pos - 0.2, test_stats, 0.4, label='检验统计量', alpha=0.7)
axes[1, 0].bar(x_pos + 0.2, p_values, 0.4, label='p值', alpha=0.7)
axes[1, 0].set_title('ADF检验结果')
axes[1, 0].set_xticks(x_pos)
axes[1, 0].set_xticklabels(list(results.keys())[:2])
# Draw the significance line before building the legend so its label is included
axes[1, 0].axhline(y=0.05, color='r', linestyle='--', label='显著性水平 (0.05)')
axes[1, 0].legend()
# Plot the conclusion
stationary_status = [
'平稳' if results[col]['ADF']['stationary'] else '非平稳' for col in list(results.keys())[:2]
]
colors = ['green' if status == '平稳' else 'red' for status in stationary_status]
axes[1, 1].bar(x_pos, [1] * len(stationary_status), color=colors, alpha=0.7)
axes[1, 1].set_title('平稳性结论')
axes[1, 1].set_xticks(x_pos)
axes[1, 1].set_xticklabels(list(results.keys())[:2])
for i, status in enumerate(stationary_status):
axes[1, 1].text(i, 0.5, status, ha='center', va='center', fontweight='bold')
plt.tight_layout()
img_path = os.path.join(self.temp_dir.name, 'stationarity_tests.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("Stationarity tests completed", "success")
return img_path, summary, results
self._log_step("No data available for stationarity tests", "warning")
return None, "数据不足,无法进行平稳性检验", None
except Exception as e:
self._log_step(f"Stationarity tests failed: {e}", "error")
return None, f"平稳性检验失败: {e}", None
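ADF and KPSS have opposite null hypotheses, which is why the code above marks a series stationary when the ADF p-value is below 0.05 but when the KPSS p-value is above 0.05. The two verdicts can be combined into one label. The sketch below follows the commonly cited four-case interpretation; `classify_stationarity` and its labels are assumptions for illustration, not part of the module.

```python
def classify_stationarity(adf_p_value, kpss_p_value, alpha=0.05):
    """Combine ADF and KPSS p-values into a single stationarity label."""
    adf_stationary = adf_p_value < alpha    # ADF rejects its unit-root null
    kpss_stationary = kpss_p_value > alpha  # KPSS fails to reject its stationarity null
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if adf_stationary:
        # ADF says stationary, KPSS disagrees: differencing is usually suggested
        return "difference-stationary"
    # KPSS says stationary, ADF disagrees: detrending is usually suggested
    return "trend-stationary"


label = classify_stationarity(adf_p_value=0.01, kpss_p_value=0.10)
```

Reporting the combined label alongside the raw statistics makes disagreements between the two tests visible instead of relying on ADF alone, as the plot's conclusion panel currently does.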
def perform_cointegration_test(self):
"""Run the Johansen cointegration test."""
try:
self._log_step("Performing cointegration test...")
if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
self._log_step("Not enough data for cointegration test", "warning")
return None, "数据不足,无法进行协整检验", None
from statsmodels.tsa.vector_ar.vecm import coint_johansen
numeric_data = self.data.select_dtypes(include=[np.number])
if len(numeric_data.columns) < 2:
self._log_step("Not enough numeric columns for cointegration test", "warning")
return None, "数值变量不足,无法进行协整检验", None
result = coint_johansen(numeric_data, det_order=0, k_ar_diff=1)
summary = (
f"协整检验完成,轨迹统计量: {result.trace_stat[0]:.3f}, "
f"临界值(95%): {result.trace_stat_crit_vals[0, 1]:.3f}"
)
coint_data = {
'trace_stat': result.trace_stat.tolist(),
'trace_stat_crit_vals': result.trace_stat_crit_vals.tolist(),
'eigen_vals': result.eig.tolist(),
}
if not self.generate_plots:
self._log_step("Cointegration test completed (data only)", "success")
return None, summary, coint_data
plt.figure(figsize=(10, 6))
positions = np.arange(len(result.trace_stat))
plt.bar(positions - 0.2, result.trace_stat, width=0.4, label='Trace Statistic', alpha=0.7)
plt.bar(
positions + 0.2,
result.trace_stat_crit_vals[:, 1],
width=0.4,
label='Critical Value (95%)',
alpha=0.7,
)
plt.xlabel('Number of Cointegrating Relations')
plt.ylabel('Test Statistic')
plt.title('Johansen Cointegration Test Results')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
img_path = os.path.join(self.temp_dir.name, 'cointegration_test.png')
plt.savefig(img_path, dpi=150, bbox_inches='tight')
plt.close()
self._log_step("Cointegration test completed", "success")
return img_path, summary, coint_data
except Exception as e:
self._log_step(f"Cointegration test failed: {e}", "error")
return None, f"协整检验失败: {e}", None
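The summary above reports only the first trace statistic and its 95% critical value. The usual way to read the full Johansen trace output is sequential: test rank 0, then rank 1, and stop at the first hypothesis that is not rejected. A minimal sketch under that assumption follows; `johansen_rank` is a hypothetical helper and the numbers are made up for illustration.

```python
def johansen_rank(trace_stats, critical_values):
    """Return the estimated number of cointegrating relations.

    trace_stats[r] tests the null "rank <= r"; the null is rejected when the
    statistic exceeds the critical value at the chosen confidence level.
    """
    for rank, (stat, crit) in enumerate(zip(trace_stats, critical_values)):
        if stat <= crit:
            return rank  # first null that cannot be rejected
    return len(trace_stats)  # every null rejected: full rank


# Illustrative values for a 3-variable system (not real test output)
rank = johansen_rank([35.2, 12.1, 2.4], [29.8, 15.5, 3.8])
```

In terms of the result object used above, the inputs would be `result.trace_stat` and the 95% column `result.trace_stat_crit_vals[:, 1]`.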


@ -0,0 +1,242 @@
import gc
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from scipy.signal import spectrogram, periodogram
def generate_time_series_plots(self):
"""Prepare time series chart data."""
try:
self._log_step("Generating time series plots...")
if not hasattr(self, 'data') or self.data is None or len(self.data.columns) == 0:
self._log_step("No data available for time series plots", "warning")
return None, "No data available", None
# Prepare the data
n_plots = min(4, len(self.data.columns))
plot_data = self.data.iloc[:, :n_plots].reset_index()
# Convert timestamp to strings so the data is JSON-serializable
if 'timestamp' in plot_data.columns:
plot_data['timestamp'] = plot_data['timestamp'].astype(str)
summary = f"Generated {n_plots} time series charts"
# Charts mode: return data only, no image; the plotting version is kept in the comments below
self._log_step("Time series data prepared", "success")
return None, summary, plot_data
# --- Plotting version kept for reference ---
# fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=100)
# fig.suptitle('Time Series Analysis', fontsize=14)
# axes = axes.flatten()
# for i in range(n_plots):
# try:
# col = self.data.columns[i]
# axes[i].plot(self.data.index, self.data[col], linewidth=1)
# axes[i].set_title(f'{col}')
# axes[i].tick_params(axis='x', rotation=45)
# axes[i].grid(True, alpha=0.3)
# except Exception as plot_err:
# self._log_step(f"Plot {col} error: {plot_err}", "warning")
# for i in range(n_plots, len(axes)):
# fig.delaxes(axes[i])
# plt.tight_layout()
# img_path = os.path.join(self.temp_dir.name, 'time_series.png')
# plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
# plt.close(fig)
# self._log_step("Time series plots generated", "success")
# return img_path, summary, plot_data
except Exception as e:
self._log_step(f"Time series plots failed: {str(e)[:100]}", "error")
return None, f"Error: {e}", None
def generate_acf_pacf_plots(self):
"""Compute autocorrelation (ACF) and partial autocorrelation (PACF) data."""
try:
self._log_step("Generating ACF and PACF plots...")
if hasattr(self, 'data') and self.data is not None:
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
n_cols = min(3, len(numeric_cols))
# Compute the ACF and PACF values
acf_pacf_results = {}
for col in numeric_cols[:n_cols]:
series = self.data[col].dropna()
try:
acf_vals = np.asarray(acf(series, nlags=min(40, len(series) // 4)))
pacf_vals = np.asarray(pacf(series, nlags=min(20, len(series) // 5)))
acf_pacf_results[col] = {
'acf': acf_vals.tolist(),
'pacf': pacf_vals.tolist(),
}
except Exception as e:
self._log_step(f"Error calculating ACF/PACF for {col}: {e}", "warning")
summary = f"生成 {n_cols} 个变量的ACF和PACF数据"
self._log_step("ACF and PACF data generated", "success")
return None, summary, acf_pacf_results
# --- Plotting version kept for reference ---
# fig, axes = plt.subplots(n_cols, 2, figsize=(12, 4 * n_cols))
# fig.suptitle('自相关和偏自相关分析', fontsize=16)
# if n_cols == 1:
# axes = axes.reshape(1, -1)
# for i, col in enumerate(numeric_cols[:n_cols]):
# series = self.data[col].dropna()
# plot_acf(series, ax=axes[i, 0], lags=min(40, len(series) // 4))
# axes[i, 0].set_title(f'{col} - 自相关函数 (ACF)')
# plot_pacf(series, ax=axes[i, 1], lags=min(20, len(series) // 5))
# axes[i, 1].set_title(f'{col} - 偏自相关函数 (PACF)')
# plt.tight_layout()
# img_path = os.path.join(self.temp_dir.name, 'acf_pacf_plots.png')
# plt.savefig(img_path, dpi=150, bbox_inches='tight')
# plt.close()
# self._log_step("ACF and PACF plots generated", "success")
# return img_path, f"生成 {n_cols} 个变量的ACF和PACF图", acf_pacf_results
self._log_step("No data available for ACF/PACF plots", "warning")
return None, "数据不足无法生成ACF/PACF图", None
except Exception as e:
self._log_step(f"ACF/PACF plots failed: {e}", "error")
return None, f"ACF/PACF图生成失败: {e}", None
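For readers unfamiliar with what `acf` returns, the sample autocorrelation at lag k is the lagged sum of products of mean-centred values divided by the lag-0 sum of squares. The pure-Python sketch below implements that definition; `sample_acf` is a hypothetical helper, and it is assumed (not verified here) to match the default, unadjusted estimator in statsmodels.

```python
def sample_acf(series, nlags):
    """Return the sample autocorrelation for lags 0..nlags."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        # Sum over the overlapping part of the series and its lagged copy
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / denominator
        for lag in range(nlags + 1)
    ]


values = sample_acf([1, 2, 3, 4, 5], nlags=2)
```

Lag 0 is always 1, which is why the first element of each `acf` list in the response carries no information and is often skipped by chart code.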
def perform_seasonal_decomposition(self):
"""Run seasonal decomposition."""
try:
self._log_step("Performing seasonal decomposition...")
if hasattr(self, 'data') and self.data is not None:
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
# Decompose the first numeric column
if len(numeric_cols) > 0:
col = numeric_cols[0]
series = self.data[col].dropna()
# Seasonal decomposition (additive model)
result = seasonal_decompose(series, model='additive', period=min(24, len(series) // 2))
decomposition_data = pd.DataFrame({
'observed': result.observed,
'trend': result.trend,
'seasonal': result.seasonal,
'resid': result.resid,
})
# Replace NaN with None so the result is JSON-serializable
decomposition_data = decomposition_data.astype(object).where(
pd.notnull(decomposition_data),
None, # type: ignore[arg-type]
)
summary = f"季节性分解完成,变量: {col}"
self._log_step("Seasonal decomposition completed (data only)", "success")
return None, summary, decomposition_data
# --- Plotting version kept for reference ---
# fig, axes = plt.subplots(4, 1, figsize=(12, 10))
# fig.suptitle(f'{col} - 季节性分解', fontsize=16)
# result.observed.plot(ax=axes[0], title='原始序列')
# result.trend.plot(ax=axes[1], title='趋势成分')
# result.seasonal.plot(ax=axes[2], title='季节成分')
# result.resid.plot(ax=axes[3], title='残差成分')
# for ax in axes:
# ax.tick_params(axis='x', rotation=45)
# ax.grid(True, alpha=0.3)
# plt.tight_layout()
# img_path = os.path.join(self.temp_dir.name, 'seasonal_decomposition.png')
# plt.savefig(img_path, dpi=150, bbox_inches='tight')
# plt.close()
# self._log_step("Seasonal decomposition completed", "success")
# return img_path, summary, decomposition_data
self._log_step("No numeric columns for decomposition", "warning")
return None, "没有数值列可用于季节性分解", None
self._log_step("No data available for seasonal decomposition", "warning")
return None, "数据不足,无法进行季节性分解", None
except Exception as e:
self._log_step(f"Seasonal decomposition failed: {e}", "error")
return None, f"季节性分解失败: {e}", None
def perform_spectral_analysis(self):
"""Run spectral analysis."""
try:
self._log_step("Performing spectral analysis...")
if hasattr(self, 'data') and self.data is not None:
numeric_cols = self.data.select_dtypes(include=[np.number]).columns
# Compute spectral data (output is reduced to keep the payload small)
spectral_results = {}
for col in numeric_cols[:2]:
try:
series = self.data[col].dropna().values
f, t, Sxx = spectrogram(series, fs=1.0, nperseg=min(256, len(series) // 4))
f_p, Pxx_den = periodogram(series, fs=1.0)
# Keep only the mean and shape of the spectrogram instead of returning the full matrix
Sxx_log = 10 * np.log10(Sxx + 1e-12)
spectral_results[col] = {
'spectrogram': {
'f': f.tolist(),
't': t.tolist(),
'Sxx_log10_mean': float(np.mean(Sxx_log)),
'Sxx_shape': Sxx.shape,
},
'periodogram': {
'f': f_p.tolist()[:20],
'Pxx_den': Pxx_den.tolist()[:20],
},
}
except Exception as e:
self._log_step(f"Spectral calc failed for {col}: {e}", "warning")
summary = "Spectral analysis completed"
self._log_step("Spectral analysis completed (data only)", "success")
return None, summary, spectral_results
# --- Plotting version kept for reference ---
# n_cols = min(2, len(numeric_cols))
# fig, axes = plt.subplots(n_cols, 2, figsize=(15, 5 * n_cols))
# fig.suptitle('频谱分析', fontsize=16)
# if n_cols == 1:
# axes = axes.reshape(1, -1)
# for i, col in enumerate(numeric_cols[:n_cols]):
# series = self.data[col].dropna().values
# f, t, Sxx = spectrogram(series, fs=1.0, nperseg=min(256, len(series) // 4))
# axes[i, 0].pcolormesh(t, f, 10 * np.log10(Sxx), shading='gouraud')
# axes[i, 0].set_title(f'{col} - 频谱图')
# axes[i, 0].set_ylabel('频率 [Hz]')
# axes[i, 0].set_xlabel('时间')
# f, Pxx_den = periodogram(series, fs=1.0)
# axes[i, 1].semilogy(f, Pxx_den)
# axes[i, 1].set_title(f'{col} - 周期图')
# axes[i, 1].set_xlabel('频率 [Hz]')
# axes[i, 1].set_ylabel('PSD [V**2/Hz]')
# axes[i, 1].grid(True, alpha=0.3)
# plt.tight_layout()
# img_path = os.path.join(self.temp_dir.name, 'spectral_analysis.png')
# plt.savefig(img_path, dpi=150, bbox_inches='tight')
# plt.close()
# self._log_step("Spectral analysis completed", "success")
# return img_path, f"频谱分析完成,分析了 {n_cols} 个变量", spectral_results
self._log_step("No data available for spectral analysis", "warning")
return None, "数据不足,无法进行频谱分析", None
except Exception as e:
self._log_step(f"Spectral analysis failed: {e}", "error")
return None, f"频谱分析失败: {e}", None

File diff suppressed because it is too large


@ -0,0 +1,249 @@
"""
Font management module: cross-platform font detection and configuration.
Supports Linux, macOS, and Windows.
"""
import os
import sys
import logging
from pathlib import Path
from typing import Optional, List, Dict
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
from app.core.config import settings
logger = logging.getLogger(__name__)
class FontManager:
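"""Font manager: handles cross-platform font detection and configuration."""
# Candidate font paths per language and platform (in priority order)
FONT_PATHS = {
'zh': {  # Chinese fonts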
'linux': [
'/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc',
'/usr/share/fonts/truetype/wqy/wqy-microhei.ttc',
'/usr/share/fonts/truetype/liberation/LiberationSerif-Regular.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
],
'darwin': [ # macOS
'/Library/Fonts/SimHei.ttf',
'/System/Library/Fonts/STHeiti Light.ttc',
'/Applications/Microsoft Office/Library/Fonts/SimSun.ttf',
'/Library/Fonts/Arial.ttf',
],
'win32': [
'C:\\Windows\\Fonts\\simhei.ttf',
'C:\\Windows\\Fonts\\simsun.ttc',
'C:\\Windows\\Fonts\\msyh.ttc',
'C:\\Windows\\Fonts\\arial.ttf',
]
},
'en': {  # English fonts
'linux': [
'/usr/share/fonts/truetype/liberation/LiberationSerif-Regular.ttf',
'/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
'/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf',
],
'darwin': [
'/Library/Fonts/Times New Roman.ttf',
'/Library/Fonts/Arial.ttf',
'/System/Library/Fonts/Helvetica.ttc',
],
'win32': [
'C:\\Windows\\Fonts\\times.ttf',
'C:\\Windows\\Fonts\\arial.ttf',
'C:\\Windows\\Fonts\\georgia.ttf',
]
}
}
# Fonts bundled with the project
PROJECT_FONTS = {
'zh_regular': 'SubsetOTF/CN/SourceHanSansCN-Regular.otf',
'zh_bold': 'SubsetOTF/CN/SourceHanSansCN-Bold.otf',
'en_regular': None,  # English uses system fonts
}
def __init__(self, fonts_dir: Optional[Path] = None):
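"""
Initialize the font manager.
Args:
fonts_dir: Path to the project's font directory.
"""
# Wrap in Path so the "/" joins below also work if the setting is a plain string
self.fonts_dir = Path(fonts_dir or settings.FONTS_DIR)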
self.platform = sys.platform
self.available_fonts = {}
self._init_fonts()
def _init_fonts(self):
"""Initialize the font system."""
logger.info(f"初始化字体系统 (平台: {self.platform})")
# Scan system fonts, then register project fonts (project fonts take priority)
self._scan_system_fonts()
self._register_project_fonts()
def _scan_system_fonts(self):
"""Scan for available system fonts."""
logger.info("扫描系统字体...")
for lang, fonts in self.FONT_PATHS.items():
paths = fonts.get(self.platform, [])
for font_path in paths:
if os.path.exists(font_path):
self.available_fonts[lang] = font_path
logger.info(f"找到{lang}字体: {font_path}")
break
if lang not in self.available_fonts:
logger.warning(f"未找到系统{lang}字体")
def _register_project_fonts(self):
"""Register the fonts bundled with the project."""
logger.info(f"扫描项目字体目录: {self.fonts_dir}")
# Register the Chinese font
zh_font_path = self.fonts_dir / self.PROJECT_FONTS['zh_regular']
if zh_font_path.exists():
try:
self.available_fonts['zh'] = str(zh_font_path)
logger.info(f"注册项目中文字体: {zh_font_path}")
except Exception as e:
logger.warning(f"注册项目中文字体失败: {e}")
def get_font(self, language: str = 'zh') -> str:
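"""
Get the path of an available font.
Args:
language: Language code ('zh' or 'en').
Returns:
Path to the font file, or a fallback font family name if none was found.
"""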
if language in self.available_fonts:
return self.available_fonts[language]
logger.warning(f"未找到{language}字体,使用默认字体")
return 'DejaVuSans' if language == 'en' else 'Arial'
    def setup_matplotlib_font(self, language: str = 'zh'):
        """
        Configure the font Matplotlib uses.

        Args:
            language: Language code ('zh' or 'en').
        """
        try:
            font_path = self.get_font(language)
            if os.path.isfile(font_path):
                # Register the font file with Matplotlib, then look up its family name.
                fm.fontManager.addfont(font_path)
                prop = fm.FontProperties(fname=font_path)
                plt.rcParams['font.sans-serif'] = [prop.get_name()]
            else:
                # get_font() returned a family name rather than a file path.
                plt.rcParams['font.sans-serif'] = [font_path]
            # Use an ASCII hyphen for the minus sign; many CJK fonts lack U+2212.
            plt.rcParams['axes.unicode_minus'] = False
            logger.info(f"Matplotlib font set to: {font_path}")
        except Exception as e:
            logger.error(f"Failed to configure Matplotlib font: {e}")
def get_font_installation_command(self) -> str:
"""
获取当前系统推荐的字体安装命令
Returns:
安装命令字符串
"""
if self.platform == 'linux':
return "apt-get install fonts-wqy-microhei fonts-noto-cjk-extra -y"
elif self.platform == 'darwin':
return "brew install --cask font-noto-sans-cjk"
else:
return "请从 https://www.noto-fonts.cn 下载并安装 Noto Sans CJK 字体"
def suggest_font_installation(self) -> bool:
"""
检查并建议安装字体
Returns:
是否建议安装字体
"""
if 'zh' not in self.available_fonts:
logger.warning("=" * 60)
logger.warning("⚠️ 警告: 未找到中文字体!")
logger.warning("推荐的安装命令:")
logger.warning(self.get_font_installation_command())
logger.warning("=" * 60)
return True
return False
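    @staticmethod
    def check_font_available(font_name: str) -> bool:
        """
        Check whether a font family is available to Matplotlib.

        Args:
            font_name: Font family name.

        Returns:
            True if Matplotlib can find the font, False otherwise.
        """
        try:
            # Without fallback_to_default=False, findfont() silently returns the
            # default font and this check would always succeed.
            fm.findfont(fm.FontProperties(family=font_name), fallback_to_default=False)
            return True
        except Exception:
            return False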
# 全局字体管理器实例
_font_manager: Optional[FontManager] = None
def get_font_manager(fonts_dir: Optional[Path] = None) -> FontManager:
"""获取全局字体管理器实例"""
global _font_manager
if _font_manager is None:
_font_manager = FontManager(fonts_dir)
return _font_manager
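def setup_fonts_for_app(languages: Optional[List[str]] = None) -> Dict[str, str]:
    """
    Set up fonts for the application (one-time initialization).

    Args:
        languages: Languages to support; defaults to ['zh', 'en'].

    Returns:
        Mapping of language code to the font path (or fallback family name) in use.
    """
    if languages is None:
        languages = ['zh', 'en']
    font_manager = get_font_manager()
    # Prompt the user to install fonts if needed.
    font_manager.suggest_font_installation()
    # Configure Matplotlib for each language.
    # Note: each call overwrites rcParams['font.sans-serif'], so the last
    # language in the list determines the font Matplotlib ends up using.
    fonts_config = {}
    for lang in languages:
        try:
            font_manager.setup_matplotlib_font(lang)
            fonts_config[lang] = font_manager.get_font(lang)
            logger.info(f"Font configured for language: {lang}")
        except Exception as e:
            logger.error(f"Failed to configure font for language {lang}: {e}")
    return fonts_config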
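
The lookup order that `FontManager` follows (a bundled project font first, then the first system candidate that exists on disk, then a family-name fallback) can be sketched without Matplotlib. This is a minimal, hypothetical illustration: `resolve_font` and its parameters are names invented for this sketch and are not part of the module above.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_font(
    project_font: Optional[Path],
    system_candidates: Iterable[str],
    fallback_family: str = "DejaVu Sans",
) -> str:
    """Return the first usable font: project file, system file, or family name."""
    # 1. A bundled project font wins if the file is present.
    if project_font is not None and project_font.is_file():
        return str(project_font)
    # 2. Otherwise take the first system candidate that exists on disk.
    for candidate in system_candidates:
        if Path(candidate).is_file():
            return candidate
    # 3. Nothing found: hand back a family name and let the renderer resolve it.
    return fallback_family
```

Returning either a path or a family name mirrors what `get_font` does, so a caller still has to check `os.path.isfile` on the result before treating it as a file.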

View File

@ -0,0 +1,210 @@
"""
Linux adaptation module.
Handles Linux-specific concerns such as paths, permissions, and environment variables.
"""
import os
import sys
import logging
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
class LinuxAdapter:
"""Linux 系统适配器"""
@staticmethod
def is_linux() -> bool:
"""检查是否运行在 Linux 系统上"""
return sys.platform.startswith('linux')
@staticmethod
def normalize_path(path: str) -> Path:
"""
规范化路径 - 适配不同操作系统
Args:
path: 路径字符串可能混合了不同分隔符
Returns:
规范化后的 Path 对象
"""
# 替换反斜杠为正斜杠
path = path.replace('\\', '/')
# 创建 Path 对象,会根据系统自动转换
return Path(path).resolve()
@staticmethod
def ensure_directory_writable(dir_path: Path) -> bool:
"""
确保目录可写
Args:
dir_path: 目录路径
Returns:
是否成功
"""
try:
dir_path = Path(dir_path)
dir_path.mkdir(parents=True, exist_ok=True)
# 检查写入权限
test_file = dir_path / '.test_write'
test_file.touch()
test_file.unlink()
logger.info(f"✓ 目录可写: {dir_path}")
return True
except PermissionError:
logger.error(f"✗ 没有写入权限: {dir_path}")
logger.error(f" 建议: sudo chmod 755 {dir_path}")
return False
except Exception as e:
logger.error(f"✗ 目录检查失败: {dir_path} - {e}")
return False
@staticmethod
def get_recommended_upload_dir() -> Path:
"""
获取 Linux 上推荐的上传目录
Returns:
推荐的上传目录路径
"""
# 优先级:
# 1. 环境变量指定的目录
# 2. 项目相对路径
# 3. /tmp (临时目录)
if upload_dir := os.getenv('UPLOAD_DIR'):
return Path(upload_dir)
project_upload = Path(__file__).parent.parent.parent / 'uploads'
if project_upload.exists() and os.access(project_upload, os.W_OK):
return project_upload
logger.warning("使用系统临时目录进行上传存储")
return Path('/tmp/lazy_fjh_uploads')
@staticmethod
def setup_signal_handlers():
"""
设置 Linux 信号处理器
确保优雅关闭
"""
import signal
def signal_handler(sig, frame):
logger.info(f"收到信号 {sig},开始优雅关闭...")
sys.exit(0)
if LinuxAdapter.is_linux():
signal.signal(signal.SIGTERM, signal_handler)
signal.signal(signal.SIGINT, signal_handler)
logger.info("✓ Linux 信号处理器已注册")
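    @staticmethod
    def get_process_info() -> dict:
        """
        Return information about the current process.

        Returns:
            Dictionary of process information.
        """
        import getpass
        import psutil
        process = psutil.Process(os.getpid())
        # os.getlogin() raises OSError when there is no controlling terminal
        # (e.g. under systemd or in a container), so use getpass.getuser() instead.
        try:
            user = getpass.getuser() if LinuxAdapter.is_linux() else 'unknown'
        except Exception:
            user = 'unknown'
        return {
            'pid': os.getpid(),
            'user': user,
            'memory_mb': process.memory_info().rss / 1024 / 1024,
            # interval=1 blocks for one second while sampling CPU usage.
            'cpu_percent': process.cpu_percent(interval=1),
            'num_threads': process.num_threads()
        }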
@staticmethod
def check_system_resources() -> dict:
"""
检查系统资源
Returns:
系统资源信息
"""
import psutil
return {
'cpu_count': psutil.cpu_count(),
'total_memory_gb': psutil.virtual_memory().total / (1024**3),
'available_memory_gb': psutil.virtual_memory().available / (1024**3),
'disk_usage_percent': psutil.disk_usage('/').percent
}
@staticmethod
def optimize_for_linux():
"""
针对 Linux 系统进行优化
"""
if not LinuxAdapter.is_linux():
return
logger.info("应用 Linux 系统优化...")
# 1. 增加文件描述符限制
try:
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
logger.info(f"✓ 设置文件描述符限制: {hard}")
except Exception as e:
logger.warning(f"⚠ 无法设置文件描述符: {e}")
# 2. 可选内存限制(默认跳过,避免在 WSL/容器中因 RLIMIT_AS 过低触发 MemoryError
mem_limit_env = os.getenv('LINUX_RLIMIT_AS_MB') or os.getenv('LINUX_MEMORY_LIMIT_MB')
if mem_limit_env:
try:
import resource
limit_mb = int(mem_limit_env)
limit_bytes = limit_mb * 1024**2
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
logger.info(f"✓ 设置虚拟内存限制: {limit_mb}MB")
except Exception as e:
logger.warning(f"⚠ 无法设置虚拟内存限制: {e}")
else:
logger.info("跳过虚拟内存限制 (未设置 LINUX_RLIMIT_AS_MB / LINUX_MEMORY_LIMIT_MB)")
# 3. 注册信号处理
LinuxAdapter.setup_signal_handlers()
logger.info("Linux 系统优化完成")
def init_linux_environment():
"""
初始化 Linux 环境
在应用启动时调用
"""
if not LinuxAdapter.is_linux():
logger.info("非 Linux 系统,跳过 Linux 特定初始化")
return
logger.info("=" * 60)
logger.info("初始化 Linux 环境...")
logger.info("=" * 60)
# 应用优化
LinuxAdapter.optimize_for_linux()
# 检查系统资源
resources = LinuxAdapter.check_system_resources()
logger.info(f"系统资源: {resources}")
# 检查上传目录
upload_dir = LinuxAdapter.get_recommended_upload_dir()
if not LinuxAdapter.ensure_directory_writable(upload_dir):
logger.warning(f"上传目录 {upload_dir} 可能不可写")
logger.info("Linux 环境初始化完成")
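
The file-descriptor step in `optimize_for_linux` can be isolated into a small helper that also tolerates platforms without the `resource` module. The sketch below is an illustration under that assumption; `raise_nofile_soft_limit` is a hypothetical name, not a function from this module.

```python
from typing import Optional, Tuple

try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None


def raise_nofile_soft_limit() -> Optional[Tuple[int, int]]:
    """Raise the soft RLIMIT_NOFILE to the hard limit; return the limits now in effect."""
    if resource is None:
        return None
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the hard value; keep the current limit.
            pass
    # Re-read the limits so the caller sees what actually applies.
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Returning the re-read limits rather than the requested value lets the caller log what the kernel actually accepted.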

View File

@ -0,0 +1,159 @@
"""OSS/URL CSV source (v2).
- Validates incoming URL to reduce SSRF risk (allowlist + IP checks)
- Downloads CSV to a local temporary file for analysis
This module is intentionally small and dependency-light.
"""
from __future__ import annotations
import ipaddress
import logging
import os
import socket
import tempfile
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit
import requests
from app.core.config import settings
logger = logging.getLogger(__name__)
@dataclass(frozen=True)
class DownloadedCsv:
local_path: str
source_host: str
source_name: str
etag: Optional[str] = None
last_modified: Optional[str] = None
class UrlValidationError(ValueError):
pass
def _is_ip_allowed(ip_str: str) -> bool:
ip = ipaddress.ip_address(ip_str)
if settings.V2_ALLOW_PRIVATE_NETWORKS:
return True
# Block loopback/link-local/private/multicast/unspecified/reserved
if (
ip.is_loopback
or ip.is_private
or ip.is_link_local
or ip.is_multicast
or ip.is_unspecified
or ip.is_reserved
):
return False
return True
def validate_source_url(source_url: str) -> tuple[str, str]:
    """Validate URL and return (host, source_name)."""
    if not source_url or not isinstance(source_url, str):
        raise UrlValidationError("source_url must not be empty")
    parts = urlsplit(source_url)
    if parts.scheme not in {"https", "http"}:
        raise UrlValidationError("only http/https URLs are supported")
    if parts.scheme == "http" and not settings.V2_ALLOW_HTTP:
        raise UrlValidationError("http is not allowed; use https or enable V2_ALLOW_HTTP")
    if not parts.netloc:
        raise UrlValidationError("URL is missing a host")
    # Disallow URLs with userinfo
    if "@" in parts.netloc:
        raise UrlValidationError("URL must not contain a username/password")
    host = parts.hostname
    if not host:
        raise UrlValidationError("could not parse the URL host")
    # Optional allowlist
    if settings.V2_ALLOWED_HOSTS:
        allowed = {h.lower() for h in settings.V2_ALLOWED_HOSTS}
        if host.lower() not in allowed:
            raise UrlValidationError(f"host is not in the allowlist: {host}")
    # Resolve host -> IP and block private/loopback, unless explicitly allowed.
    try:
        addr_info = socket.getaddrinfo(host, None)
    except socket.gaierror as e:
        raise UrlValidationError(f"DNS resolution failed: {host} ({e})") from e
    for family, _type, _proto, _canonname, sockaddr in addr_info:
        if family not in (socket.AF_INET, socket.AF_INET6):
            continue
        ip_str = str(sockaddr[0])
        if not _is_ip_allowed(ip_str):
            raise UrlValidationError(f"host resolves to a disallowed IP: {ip_str}")
    source_name = os.path.basename(parts.path) or "data.csv"
    return host, source_name
def download_csv_to_tempfile(source_url: str, *, suffix: str = ".csv") -> DownloadedCsv:
    """Download URL content to a temp file and return local path + meta."""
    host, source_name = validate_source_url(source_url)
    # Create temp file inside configured TEMP_DIR for easier ops/observability
    settings.TEMP_DIR.mkdir(parents=True, exist_ok=True)
    tmp = tempfile.NamedTemporaryFile(
        mode="wb",
        suffix=suffix,
        dir=str(settings.TEMP_DIR),
        delete=False,
    )
    try:
        timeout = (settings.V2_CONNECT_TIMEOUT_SECONDS, settings.V2_DOWNLOAD_TIMEOUT_SECONDS)
        # Do not follow redirects: a redirect target would bypass validate_source_url().
        with requests.get(source_url, stream=True, timeout=timeout, allow_redirects=False) as resp:
            if resp.is_redirect:
                raise UrlValidationError("redirects are not allowed")
            resp.raise_for_status()
            etag = resp.headers.get("ETag")
            last_modified = resp.headers.get("Last-Modified")
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                if not chunk:
                    continue
                tmp.write(chunk)
        tmp.flush()
        tmp.close()
        if os.path.getsize(tmp.name) <= 0:
            raise UrlValidationError("downloaded content is empty")
        return DownloadedCsv(
            local_path=tmp.name,
            source_host=host,
            source_name=source_name,
            etag=etag,
            last_modified=last_modified,
        )
    except Exception:
        # Best-effort cleanup of the partial temp file, then re-raise.
        try:
            tmp.close()
        except Exception:
            pass
        try:
            os.unlink(tmp.name)
        except Exception:
            pass
        raise
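
`download_csv_to_tempfile` writes every chunk it receives, so the size of the temp file is bounded only by the download timeout. A size cap could be enforced while streaming. The sketch below is one possible approach, not the module's behaviour: `write_chunks_with_limit`, `DownloadTooLargeError`, and `max_bytes` are hypothetical names, and tying `max_bytes` to a setting such as `MAX_UPLOAD_SIZE` is only a suggestion.

```python
from typing import BinaryIO, Iterable


class DownloadTooLargeError(ValueError):
    """Raised when the streamed body exceeds the configured cap (hypothetical)."""


def write_chunks_with_limit(chunks: Iterable[bytes], out: BinaryIO, max_bytes: int) -> int:
    """Write chunks to `out`, aborting before the total exceeds max_bytes."""
    written = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive chunks, as the downloader does
        if written + len(chunk) > max_bytes:
            raise DownloadTooLargeError(f"download exceeds {max_bytes} bytes")
        out.write(chunk)
        written += len(chunk)
    return written
```

In the downloader, `chunks` would be `resp.iter_content(chunk_size=...)` and `out` the open temp file; the existing `except Exception` cleanup would then remove the partial file when the cap is hit.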

201
complex_test.csv Normal file
View File

@ -0,0 +1,201 @@
date,sales,ad_cost,temperature
2023-01-01,100.99342830602247,52.28565095475265,25.216717023616898
2023-01-02,107.81824957442487,56.71304741905361,28.151623674857277
2023-01-03,111.52881013862651,61.17966128518964,29.915228586591738
2023-01-04,108.02879336255101,59.283406941450025,29.990188012450005
2023-01-05,96.05238491579978,41.13784561811443,28.44879856043664
2023-01-06,90.99563724567052,40.80869342325965,31.617293515635463
2023-01-07,97.00157358481341,51.075963128450006,26.495631174163773
2023-01-08,103.5773501810333,54.357604845077695,29.22110275096627
2023-01-09,109.08744194176155,57.118959402411015,29.95887684488444
2023-01-10,113.00806452823355,75.76768971739038,31.091055195643584
2023-01-11,105.55591950042415,55.63241230367791,31.632332071452602
2023-01-12,97.09626710914998,54.22596175547798,26.073309905390914
2023-01-13,93.65306223164116,51.59653993328659,24.79464241241625
2023-01-14,91.96068299744252,49.237297755250246,33.179764134037235
2023-01-15,100.63489742855835,48.74110249107745,30.293424447999076
2023-01-16,110.82697548909167,59.20833384701217,27.000771546109235
2023-01-17,111.57901650425853,51.92538217944141,33.849435826065054
2023-01-18,108.60807500637348,53.1199444694867,29.492752546094657
2023-01-19,97.7225257041962,46.43444511295258,32.63336893912616
2023-01-20,92.05768224050016,46.43821181718169,29.247781574883593
2023-01-21,100.66865680262455,61.90762123467982,35.17721864901782
2023-01-22,105.67497634353768,43.5011622088101,34.21074614542007
2023-01-23,114.00518017281185,60.43389103827849,28.14757991637182
2023-01-24,112.42920163028982,48.151021459196656,31.758933958390703
2023-01-25,108.38465543599729,51.83266838905148,30.730097698001675
2023-01-26,101.27527814467474,56.082392057174204,32.84961326556187
2023-01-27,94.30141660796275,47.47210839945869,25.798696954943104
2023-01-28,100.44473732127179,44.833644770989366,30.701370460397328
2023-01-29,106.96636309508162,49.90666300124097,31.768238284669366
2023-01-30,115.19859424833119,60.99728586883897,23.266495108569856
2023-01-31,115.74165722884301,54.218995455835824,24.94268677356046
2023-02-01,114.6690461592577,58.41681602753873,22.32451452199608
2023-02-02,102.54550585069765,51.50061212486789,27.583739295661303
2023-02-03,96.21708851966602,44.85054252180392,30.494335310101455
2023-02-04,103.30012064507876,62.36978076916601,32.79852844272025
2023-02-05,107.76615313512,57.05267167915006,28.463490371410078
2023-02-06,118.1047284831108,48.92665130826737,33.07680141058322
2023-02-07,114.68453024304476,58.27453669536952,24.000399142943266
2023-02-08,109.79663046257247,51.589382907444296,22.980304943241066
2023-02-09,104.48969129818198,56.50701232307211,27.87355790833527
2023-02-10,101.54656654412297,46.81067957989798,29.142146095561642
2023-02-11,103.96500111652492,51.408818350927966,27.841614248180033
2023-02-12,112.01560574417775,58.53273926699116,21.687120936061277
2023-02-13,118.9828573044494,63.820204623075306,27.57183586136113
2023-02-14,117.29813462397733,52.64758527670978,23.87553622210353
2023-02-15,112.4994883398013,54.577237990695906,29.7477111138268
2023-02-16,104.70275042588428,49.97664865713736,28.78823694934582
2023-02-17,103.92903467881405,48.69787117653846,24.818551595791803
2023-02-18,106.28211128125639,61.968326842033676,26.046338946482383
2023-02-19,110.75852018598077,57.40416864779516,24.3600478765442
2023-02-20,122.12422779023196,54.75769412344075,27.299399894110135
2023-02-21,121.12891675425766,65.15376811240272,30.302612891151956
2023-02-22,114.06938782186724,67.64547489599678,24.4297565343602
2023-02-23,108.38021603048709,59.35243431799928,28.848822963638963
2023-02-24,105.62999092642217,45.21814563344102,25.695659305686696
2023-02-25,109.43524939303815,52.29645433218782,24.85756240773558
2023-02-26,114.6422754496737,63.65569347076997,26.864838568377532
2023-02-27,122.74145559432745,57.83238046906982,24.02995142470168
2023-02-28,124.1981979657437,64.31819612360299,25.424479219636847
2023-03-01,118.85647916163501,63.30140984796419,23.44154220163044
2023-03-02,107.73630868962128,49.233501986920224,32.879100021864744
2023-03-03,104.9579248974653,52.18133566842365,27.040464022749358
2023-03-04,107.34286166502932,37.4650941321693,24.785245586575005
2023-03-05,115.96259583523435,52.859359710945725,27.47611058647402
2023-03-06,126.86147718486289,62.167897835465645,26.44693544891746
2023-03-07,127.87752639288169,57.699847286616595,26.070759543108874
2023-03-08,118.24185044528706,67.2829817423017,28.525917185557415
2023-03-09,112.24465279387695,48.971619507135316,28.905688959287644
2023-03-10,107.82181320681322,51.71068416992169,24.991411130032738
2023-03-11,110.2529822211921,55.78019399702651,24.805208594648875
2023-03-12,121.11006570384127,67.76139929725122,25.657256968846575
2023-03-13,130.1816737833995,57.91152613580255,19.526397309813348
2023-03-14,126.71565911008061,69.1736483158151,21.836336361143037
2023-03-15,122.99418850292842,61.548259556562144,30.432281093790863
2023-03-16,106.54633650145081,48.365624995485646,31.21631017567973
2023-03-17,110.51968335255634,57.57035904759452,25.484047660225336
2023-03-18,113.70966939484823,57.85013317529147,27.910575411780364
2023-03-19,121.81927189435518,57.90855156138362,27.06440372996227
2023-03-20,129.15083837967413,64.92442961478716,35.317044435415966
2023-03-21,124.42743772967184,60.287150880527115,29.388875488072575
2023-03-22,120.90336183411937,61.01926764331592,25.596146723045138
2023-03-23,114.05377092727564,60.33753883624305,23.063026919404756
2023-03-24,113.61702757529353,64.73859786837353,21.060058024151907
2023-03-25,114.49586375810298,51.05885438491725,26.439536636244885
2023-03-26,122.827839869664,72.07908680811333,23.5098422365089
2023-03-27,129.81797713472835,55.148549569751665,21.461882087287382
2023-03-28,131.84175921611333,65.16195413287875,23.738673307071416
2023-03-29,123.47701234402419,64.68009220443497,22.383496692674402
2023-03-30,113.83938888817369,58.324653782762006,30.639314352453876
2023-03-31,113.48113806388008,53.62707143283707,28.172557461803127
2023-04-01,117.72767203318188,57.823224764804564,25.453469010723516
2023-04-02,128.40696958856444,61.73848012098806,29.866968095062035
2023-04-03,131.26394087272118,62.685146651639535,25.60898934505341
2023-04-04,130.95724623585764,69.72663360303395,22.742780561844356
2023-04-05,123.51132541540858,63.540740137529525,29.845754141356707
2023-04-06,113.5370478780282,53.30397596271083,26.842860784320308
2023-04-07,114.84818220676686,61.922090480549684,22.064140934005557
2023-04-08,120.06082896926581,61.56691208901595,24.554612106452694
2023-04-09,128.50185694254756,68.31523906546857,22.44852212426784
2023-04-10,134.03773943222305,70.16701392572959,20.876726435247697
2023-04-11,130.37680723958826,61.04342856518377,27.753407014454222
2023-04-12,124.92973736296452,59.66396348049741,30.652873036988282
2023-04-13,117.34977804457614,62.41135704790438,20.67866913783906
2023-04-14,114.46066606640413,60.28218436036939,26.51302831308679
2023-04-15,121.2252388983924,60.508111479375465,22.821941639368188
2023-04-16,131.31856822823775,66.24592103066279,23.262241939158173
2023-04-17,140.11039848581893,76.44352372185159,22.89618506145425
2023-04-18,135.1451770527603,64.61473158220099,22.03114326885
2023-04-19,127.76129781335801,66.6161358125292,24.718429205442522
2023-04-20,119.46355643368209,58.720814954671575,22.029762716093522
2023-04-21,114.0448569861147,55.93402247692124,25.28373228638474
2023-04-22,123.50756339829785,67.24766595908487,24.271396224416407
2023-04-23,132.64644016382874,70.45030182685451,23.655015155883184
2023-04-24,143.0878120259834,75.61145419299488,21.598917054076214
2023-04-25,135.9934038087475,74.5240959401454,22.5410427922146
2023-04-26,129.32437177871165,64.76720509751962,26.48727920511546
2023-04-27,121.12652393740836,63.973026825179005,25.673605834229928
2023-04-28,117.37007501648283,57.13370372527413,21.187937280679726
2023-04-29,127.86250209122531,65.55208280805486,24.368348675081645
2023-04-30,136.04182913809416,67.37019929720866,26.27426187262793
2023-05-01,141.55882902128073,71.26439433560395,18.96163340286704
2023-05-02,136.13522897297034,71.04339961366973,25.549678567089554
2023-05-03,133.0020851904919,62.409939179078584,21.881475456830803
2023-05-04,119.98214368927341,70.453008223064,25.530891483166414
2023-05-05,122.71397649985045,56.326901342426716,21.479066751477976
2023-05-06,131.97731026240027,59.917712067261476,18.303946662830565
2023-05-07,134.56514004748436,73.07312439124252,18.785714394893226
2023-05-08,140.65169760802368,74.28416227382652,23.762345292245453
2023-05-09,139.72310582696744,72.9821519987445,24.347006701144345
2023-05-10,130.66513240891535,68.47429375077908,20.80463806438527
2023-05-11,121.28095236867787,60.57924232010436,25.38311405974921
2023-05-12,123.51795943208879,57.272707858615234,18.4325252403288
2023-05-13,127.494401479568,64.12622353075263,23.16859477491232
2023-05-14,139.49771268252908,66.36304778370399,19.683534315285495
2023-05-15,141.74502392454568,75.74811062936159,21.31082333488498
2023-05-16,144.1875443211837,71.35848525308116,23.358276415959292
2023-05-17,131.58175992050872,61.6633939762918,20.584589049876787
2023-05-18,125.34125538337241,61.06369848342124,21.961911256757762
2023-05-19,126.85611321109774,65.49271387692698,26.084205060809154
2023-05-20,129.18274516107348,61.77274981651687,21.284399768314977
2023-05-21,141.00563068175677,66.39171336304622,25.47190045679844
2023-05-22,147.98975578319556,75.21331394905734,19.525452300348753
2023-05-23,139.4308115573891,70.94023863423816,24.453734141786047
2023-05-24,134.99454013588115,64.96255419108493,27.138776213732495
2023-05-25,128.11503428381528,61.70232561381602,15.34888559509552
2023-05-26,128.6485719341678,65.48453565387207,20.32288207278455
2023-05-27,131.19867759470725,58.358917089867006,24.394532964456197
2023-05-28,139.90565414086922,62.915508198551834,22.003929168504186
2023-05-29,148.20294343801243,70.50925061274404,23.676251690465687
2023-05-30,144.79224153467328,71.3288850087774,20.700607253922893
2023-05-31,136.6043139822718,69.85669481912592,22.72208092020764
2023-06-01,129.90496634749854,72.32926425849703,21.945028595331298
2023-06-02,127.58824815952354,68.08242219577187,25.865155230205552
2023-06-03,136.16761518850672,67.28411494443623,23.074820318848364
2023-06-04,145.1240516092851,72.4669447651291,23.274114518888926
2023-06-05,147.5059196939697,68.7403130237958,20.97542437801451
2023-06-06,149.47687307825956,74.64587085916783,20.697985347883023
2023-06-07,138.53032591367202,67.82186976223531,20.812878200360235
2023-06-08,128.45328941709883,65.84023751023986,23.243657934672576
2023-06-09,132.13221605087253,61.92995330767465,20.747096808795494
2023-06-10,135.78647805542306,70.48997159891739,22.829123565664112
2023-06-11,148.0987107210833,81.71304992555454,28.135750134629784
2023-06-12,153.0193346048766,75.96586656015401,24.47267059270714
2023-06-13,145.6457394335731,74.83142832728126,20.83097462962713
2023-06-14,140.9902449994371,73.94584245827411,25.36243573634108
2023-06-15,133.2924186394212,64.64010696028141,20.484316594503184
2023-06-16,134.34139003578426,68.29115742694421,15.54391785175287
2023-06-17,143.56414528086364,71.8450346443408,18.583781268252814
2023-06-18,148.01551228867737,74.49613663708284,15.945413181646051
2023-06-19,150.95414370388056,71.61202293266295,20.452997236318286
2023-06-20,147.0447580558963,73.64492989924287,21.51254156972946
2023-06-21,138.91442946215307,71.94720618730378,26.436347112705246
2023-06-22,133.9508505306588,74.23114330430461,22.337566040890476
2023-06-23,135.26498811523834,72.42884818804521,20.64923107688999
2023-06-24,142.36042104862796,81.94612281187176,23.744498150585642
2023-06-25,152.13733471736649,72.231929544243,14.572624223730113
2023-06-26,154.23904352473193,81.48112494596936,21.86262256879806
2023-06-27,153.26261953185252,77.54801979461801,23.418123219851857
2023-06-28,141.501240633759,81.69963498296786,16.619517644570024
2023-06-29,141.19092331014159,66.55397022829503,24.436287255248928
2023-06-30,137.72658495620402,64.66468326719813,21.970263091829977
2023-07-01,142.13074316454697,68.06840835505338,19.65865887136292
2023-07-02,150.31262022756084,64.53683149223139,22.752616955102773
2023-07-03,156.92136533513548,75.83190755916394,27.6160986739157
2023-07-04,151.43565463277307,71.92216400861804,21.29936760939659
2023-07-05,144.94522732135877,73.22458259306042,21.448179346840707
2023-07-06,138.3500162874156,70.88378802259359,19.27518363303756
2023-07-07,138.2292024966561,78.49545544440747,18.05348196698251
2023-07-08,144.19080360135797,76.84752099160923,23.04377126872821
2023-07-09,151.39073391880174,72.81084868108886,17.934261085087467
2023-07-10,156.7987408262441,73.90729705638027,20.66696001819084
2023-07-11,155.11785753935226,80.01852462720866,18.969037709955906
2023-07-12,145.4344726620079,66.11607029590074,21.788698271209025
2023-07-13,136.57252976954146,77.4435587140425,21.30249385354929
2023-07-14,140.6277628521662,76.21108202968954,23.363876114180734
2023-07-15,148.69544667183354,72.00184507539325,18.670955828561386
2023-07-16,154.61315597254045,68.74090534081584,19.34112896296411
2023-07-17,159.72655950094892,86.63264162130153,17.16421136521589
2023-07-18,155.0395982759213,76.94709991169756,18.71737147605307
2023-07-19,144.212007347891,78.2950852338128,21.131901479134555
154 2023-06-02 127.58824815952354 68.08242219577187 25.865155230205552
155 2023-06-03 136.16761518850672 67.28411494443623 23.074820318848364
156 2023-06-04 145.1240516092851 72.4669447651291 23.274114518888926
157 2023-06-05 147.5059196939697 68.7403130237958 20.97542437801451
158 2023-06-06 149.47687307825956 74.64587085916783 20.697985347883023
159 2023-06-07 138.53032591367202 67.82186976223531 20.812878200360235
160 2023-06-08 128.45328941709883 65.84023751023986 23.243657934672576
161 2023-06-09 132.13221605087253 61.92995330767465 20.747096808795494
162 2023-06-10 135.78647805542306 70.48997159891739 22.829123565664112
163 2023-06-11 148.0987107210833 81.71304992555454 28.135750134629784
164 2023-06-12 153.0193346048766 75.96586656015401 24.47267059270714
165 2023-06-13 145.6457394335731 74.83142832728126 20.83097462962713
166 2023-06-14 140.9902449994371 73.94584245827411 25.36243573634108
167 2023-06-15 133.2924186394212 64.64010696028141 20.484316594503184
168 2023-06-16 134.34139003578426 68.29115742694421 15.54391785175287
169 2023-06-17 143.56414528086364 71.8450346443408 18.583781268252814
170 2023-06-18 148.01551228867737 74.49613663708284 15.945413181646051
171 2023-06-19 150.95414370388056 71.61202293266295 20.452997236318286
172 2023-06-20 147.0447580558963 73.64492989924287 21.51254156972946
173 2023-06-21 138.91442946215307 71.94720618730378 26.436347112705246
174 2023-06-22 133.9508505306588 74.23114330430461 22.337566040890476
175 2023-06-23 135.26498811523834 72.42884818804521 20.64923107688999
176 2023-06-24 142.36042104862796 81.94612281187176 23.744498150585642
177 2023-06-25 152.13733471736649 72.231929544243 14.572624223730113
178 2023-06-26 154.23904352473193 81.48112494596936 21.86262256879806
179 2023-06-27 153.26261953185252 77.54801979461801 23.418123219851857
180 2023-06-28 141.501240633759 81.69963498296786 16.619517644570024
181 2023-06-29 141.19092331014159 66.55397022829503 24.436287255248928
182 2023-06-30 137.72658495620402 64.66468326719813 21.970263091829977
183 2023-07-01 142.13074316454697 68.06840835505338 19.65865887136292
184 2023-07-02 150.31262022756084 64.53683149223139 22.752616955102773
185 2023-07-03 156.92136533513548 75.83190755916394 27.6160986739157
186 2023-07-04 151.43565463277307 71.92216400861804 21.29936760939659
187 2023-07-05 144.94522732135877 73.22458259306042 21.448179346840707
188 2023-07-06 138.3500162874156 70.88378802259359 19.27518363303756
189 2023-07-07 138.2292024966561 78.49545544440747 18.05348196698251
190 2023-07-08 144.19080360135797 76.84752099160923 23.04377126872821
191 2023-07-09 151.39073391880174 72.81084868108886 17.934261085087467
192 2023-07-10 156.7987408262441 73.90729705638027 20.66696001819084
193 2023-07-11 155.11785753935226 80.01852462720866 18.969037709955906
194 2023-07-12 145.4344726620079 66.11607029590074 21.788698271209025
195 2023-07-13 136.57252976954146 77.4435587140425 21.30249385354929
196 2023-07-14 140.6277628521662 76.21108202968954 23.363876114180734
197 2023-07-15 148.69544667183354 72.00184507539325 18.670955828561386
198 2023-07-16 154.61315597254045 68.74090534081584 19.34112896296411
199 2023-07-17 159.72655950094892 86.63264162130153 17.16421136521589
200 2023-07-18 155.0395982759213 76.94709991169756 18.71737147605307
201 2023-07-19 144.212007347891 78.2950852338128 21.131901479134555

38
docker-compose.yml Normal file
View File

@ -0,0 +1,38 @@
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: lazy-fjh-app
    ports:
      - "60201:60201"
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
      - ./temp:/app/temp
    env_file:
      - .env
    environment:
      - ENV=production
      - DEBUG=False
      - HOST=0.0.0.0
      - PORT=60201
      - LOG_LEVEL=info
      - LANGUAGE_DEFAULT=zh
      - ANALYSIS_TIMEOUT=300
      - MAX_MEMORY_MB=500
    restart: always
    networks:
      - app-network
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:60201/health" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

networks:
  app-network:
    driver: bridge

26
docs/V1.1.md Normal file
View File

@ -0,0 +1,26 @@
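# Session Summary (V1.1)
## Recent Issues and Fixes
- **Memory limit**: uvicorn raised `MemoryError` during startup because of a 2 GB RLIMIT_AS cap. The memory limit is now optional and controlled by an environment variable (no limit by default).
- **pandas frequency case**: pandas 3.x requires lowercase frequency aliases, so `freq='H'/'S'` was changed to `freq='h'/'s'`, which resolves the `Invalid frequency` error.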
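The optional memory limit described above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: the variable name `MEMORY_LIMIT_MB` and both helper functions are assumptions made for illustration, and `resource` is only available on Unix-like systems.

```python
import os


def memory_limit_bytes(env=None, var="MEMORY_LIMIT_MB"):
    """Return the configured address-space limit in bytes, or None for no limit.

    `MEMORY_LIMIT_MB` is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    raw = str(env.get(var, "")).strip()
    if not raw:
        return None  # unset or empty: no limit (the default)
    try:
        megabytes = int(raw)
    except ValueError:
        return None  # unparsable values are treated as "no limit"
    return megabytes * 1024 * 1024 if megabytes > 0 else None


def apply_memory_limit(limit):
    """Apply RLIMIT_AS when a limit is configured; return True if it was set."""
    if limit is None:
        return False
    try:
        import resource  # Unix only
    except ImportError:
        return False
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return True
```

Calling `apply_memory_limit(memory_limit_bytes())` once at startup keeps the default behavior (no cap) unless the variable is set explicitly.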
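## Endpoint Status (Brief)
- v1 `/api`: `upload`, `analyze`, `available_methods`, `image/{filename}`, `download/{filename}`, and `list_uploads` are all implemented.
- v2 `/api/v2`: `analyze` (OSS/URL input) and `available_methods` are implemented; `API_MODE=v2` disables the v1 upload/image endpoints.
## Design Decisions: Front-End Rendering, Data Mode
- PNGs are no longer sent: the back end returns only structured data and the front end renders it with ECharts. See `docs/charts-data-mode-plan.md` for the full plan.
- Unified sanitization: `to_echarts_safe` handles NaN/Inf/pd.NA, converts Timestamp to ISO 8601, converts numpy/Decimal values to native types, and guards against circular references.
- Data format conventions:
- Time series / multiple series: prefer `dataset`; matrices (correlation, etc.) are flattened ahead of time to `[i,j,value]`.
- Histograms: binned on the back end with `np.histogram` and returned as `[range_start, range_end, count]`.
- Style decoupling: the back end returns no colors or line styles.
- The algorithms are unchanged; changes are limited to result packaging and sanitization. Confidence intervals or anomaly annotations, if needed, are extra packaging and do not touch the core algorithms.
## To Do (Not Yet Landed)
- Land the charts data mode: implement `to_echarts_safe`, add the `charts` field, and define the return path used when image saving is disabled.
- Return histogram bins, anomaly annotations (if needed), and forecast upper/lower bounds (if needed) from the packaging layer.
## References
- Detailed plan: `docs/charts-data-mode-plan.md`
- Endpoint list: `docs/api-endpoints-status.md`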

View File

@ -0,0 +1,21 @@
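# API Endpoint List and Completion Status
## v1 Routes (prefix `/api`)
| Endpoint | Method | Description | Status | Notes |
| --- | --- | --- | --- | --- |
| `/api/upload` | POST | Upload a CSV file | ✅ Implemented | Accepts only `settings.ALLOWED_EXTENSIONS` (csv by default). Returns the saved file name. |
| `/api/analyze` | POST | Run the full analysis on an uploaded CSV | ✅ Implemented | Returns the analysis result; already switched to charts data mode (`analysis.<lang>.charts`); `images` is kept empty for compatibility with old front ends. |
| `/api/available_methods` | GET | List the supported analysis methods | ✅ Implemented | Static list. |
| `/api/image/{filename}` | GET | Fetch an image file | ✅ Implemented | Read from `uploads/`. |
| `/api/download/{filename}` | GET | Download a file | ✅ Implemented | Read from `uploads/`. |
| `/api/list_uploads` | GET | List uploaded files | ✅ Implemented | Returns file name/size/modification time. |
## v2 Routes (prefix `/api/v2`)
| Endpoint | Method | Description | Status | Notes |
| --- | --- | --- | --- | --- |
| `/api/v2/analyze` | POST | Download a CSV from OSS/URL and analyze it | ✅ Implemented | Reuses the v1 analyzer; already returns charts data mode, with `images` empty. Images remain disabled under `API_MODE=v2`. |
| `/api/v2/available_methods` | GET | List the supported analysis methods | ✅ Implemented | Same as v1. |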
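Because `/api/v2/analyze` downloads from a caller-supplied URL, the URL is worth vetting before any request is made. The sketch below shows one way such a check might look, keyed to the commented-out `V2_ALLOWED_HOSTS`, `V2_ALLOW_HTTP`, and `V2_ALLOW_PRIVATE_NETWORKS` settings in `.env.example`. The function name `check_oss_url` and its exact rules are assumptions for illustration, not the project's actual validator.

```python
import ipaddress
from urllib.parse import urlsplit


def check_oss_url(url, allowed_hosts=(), allow_http=False, allow_private=False):
    """Return (ok, reason) for a candidate download URL (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in ("https", "http"):
        return False, "unsupported scheme"
    if scheme == "http" and not allow_http:
        return False, "plain http is not allowed"
    host = (parts.hostname or "").lower()
    if not host:
        return False, "missing host"
    # An empty allow-list means "no host restriction" in this sketch.
    if allowed_hosts and host not in {h.lower() for h in allowed_hosts}:
        return False, "host is not in the allow-list"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name rather than a literal IP address
    if ip is not None and not allow_private:
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False, "private network address"
    return True, "ok"
```

This only inspects literal IP addresses. A DNS name that resolves to a private address would pass, so a stricter check would also resolve the host and test the resulting addresses.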
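## Known Gaps / To Do (Not Yet Implemented)
- **Forecast confidence intervals**: VAR currently returns point forecasts only; confidence intervals would require switching to `forecast_interval` (no algorithm change, only reading the upper and lower bounds).
- **Anomaly annotations**: no annotation output yet; if needed, it must be computed separately in the packaging layer (max/min or anomaly detection).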

View File

@ -0,0 +1,71 @@
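# charts Data Mode (Current Version)
> The old version of this document is archived as `docs/旧的charts-data-mode-plan.md`.
## Goals
- The back end returns structured chart data and the front end renders it with ECharts; images are no longer generated or transmitted.
- Sanitize everything in one place so that NaN/Inf/unserializable objects cannot crash the API.
- The response structure is defined by `analysis.<lang>.charts`; `images` is empty and exists only for compatibility with old front ends.
## Serialization Rules (`to_echarts_safe`, implemented)
- NaN/Inf/pd.NA → null; numpy scalars become native types; Decimal becomes float.
- Timestamp/datetime → ISO 8601 string.
- ndarray/DataFrame/Series → list or records, sanitized recursively with circular-reference protection.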
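The rules above can be sketched with the standard library alone. This is an illustrative approximation and not the code in `app/services/analysis_system.py`: numpy and pandas objects are handled by duck typing (`tolist`, `to_dict`) and pandas' NA/NaT singletons are matched by type name, both of which are assumptions made so the sketch runs without those libraries installed.

```python
import math
from datetime import date, datetime
from decimal import Decimal


def to_echarts_safe(obj, _seen=None):
    """Recursively convert obj into JSON-safe built-in Python values."""
    _seen = set() if _seen is None else _seen
    # Assumption: pandas' NA/NaT singletons are recognized by type name only,
    # so this sketch does not need to import pandas.
    if obj is None or type(obj).__name__ in ("NAType", "NaTType"):
        return None
    if isinstance(obj, (bool, str)):
        return obj
    if isinstance(obj, int):
        return int(obj)
    if isinstance(obj, float):
        value = float(obj)  # also strips float subclasses such as numpy.float64
        return value if math.isfinite(value) else None
    if isinstance(obj, Decimal):
        return float(obj) if obj.is_finite() else None
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, (dict, list, tuple, set)):
        if id(obj) in _seen:
            return None  # circular reference: cut the loop
        _seen.add(id(obj))
        try:
            if isinstance(obj, dict):
                return {str(k): to_echarts_safe(v, _seen) for k, v in obj.items()}
            return [to_echarts_safe(v, _seen) for v in obj]
        finally:
            _seen.discard(id(obj))  # shared, non-cyclic references stay allowed
    if hasattr(obj, "tolist"):  # duck-typed: numpy arrays/scalars, pandas Series
        return to_echarts_safe(obj.tolist(), _seen)
    if hasattr(obj, "to_dict"):  # duck-typed: pandas DataFrame
        return to_echarts_safe(obj.to_dict(orient="records"), _seen)
    return str(obj)  # last resort for unknown objects
```

Tracking container ids only while they are on the current recursion path means an object referenced twice is serialized twice, while a true cycle is replaced by `null`.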
## Response Skeleton (Actual Production Shape)
```json
{
  "success": true,
  "meta": { ... },
  "analysis": {
    "zh": {
      "data_description": "...",
      "preprocessing_steps": [...],
      "api_analysis": { ... },
      "steps": [
        { "key": "ts_img", "title": "Time Series Analysis", "chart": "ts", "summary": "..." },
        ...
      ],
      "charts": {
        "ts": { ... },
        "acf_pacf": { ... },
        ...
      }
    }
  },
  "images": {},
  "log": [...]
}
```
- `charts` is no longer returned separately at the top level; the front end should read `analysis.<lang>.charts`. If a top-level alias is needed, a mapping can be added in the route layer.
- `steps[].chart` points to a key in `charts` and drives the front-end display order.
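A consumer can resolve the display order by walking `steps` and looking each `chart` key up in `charts`. The helper below is a small sketch of that lookup; `charts_in_step_order` is a hypothetical name, and the sample payload in the usage line is made up.

```python
def charts_in_step_order(response, lang="zh"):
    """Return [(title, chart_key, chart_payload), ...] in display order."""
    section = response.get("analysis", {}).get(lang, {})
    charts = section.get("charts", {})
    ordered = []
    for step in section.get("steps", []):
        key = step.get("chart")
        if key in charts:  # skip steps without a chart or with a dangling key
            ordered.append((step.get("title", ""), key, charts[key]))
    return ordered


sample = {
    "analysis": {
        "zh": {
            "steps": [
                {"key": "ts_img", "title": "Time Series Analysis", "chart": "ts"},
                {"key": "note", "title": "Text only"},
            ],
            "charts": {"ts": {"type": "line"}},
        }
    }
}
ordered_sample = charts_in_step_order(sample)
```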
## Chart Data Formats (As Currently Implemented)
- **Time series (ts)**: `type: line`, `dataset = [[col...], [...]]`, with timestamps as strings.
- **ACF/PACF (acf_pacf)**: `series: [{name: col, acf:[{lag,value}], pacf:[{lag,value}]}]`; each column is packed into a single item.
- **Stationarity (stationarity)**: `records: [{column, ADF:{...}, KPSS:{...}}]`.
- **Normality (normality)**: `records: [{column, histogram:[{range_start,range_end,count},...], Shapiro-Wilk:{...}, Jarque-Bera:{...}}]`.
- **Seasonal decomposition (seasonal)**: `type: line`; `dataset` contains observed/trend/seasonal/resid, with missing values as null.
- **Spectral (spectral)** (summarized to limit payload size):
  - `spectrogram`: `f`, `t`, `Sxx_log10_mean`, `Sxx_shape`; the full matrix is not returned.
  - `periodogram`: only the first 20 points of `f` and `Pxx_den`.
- **Correlation (heatmap)**: `type: heatmap`; `data` is a flat `[i,j,value]` list, with xLabels/yLabels.
- **PCA scree (pca_scree)**: `type: bar`; `dataset` holds component / explained variance / cumulative value.
- **PCA scatter (pca_scatter)**: `type: scatter`; `records` contain PC1/PC2/timestamp.
- **Feature importance (feature_importance)**: `type: bar`; `records` contain feature/importance.
- **Clustering (cluster)**: `type: scatter`; `records` contain cluster and timestamp.
- **Factor analysis (factor)**: `type: scatter`; `records` contain Factor1/Factor2 and timestamp.
- **Cointegration (cointegration)**: `type: table`; `meta` carries trace_stat/crit_vals/eigen_vals directly.
- **VAR forecast (var_forecast)**: `type: line`; `dataset` contains step plus one forecast column per variable.
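As an illustration of the flat `[i,j,value]` heatmap form listed above, the following sketch flattens a square matrix given as nested lists. The function name is hypothetical and the real `_build_chart_payload` may differ. For a symmetric correlation matrix it does not matter which index ECharts treats as x and which as y.

```python
def flatten_heatmap(matrix, labels):
    """Flatten a square matrix into the [i, j, value] heatmap payload."""
    data = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            data.append([i, j, value])
    return {
        "type": "heatmap",
        "data": data,
        "xLabels": list(labels),
        "yLabels": list(labels),
    }


heatmap_sample = flatten_heatmap([[1.0, 0.5], [0.5, 1.0]], ["sales", "ad_cost"])
```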
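## Compatibility and Notes
- `images` is an empty object; any leftover `image_path` has been removed.
- The spectral output is currently the "summary" version; restoring the full matrix requires changing `perform_spectral_analysis`.
- The ACF/PACF structure differs from the old document, so the front end must decode the current shape; to split the series, adjust `_build_chart_payload` on the back end.
- Normality histograms are already binned by the back end; the front end does not need to bin again.
## Known Optional Improvements
1) Add a top-level `charts` alias in the route layer to ease front-end migration.
2) Split ACF/PACF into two series per column (acf/pacf), matching the old example.
3) Add a `mode=full|summary` switch for spectral so the front end can choose full or summary output.
4) Add an optional sampling/truncation strategy for large datasets.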
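For the sampling/truncation idea in item 4, one simple possibility is to keep the header row and pick evenly spaced data rows. This is only a sketch of one strategy: `truncate_dataset` is a hypothetical helper that assumes the dataset's first row is the header.

```python
def truncate_dataset(dataset, max_rows):
    """Keep the header row and at most max_rows evenly spaced data rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    header, rows = dataset[0], dataset[1:]
    if len(rows) <= max_rows:
        return [header] + rows
    if max_rows == 1:
        return [header, rows[0]]
    last = len(rows) - 1
    # Integer arithmetic keeps the picks distinct and always includes
    # the first and the last data row.
    picked = [rows[(i * last) // (max_rows - 1)] for i in range(max_rows)]
    return [header] + picked


small = truncate_dataset([["v"]] + [[k] for k in range(10)], 4)
```

Even spacing preserves the overall shape of a series but can drop short spikes, so a min/max-preserving scheme may suit anomaly-heavy data better.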

View File

@ -0,0 +1,32 @@
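# charts Mode Implementation Notes (Current Version)
> The old implementation notes are in `docs/旧的关于charts模式的实现.md`.
## Current State at a Glance
- The back end forces `generate_plots=False`; every step produces data only, collected under `analysis.<lang>.charts`.
- `images` is an empty object kept for compatibility; `steps[].chart` is bound to the corresponding chart key.
- The sanitizer `to_echarts_safe` recursively handles NaN/Inf/Timestamp/numpy/Decimal to guarantee JSON-safe output.
## Key Structures
- Response: `analysis.<lang>.charts` (charts is not exposed at the top level).
- Time series, seasonal decomposition, VAR, etc. use dataset; correlation uses a flattened heatmap; PCA/clustering/factor use records.
- ACF/PACF: each column carries two sequences, `acf` and `pacf` (unlike the old document, which split them into two series).
- Normality: each column includes histogram bins (back-end `np.histogram`) plus the Shapiro/JB results.
- Spectral: currently the summary version (spectrogram gives only the mean and shape; periodogram only the first 20 points).
## File and Code Map
- Sanitization and aggregation: `app/services/analysis_system.py` (`to_echarts_safe`, `_build_chart_payload`, `run_analysis`).
- Time-series data: `app/services/analysis/modules/time_series.py` (data only; summary-version spectral output).
- Normality binning: `app/services/analysis/modules/basic.py`.
- Route responses: `app/api/routes/analysis.py`, `app/api/routes/analysis_v2.py` (`charts` lives under `analysis.<lang>`).
## Differences from the Old Version
- Images are no longer generated; there is no top-level `charts` field.
- The ACF/PACF structure changed; spectral output switched from the full matrix to the summary version.
- Normality histograms use dictionary fields instead of a 2D array.
## Possible Follow-Up Improvements
1) Add a top-level `charts` alias in the route layer so the front end can migrate transparently.
2) Optionally split the ACF/PACF output into separate series (matching the old example).
3) Provide a `full/summary` switch for spectral output to return either the full matrix or the summary.
4) Add a sampling/truncation strategy for large datasets to prevent oversized payloads.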
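Improvement 2 amounts to a small reshaping step. The sketch below converts the current per-column items into separate `acf` and `pacf` series; the function name and the extra `column` field are assumptions for illustration, since the old example carried the column name in `meta` instead.

```python
def split_acf_pacf(series):
    """Turn [{name, acf, pacf}, ...] into one series per (column, kind)."""
    out = []
    for item in series:
        for kind in ("acf", "pacf"):
            out.append({
                "name": kind,
                "column": item["name"],  # hypothetical field for this sketch
                "data": list(item.get(kind, [])),
            })
    return out


split_sample = split_acf_pacf([
    {"name": "sales",
     "acf": [{"lag": 0, "value": 1.0}],
     "pacf": [{"lag": 0, "value": 1.0}, {"lag": 1, "value": 0.3}]},
])
```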

View File

@ -0,0 +1,125 @@
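# Returning Visualization Data with a Decoupled Front End and Back End (ECharts)
## Goals
- The back end no longer generates or transmits images and returns only chart data; the front end renders with ECharts.
- Use one unified data structure to reduce front-end adapter code, and eliminate API crashes caused by NaN/Infinity/unserializable objects.
- Stay compatible with the existing API (`images` may be left empty) and migrate gradually to the `charts` data mode.
## Serialization and Sanitization Rules (Mandatory)
- **NaN / Infinity / pd.NA**: recursively sanitized to `null` (JSON `null`). The string "NaN" must never be returned.
- **Timestamps**: always ISO 8601 strings (for example `2023-01-01T12:00:00`).
- **Arrays/matrices**: converted to native Python lists before JSON serialization.
- **DataFrame**: prefer `to_dict(orient="records")` or assemble the dataset form (see below).
- **Numeric types**: numpy scalars become native `int/float/bool`; `nan/inf` values are sanitized first.
> Recommendation: implement a general-purpose function `to_echarts_safe(obj)` that recursively performs the sanitization and type conversion above, and pass all response data through it before it leaves the server.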
## Response Skeleton (adds `charts`, keeps the old fields)
```json
{
  "success": true,
  "meta": { ... },
  "analysis": {
    "zh": {
      "data_description": "...",
      "preprocessing_steps": [ ... ],
      "api_analysis": { ... },
      "steps": [
        {"key": "ts", "title": "Time Series", "summary": "...", "chart": "ts"},
        ...
      ]
    }
  },
  "charts": {
    "ts": { "type": "line", "dataset": [...], "meta": {...} },
    "acf_pacf": { "type": "bar", "series": [...], "meta": {...} },
    "heatmap": { "type": "heatmap", "data": [...], "xLabels": [...], "yLabels": [...], "meta": {...} },
    ...
  },
  "images": {},
  "log": [...]
}
```
- `steps[].chart` points to a key in `charts`, so the front end can render in step order.
- `images` is kept but empty, for compatibility with old front ends.
## Suggested Data Format per Chart (ECharts-Oriented)
- **Time series (ts)**: `dataset` form
  - 2D array with the header in the first row, for example `[ ["timestamp","sales","ad_cost"], ["2023-01-01T00:00:00", 10, 5], ... ]`
  - Front end: `dataset.source = dataset`, `series: [{type:'line', encode:{x:'timestamp', y:'sales'}}, ...]`
- **ACF / PACF (acf_pacf)**
  - `{ series: [{name:'acf', data:[{lag:0, value:1.0}, ...]}, {name:'pacf', data:[...] }], meta:{column:'sales'} }`
- **Stationarity tests (stationarity)**
  - `{ adf: {statistic:..., p_value:..., critical_values:{...}}, kpss:{...}, meta:{column:'sales'} }`
  - The front end can render this as a bar chart or a table.
- **Normality tests (normality)**
  - `{ columns: [{name:'col', shapiro_p:..., jb_p:..., shapiro_stat:..., jb_stat:...}], meta:{} }`
- **Seasonal decomposition (seasonal)**
  - `dataset` form: `[["timestamp","observed","trend","seasonal","resid"], [...]]`
- **Spectral analysis (spectral)**
  - `periodogram`: `{ f: [...], psd: [...] }` (may be truncated to the first N points)
  - `spectrogram`: `{ f: [...], t: [...], values: [[i,j,val], ...] }` (may return only log10 values, then compress)
- **Correlation heatmap (heatmap)**
  - `{ data: [[i,j,value], ...], xLabels:[...], yLabels:[...], meta:{} }` (the back end flattens the N×N matrix ahead of time)
- **PCA scree plot (pca_scree)**
  - `dataset`: `[["component","explained","cumulative"], [1,0.4,0.4], ...]`
- **PCA scatter (pca_scatter)**
  - `records`: `[{pc1:..., pc2:..., timestamp:"..."}, ...]`
- **Feature importance (feature_importance)**
  - `records`: `[{feature:"...", importance:0.12}, ...]`
- **Clustering (cluster)**
  - `records`: `[{timestamp:"...", cluster:0, x:<optional>, y:<optional>}...]`
- **Factor analysis (factor)**
  - Similar to clustering: `[{timestamp:"...", factor1:..., factor2:...}]`
- **Cointegration test (cointegration)**
  - `{ trace_stat:[...], crit_95:[...], eigen_vals:[...], meta:{} }`
- **VAR forecast (var_forecast)**
  - `dataset`: `[["step","var1_forecast","var2_forecast"], [1, ...], ...]`
> Principle: use `dataset` wherever possible, with multiple lines selected on the front end through `encode`; flatten anything matrix-shaped ahead of time; use records for the rest.
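To make the `dataset` convention concrete, here is a minimal sketch that turns a list of records into the header-first 2D array described above. The function name `records_to_dataset` is hypothetical; missing keys become `None`, which serializes to JSON `null`.

```python
def records_to_dataset(records, columns):
    """Build an ECharts-style dataset: header row first, then one row per record."""
    header = list(columns)
    return [header] + [[rec.get(col) for col in header] for rec in records]


dataset_sample = records_to_dataset(
    [
        {"timestamp": "2023-01-01T00:00:00", "sales": 10, "ad_cost": 5},
        {"timestamp": "2023-01-02T00:00:00", "sales": 12},
    ],
    ["timestamp", "sales", "ad_cost"],
)
```

Passing the column order explicitly keeps every row aligned with the header, which is what `encode` on the front end relies on.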
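## Style and Theme
- The back end returns no visual styling such as colors or line types, only semantic fields (series names, metric meanings). The front end chooses colors and style according to its theme.
## Implementation Steps (Suggested)
1) Add the `to_echarts_safe` sanitizer to handle NaN/Infinity/Timestamp/DataFrame -> JSON-safe conversion in one place.
2) In each analysis function, keep the computation and switch to assembling chart data (dataset/records/flatten) instead of generating PNGs; the `generate_plots` logic may stay as a switch, but defaults to False.
3) When `run_analysis` aggregates results, put each step's data into `charts` and write the `chart` key (the chart reference) into `steps`.
4) The route layer returns the `charts` field, leaves `images` empty, and still returns `steps`.
5) The front end integrates ECharts according to the `charts` protocol and drops its dependency on `images`.
## Compatibility and Fallback
- Old front end: still receives `analysis.steps` and `images` (empty).
- New front end: uses `charts`. If a step fails, return `{error:"..."}` with a short summary to avoid a 500.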
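The per-step fallback could be implemented with a thin wrapper around each analysis call. This is a sketch under assumptions: `run_step` and the `ok`/`data` fields are hypothetical, and only the `error` and `summary` fields follow the text above.

```python
def run_step(name, func, *args, **kwargs):
    """Run one analysis step; on failure return an error payload, not an exception."""
    try:
        return {"key": name, "ok": True, "data": func(*args, **kwargs)}
    except Exception as exc:  # broad on purpose: one bad step must not fail the request
        return {
            "key": name,
            "ok": False,
            "error": f"{type(exc).__name__}: {exc}",
            "summary": f"Step '{name}' failed and was skipped.",
        }


good = run_step("stats", lambda values: sum(values) / len(values), [1, 2, 3])
bad = run_step("pca", lambda: 1 / 0)
```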
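## Performance Notes
- With no plotting on the back end, CPU and I/O load drop noticeably. For further optimization, the front end can pass a `methods` list to choose which steps run.
## Do the Algorithms Need to Change?
- The core statistical and time-series algorithms (ADF/KPSS, ACF/PACF, PCA, VAR, seasonal decomposition, correlation matrix, clustering, etc.) stay unchanged; changes are concentrated in the "result packaging" layer.
- Only the output wrapping needs adjusting:
  - Convert the intermediate results currently used for plotting (DataFrame/ndarray/statsmodels results) into ECharts-friendly JSON structures, all sanitized through `to_echarts_safe` (NaN/Inf/Timestamp).
  - Flatten matrix results (such as correlation) on the back end into `[i,j,value]` lists; prefer the dataset form for multi-series line/bar charts.
  - Truncate or summarize as needed to limit payload size (for example, take the first N periodogram points, or the mean or a downsampled version of the spectrogram).
  - Add metadata (column names/units/variable names) so the front end can build legends and tooltips.
- What does not need to change:
  - Preprocessing, the standardization flow, and the algorithms' mathematical implementation and parameter choices (lag order, decomposition period, number of PCA components, etc.) stay as they are.
  - If data volume or performance later becomes a bottleneck, individual steps can be sampled or truncated without affecting algorithmic correctness.
## Additional Conventions (Still Packaging Only, No Algorithm Changes)
- **Histogram binning**: for normality/distribution analysis the back end does the binning (`np.histogram`) and returns `[["range_start","range_end","count"], ...]`. The front end does not bin.
- **`to_echarts_safe` extensions**: besides NaN/Inf/Timestamp, explicitly handle all numpy numeric types and Decimal, and add a "visited set" where necessary to guard against circular references. Output is always a JSON-safe, ECharts-friendly structure.
- **Matrix / multi-series formats**: matrix results (correlation, etc.) continue to be flattened to `[i,j,value]`; multi-series/multi-column data prefers dataset+encode to guarantee alignment.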
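The binning convention can be illustrated without numpy. The sketch below produces equal-width bins in the header-first form given above. It is a stdlib approximation written for this note rather than the project's `np.histogram` call, and results at bin edges may differ slightly from numpy's because of floating-point rounding.

```python
import math


def histogram_bins(values, bins=10):
    """Equal-width binning; returns [["range_start","range_end","count"], ...]."""
    clean = [float(v) for v in values if v is not None and math.isfinite(float(v))]
    if not clean or bins < 1:
        return []
    lo, hi = min(clean), max(clean)
    if lo == hi:
        lo, hi = lo - 0.5, hi + 0.5  # widen a degenerate range so bins have width
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in clean:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum falls in the last bin
        counts[idx] += 1
    rows = [[lo + k * width, lo + (k + 1) * width, counts[k]] for k in range(bins)]
    return [["range_start", "range_end", "count"]] + rows


hist_sample = histogram_bins([0, 1, 2, 3, 4, None, float("nan")], bins=2)
```

Missing and non-finite values are dropped before binning, so the counts always sum to the number of usable observations.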

View File

@ -0,0 +1,45 @@
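# About the charts Mode Implementation
## Goals and Scope
- Following `docs/charts-data-mode-plan.md`, change the back end to return structured, ECharts-friendly chart data and stop generating or returning images.
- Keep the algorithms and analysis flow unchanged, adjusting only the packaging and response structure; old front ends stay compatible through the empty `images` field.
## Core Implementation
- **Unified sanitizer**: `to_echarts_safe` was added to `app/services/analysis_system.py`. It recursively handles NaN/Inf/pd.NA, numpy scalars/arrays, Timestamp/datetime, and Decimal, with circular-reference protection, and outputs a JSON-safe structure.
- **Analysis flow changes**:
  - `run_analysis` forces `generate_plots=False` and collects each step's result in `charts`, with `steps[].chart` pointing to the matching key.
  - Each step gets a new `chart_key` that maps into `charts`:
    - `stats` (statistics overview, dataset table), `ts` (time series dataset), `acf_pacf` (acf/pacf sequences), `stationarity`, `normality` (table), `seasonal`, `spectral`, `heatmap` (flattened correlation matrix), `pca_scree`, `pca_scatter`, `feature_importance`, `cluster`, `factor` (records), `cointegration` (table meta), `var_forecast` (forecast dataset with a step column).
  - `_build_chart_payload` assembles an ECharts-friendly dataset/records/flatten structure based on chart_key and sanitizes it through `to_echarts_safe`.
  - Fallback image generation was removed; only the text fallback analysis remains.
- **Data-layer changes**:
  - The normality test in `app/services/analysis/modules/basic.py` now adds histogram binning: `np.histogram` output is returned as a list of `[range_start, range_end, count]` so the front end can render it directly.
## Route Response Changes
- v1 `POST /api/analyze` and v2 `POST /api/v2/analyze`:
  - `analysis.<lang>.charts` returns the data for each chart; `steps` keeps the order and summaries and carries the `chart` reference.
  - `images` is always an empty object, kept only for old front ends; the old image copy/save logic was deleted and the `image_path` leak removed.
## Compatibility and Notes
- The core algorithms, preprocessing, and API analysis calls are unchanged; only the output packaging differs.
- A front end still on the old version must switch to reading `analysis.<lang>.charts` and `steps[].chart`. The old field (`images`) is empty and will not cause errors.
- Memory usage still needs attention for large data; for further compression, add truncation/sampling in `_build_chart_payload`.
## Related Files
- Implementation details: `app/services/analysis_system.py`
- Histogram binning: `app/services/analysis/modules/basic.py`
- Route responses: `app/api/routes/analysis.py`, `app/api/routes/analysis_v2.py`
- Design notes: `docs/charts-data-mode-plan.md`
## 2026-01-29 Addendum
- ACF/PACF output changed to `lag/value` records so the front end can map it directly to bar/line charts.
- Spectral output:
  - `spectrogram` adds downsampling and returns `values: [i,j,val]`, along with the `f` and `t` lists.
  - `periodogram` returns the dataset form `["f","psd"]` (truncated to the first 200 points).
- `docs/api-endpoints-status.md` was updated to mark charts mode as landed, with `images` empty for compatibility only.
## 2026-01-29 Follow-Up Adjustment
- Spectral downsampling was removed on request: `spectrogram` now returns the full `f/t` and all `values[i,j,val]`, and `periodogram` returns the full set of frequency points as a dataset (this may significantly enlarge the payload; an upper bound or sampling can be reintroduced if size needs to be controlled again).
## 2026-01-29 Further Update
- The time_series module is back to "return data only, generate no images": time series, ACF/PACF, seasonal decomposition, and spectral analysis no longer plot and return the data charts needs directly; spectral output is still not downsampled and returns full values.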

28
generate_openapi.py Normal file
View File

@ -0,0 +1,28 @@
import os
import sys
import json
from pathlib import Path

# Add project root to path to ensure imports work
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

try:
    from app.main import app
    print("Successfully imported FastAPI app.")
except ImportError as e:
    print(f"Error importing app: {e}")
    sys.exit(1)


def generate_openapi_json():
    openapi_schema = app.openapi()
    output_path = Path("openapi.json")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(openapi_schema, f, indent=2, ensure_ascii=False)
    print(f"OpenAPI documentation generated at: {output_path.absolute()}")


if __name__ == "__main__":
    generate_openapi_json()

29
generate_test_data.py Normal file
View File

@ -0,0 +1,29 @@
import pandas as pd
import numpy as np

# Set the random seed
np.random.seed(42)

# Generate a 200-day time series
dates = pd.date_range(start='2023-01-01', periods=200, freq='D')

# Build the data
trend = np.linspace(0, 50, 200)
# Roughly weekly cycle; 3.14 is used here as an approximation of pi
seasonality = 10 * np.sin(np.linspace(0, 3.14 * 2 * (200/7), 200))
noise = np.random.normal(0, 2, 200)
sales = 100 + trend + seasonality + noise
ad_cost = sales * 0.5 + np.random.normal(0, 5, 200)
temperature = 30 - trend * 0.2 + np.random.normal(0, 3, 200)

# Create the DataFrame
df = pd.DataFrame({
    'date': dates,
    'sales': sales,
    'ad_cost': ad_cost,
    'temperature': temperature
})

# Save
df.to_csv('complex_test.csv', index=False)
print("✅ Test file generated: complex_test.csv")

552
openapi.json Normal file
View File

@ -0,0 +1,552 @@
{
"openapi": "3.1.0",
"info": {
"title": "时间序列数据分析系统",
"description": "支持多格式数据上传、AI增强分析、多语言报告生成",
"version": "2.0.0"
},
"paths": {
"/api/upload": {
"post": {
"tags": [
"upload"
],
"summary": "上传CSV或图片文件",
"description": "上传数据文件CSV 或图片)\n\n- **file**: CSV 或图片文件 (PNG, JPG, BMP, TIFF)\n- **task_description**: 分析任务描述",
"operationId": "upload_file_api_upload_post",
"requestBody": {
"content": {
"multipart/form-data": {
"schema": {
"$ref": "#/components/schemas/Body_upload_file_api_upload_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/UploadResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/api/available_methods": {
"get": {
"tags": [
"analysis"
],
"summary": "获取可用的分析方法",
"description": "获取所有可用的分析方法",
"operationId": "get_available_methods_api_available_methods_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"additionalProperties": true,
"type": "object",
"title": "Response Get Available Methods Api Available Methods Get"
}
}
}
}
}
}
},
"/api/analyze": {
"post": {
"tags": [
"analysis"
],
"summary": "执行完整分析",
"description": "执行完整的时间序列分析\n\n流程:\n1. 加载并预处理数据\n2. 执行15种分析方法\n3. 调用AI API 进行深度分析\n4. 生成PDF/PPT/HTML报告",
"operationId": "analyze_data_api_analyze_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/AnalysisRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"additionalProperties": true,
"type": "object",
"title": "Response Analyze Data Api Analyze Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/api/v2/available_methods": {
"get": {
"tags": [
"analysis-v2"
],
"summary": "获取可用的分析方法v2",
"description": "v2 版本:返回与 v1 相同的可用分析方法列表。",
"operationId": "get_available_methods_v2_api_v2_available_methods_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"additionalProperties": true,
"type": "object",
"title": "Response Get Available Methods V2 Api V2 Available Methods Get"
}
}
}
}
}
}
},
"/api/v2/analyze": {
"post": {
"tags": [
"analysis-v2"
],
"summary": "执行完整分析v2从 OSS URL 读取 CSV",
"description": "Analyze CSV from an OSS/URL, returning the same structure as v1.",
"operationId": "analyze_data_v2_api_v2_analyze_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/AnalysisV2Request"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"additionalProperties": true,
"type": "object",
"title": "Response Analyze Data V2 Api V2 Analyze Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/api/image/{filename}": {
"get": {
"tags": [
"files"
],
"summary": "获取图片文件",
"description": "获取可视化图片文件",
"operationId": "serve_image_api_image__filename__get",
"parameters": [
{
"name": "filename",
"in": "path",
"required": true,
"schema": {
"type": "string",
"title": "Filename"
}
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/api/download/{filename}": {
"get": {
"tags": [
"files"
],
"summary": "下载文件",
"description": "下载报告或其他文件",
"operationId": "download_file_api_download__filename__get",
"parameters": [
{
"name": "filename",
"in": "path",
"required": true,
"schema": {
"type": "string",
"title": "Filename"
}
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/api/list_uploads": {
"get": {
"tags": [
"files"
],
"summary": "列出上传的文件",
"description": "列出 uploads 目录中的文件",
"operationId": "list_uploads_api_list_uploads_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/": {
"get": {
"summary": "Root",
"description": "根路径",
"operationId": "root__get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/health": {
"get": {
"summary": "Health",
"description": "健康检查",
"operationId": "health_health_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/api/config": {
"get": {
"summary": "Get Config",
"description": "获取应用配置",
"operationId": "get_config_api_config_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
}
},
"components": {
"schemas": {
"AnalysisRequest": {
"properties": {
"filename": {
"type": "string",
"title": "Filename"
},
"file_type": {
"type": "string",
"title": "File Type",
"default": "csv"
},
"task_description": {
"type": "string",
"title": "Task Description",
"default": "时间序列数据分析"
},
"data_background": {
"additionalProperties": true,
"type": "object",
"title": "Data Background",
"default": {}
},
"original_image": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Original Image"
},
"language": {
"type": "string",
"title": "Language",
"default": "zh"
},
"generate_plots": {
"type": "boolean",
"title": "Generate Plots",
"default": false
}
},
"type": "object",
"required": [
"filename"
],
"title": "AnalysisRequest",
"description": "分析请求模型"
},
"AnalysisV2Request": {
"properties": {
"oss_url": {
"type": "string",
"title": "Oss Url"
},
"task_description": {
"type": "string",
"title": "Task Description",
"default": "时间序列数据分析"
},
"data_background": {
"additionalProperties": true,
"type": "object",
"title": "Data Background",
"default": {}
},
"language": {
"type": "string",
"title": "Language",
"default": "zh"
},
"generate_plots": {
"type": "boolean",
"title": "Generate Plots",
"default": false
},
"source_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Source Name"
}
},
"type": "object",
"required": [
"oss_url"
],
"title": "AnalysisV2Request",
"description": "v2 分析请求模型(输入为 OSS/URL"
},
"Body_upload_file_api_upload_post": {
"properties": {
"file": {
"type": "string",
"format": "binary",
"title": "File"
},
"task_description": {
"type": "string",
"title": "Task Description",
"default": "时间序列数据分析"
}
},
"type": "object",
"required": [
"file"
],
"title": "Body_upload_file_api_upload_post"
},
"HTTPValidationError": {
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"type": "array",
"title": "Detail"
}
},
"type": "object",
"title": "HTTPValidationError"
},
"UploadResponse": {
"properties": {
"success": {
"type": "boolean",
"title": "Success"
},
"filename": {
"type": "string",
"title": "Filename"
},
"file_type": {
"type": "string",
"title": "File Type"
},
"original_filename": {
"type": "string",
"title": "Original Filename"
},
"task_description": {
"type": "string",
"title": "Task Description"
},
"message": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Message"
}
},
"type": "object",
"required": [
"success",
"filename",
"file_type",
"original_filename",
"task_description"
],
"title": "UploadResponse",
"description": "上传响应模型"
},
"ValidationError": {
"properties": {
"loc": {
"items": {
"anyOf": [
{
"type": "string"
},
{
"type": "integer"
}
]
},
"type": "array",
"title": "Location"
},
"msg": {
"type": "string",
"title": "Message"
},
"type": {
"type": "string",
"title": "Error Type"
}
},
"type": "object",
"required": [
"loc",
"msg",
"type"
],
"title": "ValidationError"
}
}
}
}

100
pyproject.toml Normal file
View File

@ -0,0 +1,100 @@
[project]
name = "lazy-fjh"
version = "2.0.0"
description = "时间序列数据分析系统 - FastAPI 版本"
readme = "README.md"
requires-python = ">=3.10"
authors = [{ name = "Your Name", email = "your.email@example.com" }]
keywords = ["time-series", "data-analysis", "fastapi", "statistical-analysis"]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Operating System :: OS Independent",
]
dependencies = [
    # FastAPI and web framework
    "fastapi>=0.104.1",
    "uvicorn[standard]>=0.24.0",
    "python-multipart>=0.0.6",
    "python-dotenv>=1.0.0",
    # Data processing
    "pandas>=2.2.2",
    "numpy>=1.26.4",
    # Statistics and scientific computing
    "scipy>=1.13.0",
    "scikit-learn>=1.3.0",
    "statsmodels>=0.14.0",
    # Visualization
    "matplotlib>=3.7.2",
    "seaborn>=0.12.2",
    # Report generation
    "reportlab>=4.0.4",
    "python-docx>=0.8.11",
    "python-pptx>=0.6.21",
    # API and data
    "openai>=1.3.0",
    "gradio_client>=0.9.0",
    "beautifulsoup4>=4.12.2",
    "requests>=2.31.0",
    # System and imaging
    "psutil>=5.9.5",
    "Pillow>=10.0.0",
    "opencv-python>=4.8.1.78",
]
[project.optional-dependencies]
dev = [
"pytest>=7.4.0",
"pytest-cov>=4.1.0",
"black>=23.0.0",
"ruff>=0.1.0",
"mypy>=1.0.0",
]
prod = ["gunicorn>=21.2.0", "supervisor>=4.2.5"]
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"
[tool.uv]
dev-dependencies = [
"pytest>=7.4.0",
"pytest-cov>=4.1.0",
"black>=23.0.0",
"ruff>=0.1.0",
]
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
[tool.black]
line-length = 100
target-version = ["py310", "py311", "py312"]
[tool.ruff]
line-length = 100
target-version = "py310"
select = ["E", "F", "W", "I"]
ignore = ["E501"]
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false

29
requirements.txt Normal file
View File

@ -0,0 +1,29 @@
# FastAPI and web framework
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
python-dotenv==1.0.0
# Data processing
pandas==2.2.2
numpy==1.26.4
# Statistics and scientific computing
scipy==1.13.0
scikit-learn==1.3.0
statsmodels==0.14.0
# Visualization
matplotlib==3.7.2
seaborn==0.12.2
# API and data
openai==1.3.0
gradio_client>=0.9.0
requests==2.31.0
# System and imaging
psutil==5.9.5
# Production deployment
gunicorn==21.2.0

View File

@ -0,0 +1,96 @@
Copyright 2014-2021 Adobe (http://www.adobe.com/), with Reserved Font
Name 'Source'. Source is a trademark of Adobe in the United States
and/or other countries.
This Font Software is licensed under the SIL Open Font License,
Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------
PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font
creation efforts of academic and linguistic communities, and to
provide a free and open framework in which fonts may be shared and
improved in partnership with others.
The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply to
any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.
"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).
"Original Version" refers to the collection of Font Software
components as distributed by the Copyright Holder(s).
"Modified Version" refers to any derivative made by adding to,
deleting, or substituting -- in part or in whole -- any of the
components of the Original Version, by changing formats or by porting
the Font Software to a new environment.
"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION & CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed,
modify, redistribute, and sell modified and unmodified copies of the
Font Software, subject to the following conditions:
1) Neither the Font Software nor any of its individual components, in
Original or Modified Versions, may be sold by itself.
2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.
3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the
corresponding Copyright Holder. This restriction only applies to the
primary font name as presented to the users.
4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.
5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created using
the Font Software.
TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

126
run.sh Normal file
View File

@ -0,0 +1,126 @@
#!/bin/bash
# FastAPI application startup script (uses uv for package management)
set -e

echo "========================================"
echo "Starting FastAPI Time Series Analysis System v2.0"
echo "========================================"
echo ""

# Check for uv
if ! command -v /home/syy/.local/bin/uv &> /dev/null; then
    echo "Error: uv not found"
    exit 1
fi
echo "✓ uv is installed"
echo ""

# Check for the virtual environment; create it if it does not exist
if [ ! -d ".venv" ]; then
    echo "Creating virtual environment..."
    /home/syy/.local/bin/uv venv --python 3.10
fi

# Activate the virtual environment
echo "Activating virtual environment..."
source .venv/bin/activate
# Load .env (without overriding environment variables that are already set)
if [ -f ".env" ]; then
    echo "Loading .env..."
    while IFS=$'\t' read -r key quoted_value; do
        [ -z "$key" ] && continue
        # Only allow valid environment variable names
        if [[ ! "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
            continue
        fi
        # Variables explicitly set in the environment take precedence
        if [ -z "${!key+x}" ]; then
            eval "export ${key}=${quoted_value}"
        fi
    done < <(python - <<'PY'
import os
import shlex

try:
    from dotenv import dotenv_values
except Exception:
    dotenv_values = None

if dotenv_values is None:
    raise SystemExit(0)

values = dotenv_values('.env')
for k, v in values.items():
    if k is None or v is None:
        continue
    # Output: KEY<TAB>shell_quoted_value
    print(f"{k}\t{shlex.quote(str(v))}")
PY
)
    echo "✓ .env loaded"
    echo ""
fi
# Check and install dependencies
echo "Checking dependencies..."
python -c "import fastapi; import uvicorn; import pandas; import numpy" 2>/dev/null || {
    echo "Installing dependencies..."
    /home/syy/.local/bin/uv pip install \
        'fastapi>=0.104.1' \
        'uvicorn[standard]>=0.24.0' \
        'python-multipart>=0.0.6' \
        'python-dotenv>=1.0.0' \
        'pandas>=2.2.2' \
        'numpy>=1.26.4' \
        'scipy>=1.13.0' \
        'scikit-learn>=1.3.0' \
        'statsmodels>=0.14.0' \
        'matplotlib>=3.7.2' \
        'seaborn>=0.12.2' \
        'openai>=1.3.0' \
        'gradio_client>=0.9.0' \
        'requests>=2.31.0' \
        'psutil>=5.9.5'
}
echo "✓ Dependency check complete"
echo ""
# Create the required directories
mkdir -p uploads logs temp resource/fonts

# Set environment variables (if not already set)
export ENV=${ENV:-"development"}
export DEBUG=${DEBUG:-"False"}
export HOST=${HOST:-"0.0.0.0"}
export PORT=${PORT:-"60201"}
export LOG_LEVEL=${LOG_LEVEL:-"INFO"}

echo "Environment configuration:"
echo " ENV=$ENV"
echo " DEBUG=$DEBUG"
echo " HOST=$HOST"
echo " PORT=$PORT"
echo " LOG_LEVEL=$LOG_LEVEL"
echo ""

# Start the application
echo "Starting application..."
echo ""
echo "=================================="
echo "✓ URL: http://localhost:$PORT"
echo "✓ API docs: http://localhost:$PORT/docs"
echo "✓ ReDoc: http://localhost:$PORT/redoc"
echo "=================================="
echo ""
echo "Press Ctrl+C to stop the application"
echo ""

# Run with uvicorn (expansions are quoted to avoid word splitting)
python -m uvicorn app.main:app \
    --host "$HOST" \
    --port "$PORT" \
    --log-level "$(echo "$LOG_LEVEL" | tr '[:upper:]' '[:lower:]')"
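
Note on the `.env` handling in run.sh: the loader adds a key only when its name is a valid shell identifier and the variable is not already set in the environment. The same precedence rule can be sketched in plain Python as below. This is an illustration only, not part of the commit; `merge_env` and its arguments are hypothetical names, and the input dict stands in for whatever `dotenv_values('.env')` would return.

```python
import re

# Same pattern run.sh uses to accept a key as an environment variable name.
_VALID_NAME = r"[A-Za-z_][A-Za-z0-9_]*"


def merge_env(dotenv_values, environ):
    """Return a copy of `environ` with .env values added only where missing.

    Hypothetical helper: mirrors the "already-set variables win" rule of run.sh.
    """
    merged = dict(environ)
    for key, value in dotenv_values.items():
        # Skip entries without a key or a value, as the embedded loader does.
        if key is None or value is None:
            continue
        # Skip names that are not valid shell identifiers.
        if re.fullmatch(_VALID_NAME, key) is None:
            continue
        # A variable that is already set (even to an empty string) takes precedence.
        if key in merged:
            continue
        merged[key] = str(value)
    return merged


if __name__ == "__main__":
    file_values = {"PORT": "60201", "HOST": "0.0.0.0", "bad-name": "x"}
    print(merge_env(file_values, {"PORT": "8000"}))
```

With these inputs, `PORT` keeps the value from the existing environment, `HOST` is taken from the file, and `bad-name` is dropped.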

View File

@ -0,0 +1,158 @@
import os
import sys
import shutil
import json
from pathlib import Path
import pandas as pd
import numpy as np
# Add project root to path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from app.services.analysis_system import TimeSeriesAnalysisSystem
from app.core.config import settings
class NpEncoder(json.JSONEncoder):
    """
    JSON encoder that handles NumPy types
    """
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, (np.bool_, bool)):
            return bool(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, pd.Timestamp):
            return str(obj)
        return super(NpEncoder, self).default(obj)


def format_details(details):
    """
    Format detailed results for text output
    """
    if details is None:
        return ""

    # Handle pandas Series/DataFrame
    if isinstance(details, (pd.DataFrame, pd.Series)):
        try:
            return details.to_markdown() if hasattr(details, 'to_markdown') else details.to_string()
        except ImportError:
            return details.to_string()

    # Handle Dict/List (JSON-like)
    if isinstance(details, (dict, list)):
        try:
            return json.dumps(details, cls=NpEncoder, indent=2, ensure_ascii=False)
        except Exception as e:
            return f"JSON Serialization Error: {e}\nRaw: {str(details)}"

    return str(details)


def run_all_analyses():
    # Setup paths
    base_dir = Path(__file__).parent
    test_dir = base_dir / "test"
    csv_filename = "comprehensive_test_data.csv"
    csv_path = test_dir / csv_filename

    if not csv_path.exists():
        print(f"Error: Test file not found at {csv_path}")
        return

    output_dir = test_dir / "results"
    output_dir.mkdir(exist_ok=True)

    print(f"Starting analysis on {csv_path}")
    print(f"Results will be saved to {output_dir}")

    # Initialize System
    # generate_plots=False allows skipping image generation but still returns full data details
    system = TimeSeriesAnalysisSystem(
        str(csv_path),
        task_description="Test Suite Analysis",
        language="zh",
        generate_plots=False
    )

    if not system.load_and_preprocess_data():
        print("Failed to load data")
        return

    # Define methods to run
    methods = [
        ('statistical_overview', system.generate_statistical_overview),
        ('time_series_analysis', system.generate_time_series_plots),
        ('acf_pacf_analysis', system.generate_acf_pacf_plots),
        ('stationarity_tests', system.perform_stationarity_tests),
        ('normality_tests', system.perform_normality_tests),
        ('seasonal_decomposition', system.perform_seasonal_decomposition),
        ('spectral_analysis', system.perform_spectral_analysis),
        ('correlation_analysis', system.generate_correlation_heatmap),
        ('pca_scree_plot', system.generate_pca_scree_plot),
        ('pca_analysis', system.perform_pca_analysis),
        ('feature_importance', system.analyze_feature_importance),
        ('clustering_analysis', system.perform_clustering_analysis),
        ('factor_analysis', system.perform_factor_analysis),
        ('cointegration_test', system.perform_cointegration_test),
        ('var_analysis', system.perform_var_analysis)
    ]

    for name, method in methods:
        print(f"\nrunning {name}...")
        try:
            result = method()
            img_path = None
            summary = ""
            details = None

            # Parse result
            if isinstance(result, tuple):
                if len(result) == 3:
                    img_path, summary, details = result
                elif len(result) == 2:
                    img_path, summary = result
            else:
                summary = str(result)

            # Save Output
            base_output_name = f"{name}_output"

            # 1. Save Summary & Details
            txt_path = output_dir / f"{base_output_name}.txt"
            with open(txt_path, "w", encoding="utf-8") as f:
                f.write(f"Method: {name}\n")
                f.write("-" * 50 + "\n")
                f.write("Summary:\n")
                f.write(str(summary))
                f.write("\n\n")
                if details is not None:
                    f.write("Detailed Results:\n")
                    f.write("-" * 50 + "\n")
                    formatted_details = format_details(details)
                    f.write(formatted_details)
                    f.write("\n")
            print(f" Saved full details to {txt_path.name}")

            # 2. Save Image (if any)
            if img_path and os.path.exists(img_path):
                ext = os.path.splitext(img_path)[1]
                target_img_path = output_dir / f"{base_output_name}{ext}"
                shutil.copy2(img_path, target_img_path)
                print(f" Saved image to {target_img_path.name}")
            else:
                pass  # No image expected if generate_plots=False

        except Exception as e:
            print(f" Error running {name}: {e}")
            import traceback
            traceback.print_exc()


if __name__ == "__main__":
    run_all_analyses()
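
Note on the result parsing in the loop above: each analysis method appears to return either a 3-tuple `(img_path, summary, details)`, a 2-tuple `(img_path, summary)`, or some other value that is stringified into the summary. A minimal sketch of that normalization as a standalone function follows. `normalize_result` is a hypothetical name that does not exist in the commit, and the sketch assumes the final `else` branch applies to non-tuple results.

```python
from typing import Any, Optional, Tuple


def normalize_result(result: Any) -> Tuple[Optional[str], Any, Any]:
    """Map an analysis method's return value to (img_path, summary, details).

    Hypothetical helper; assumes non-tuple results become the summary text.
    """
    img_path, summary, details = None, "", None
    if isinstance(result, tuple):
        if len(result) == 3:
            img_path, summary, details = result
        elif len(result) == 2:
            img_path, summary = result
        # Tuples of any other length fall through with the defaults.
    else:
        summary = str(result)
    return img_path, summary, details


if __name__ == "__main__":
    print(normalize_result(("plot.png", "ok", {"rows": 3})))
    print(normalize_result((None, "ok")))
    print(normalize_result("plain text"))
```

Pulling the parsing into one function like this would let the tuple shapes be checked without instantiating `TimeSeriesAnalysisSystem`.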

3060
uv.lock Normal file

File diff suppressed because it is too large Load Diff