Jsonbackend

2026-01-29 18:18:32 +08:00 · 2026-01-29 18:18:32 +08:00 · dd1087ad23
commit dd1087ad23
53 changed files with 8855 additions and 0 deletions
--- a/.env.example
+++ b/.env.example
@ -0,0 +1,47 @@
 # Environment configuration template
 # Copy this file to .env and fill in real values
 # Environment
 ENV=development
 DEBUG=False
 # Server
 HOST=0.0.0.0
 PORT=60201
 # CORS (comma-separated origins)
 CORS_ORIGINS=*
 # API exposure mode
 # full: expose v1 + v2 (default)
 # v2: expose only the /api/v2 analysis endpoints plus basic status endpoints (v1 upload/file/image endpoints are disabled)
 API_MODE=full
 # File upload
 UPLOAD_DIR=uploads
 # Maximum upload size in bytes (16 MB); kept on its own line because not every env-file loader strips inline comments
 MAX_UPLOAD_SIZE=16777216
 TEMP_DIR=temp
 # Fonts
 FONTS_DIR=resource/fonts
 # LLM API (Alibaba Cloud Qwen)
 MY_API_KEY=sk-your-api-key-here
 MY_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
 MY_MODEL=qwen-turbo
 # Analysis
 LANGUAGE_DEFAULT=zh
 ANALYSIS_TIMEOUT=300
 MAX_MEMORY_MB=500
 # v2 (OSS URL) security settings
 # V2_ALLOWED_HOSTS=oss.example.com,oss-cn-hangzhou.aliyuncs.com
 # V2_ALLOW_HTTP=False
 # V2_ALLOW_PRIVATE_NETWORKS=False
 # V2_CONNECT_TIMEOUT_SECONDS=5
 # V2_DOWNLOAD_TIMEOUT_SECONDS=30
 # Logging
 LOG_LEVEL=INFO
 LOG_DIR=logs
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,22 @@
 **/.DS_Store
 .venv/
 **/__pycache__/
 **/*.pyc
 **/*.pyo
 **/*.pyd
 .vscode/
 .idea/
 **/*.swp
 .env
 uploads/
 logs/
 # generated artifacts
 test/results/
 *.log
 temp/
 test/
--- a/1.md
+++ b/1.md
@ -0,0 +1,26 @@
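 Yes, that is exactly right.
 In short, it takes three steps:
 1. Enter the directory
 Open a WSL terminal and go to the project folder:
 Bash
 cd /mnt/h/vs_code/Python-Server
 2. Activate the environment
 Activate the Python virtual environment in the terminal (a (.venv) prefix on the prompt means it worked):
 Bash
 source .venv/bin/activate
 3. Run it
 Start the service (remember --host 0.0.0.0 so that Windows can reach it):
 Bash
 uvicorn app.main:app --host 0.0.0.0 --port 60201 --reload
 You can then open http://localhost:60201/docs in a browser. Happy coding!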
--- a/DEPLOYMENT.md
+++ b/DEPLOYMENT.md
@ -0,0 +1,271 @@
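 # FastAPI Application Production Deployment Guide
 ## Quick Start
 ### 1. Requirements
 - Python 3.10+
 - Linux / macOS / Windows
 - 20 GB of disk space (for fonts and data)
 ### 2. One-step install and start
 ```bash
 # On first run this creates the virtual environment and installs dependencies
 bash run.sh
 ```
 ### 3. Docker deployment
 ```bash
 # Build the Docker image
 docker build -t lazy-fjh:latest .
 # Run the container (the image's WORKDIR is /app, so volumes are mounted there)
 docker run -d \
  -p 60201:60201 \
  -v $(pwd)/uploads:/app/uploads \
  -v $(pwd)/logs:/app/logs \
  -e MY_API_KEY=sk-your-key \
  lazy-fjh:latest
 ```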
 ### 4. Systemd deployment (Linux)
 ```bash
 # Copy the application to a system directory
 sudo cp -r /path/to/lazy_fjh /opt/
 # Update ownership
 sudo chown -R www-data:www-data /opt/lazy_fjh
 # Install the systemd service
 sudo cp /opt/lazy_fjh/deploy/systemd/lazy-fjh.service /etc/systemd/system/
 # Enable and start the service
 sudo systemctl daemon-reload
 sudo systemctl enable lazy-fjh
 sudo systemctl start lazy-fjh
 # Check status
 sudo systemctl status lazy-fjh
 ```
 ### 5. Gunicorn deployment
 ```bash
 # Activate the virtual environment
 source .venv/bin/activate
 # Start with gunicorn (the application entry point is app.main:app)
 gunicorn -c deploy/gunicorn_config.py app.main:app
 ```
 ## Font Configuration
 ### Linux
 Install the system fonts first:
 ```bash
 bash deploy/install_fonts.sh
 ```
 Or install them manually:
 ```bash
 # Ubuntu/Debian
 sudo apt-get install -y fonts-wqy-microhei fonts-noto-cjk-extra
 # CentOS/RHEL
 sudo yum install -y wqy-microhei-fonts
 # Arch Linux
 sudo pacman -S --noconfirm wqy-microhei noto-fonts-cjk
 ```
 ### macOS
 ```bash
 brew install --cask font-noto-sans-cjk
 ```
 ### Windows
 Download Noto Sans CJK from https://www.noto-fonts.cn and install it.
 ## Environment Variables
 Copy `.env.example` to `.env` and fill in real values:
 ```bash
 cp .env.example .env
 ```
 Edit the `.env` file:
 ```env
 # Environment
 ENV=production
 DEBUG=False
 # Server
 HOST=0.0.0.0
 PORT=60201
 # API key (Alibaba Cloud Qwen)
 MY_API_KEY=sk-your-api-key-here
 MY_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
 MY_MODEL=qwen-turbo
 ```
 ## File Storage
 ### Upload directory
 - **Default**: `./uploads/`
 - **Configuration**: set the `UPLOAD_DIR` environment variable
 ### Log directory
 - **Default**: `./logs/`
 - **Configuration**: set the `LOG_DIR` environment variable
 ## API Documentation
 Once the application is running, visit:
 - **Swagger UI**: http://localhost:60201/docs
 - **ReDoc**: http://localhost:60201/redoc
 - **OpenAPI**: http://localhost:60201/openapi.json
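 ## FAQ
 ### 1. Text renders as empty boxes
 **Cause**: no Chinese font is installed on the system
 **Fix**:
 ```bash
 bash deploy/install_fonts.sh
 ```
 ### 2. Memory usage is too high
 **Cause**: large datasets need more memory during processing
 **Fix**:
 - Adjust the `MAX_MEMORY_MB` environment variable
 - Process the data in batches
 - Add more memory to the server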
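
One way to read "process the data in batches" is to stream a CSV in fixed-size chunks instead of loading it whole. The sketch below uses only the standard library; `iter_csv_chunks` and the chunk size are illustrative names that do not exist in this project, and it assumes the first row of the file is a header.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_csv_chunks(handle: TextIO, chunk_size: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows so only one chunk is held in memory."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    reader = csv.DictReader(handle)  # assumes the first row is a header
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Usage: accumulate a running sum without keeping every row around.
sample = io.StringIO("t,value\n1,10\n2,20\n3,30\n")
total = 0.0
for rows in iter_csv_chunks(sample, chunk_size=2):
    total += sum(float(row["value"]) for row in rows)
print(total)
```

This only helps for statistics that can be accumulated incrementally; analyses that need the full series at once would still have to load it.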
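 ### 3. File upload times out
 **Cause**: the file is too large, or there is a network problem
 **Fix**:
 - Check the `MAX_UPLOAD_SIZE` limit
 - Increase `ANALYSIS_TIMEOUT`
 - Split large files
 ### 4. The API is unreachable
 **Cause**: a firewall is blocking the port, or the port is already in use
 **Fix**:
 ```bash
 # Check what is using the port
 sudo lsof -i :60201
 # Change the PORT environment variable
 export PORT=8080
 bash run.sh
 ```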
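 ## Monitoring and Maintenance
 ### Viewing logs
 ```bash
 # Live application log
 tail -f logs/app.log
 # Access log (Gunicorn)
 tail -f logs/access.log
 ```
 ### System resource monitoring
 ```bash
 # Monitor with top/htop
 htop
 # Or from Python (the adapter lives in app/services, as imported by app/main.py)
 python -c "from app.services.linux_adapter import LinuxAdapter; print(LinuxAdapter.get_process_info())"
 ```
 ### Periodic cleanup
 ```bash
 # Remove temporary files older than 7 days
 find ./temp -type f -mtime +7 -delete
 # Remove uploads older than 30 days
 find ./uploads -type f -mtime +30 -delete
 ```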
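
Where `find` is not available (for example on Windows), roughly the same cleanup can be done from Python. This is a minimal sketch: `remove_old_files` is a hypothetical helper, not part of this repository, and it compares against an exact age in seconds, whereas `find -mtime` rounds ages to whole days.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List


def remove_old_files(directory: Path, max_age_days: float) -> List[Path]:
    """Delete regular files under `directory` older than `max_age_days`; return what was removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    # Materialise the listing first so deletions do not disturb the traversal.
    for path in list(directory.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Usage on a throwaway directory: one file is back-dated by ten days.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.csv"
    new_file = Path(tmp) / "new.csv"
    old_file.write_text("x")
    new_file.write_text("y")
    ten_days_ago = time.time() - 10 * 86400
    os.utime(old_file, (ten_days_ago, ten_days_ago))
    removed_names = [p.name for p in remove_old_files(Path(tmp), max_age_days=7)]
    remaining = sorted(p.name for p in Path(tmp).iterdir())
print(removed_names, remaining)
```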
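 ## Performance
 ### 1. Gzip compression
 Enabled by default through `GZipMiddleware` (for responses of at least 1000 bytes), which reduces response size.
 ### 2. Asynchronous processing
 The application uses asynchronous I/O, which supports more concurrent connections.
 ### 3. Memory management
 Memory usage is checked before each analysis, and garbage collection runs when it exceeds `MAX_MEMORY_MB`.
 ### 4. Concurrency settings (Gunicorn)
 ```
 workers = cpu_count * 2 + 1
 worker_connections = 1000
 ```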
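
Written out as an actual Gunicorn configuration file, the formula above might look like the following. This is a sketch only: the project's real `deploy/gunicorn_config.py` is not shown in this commit view, so the bind address and the choice of `uvicorn.workers.UvicornWorker` as the worker class are assumptions.

```python
import multiprocessing

# Hypothetical gunicorn config; not the project's actual deploy/gunicorn_config.py.
bind = "0.0.0.0:60201"

# The usual rule of thumb from the section above: (2 x cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# A FastAPI app is ASGI, so it needs an ASGI-capable worker class (assumed here).
worker_class = "uvicorn.workers.UvicornWorker"

# Gunicorn documents this setting as applying only to certain worker types,
# so it is worth verifying that it has any effect with the worker class above.
worker_connections = 1000
```

Because the analysis is CPU- and memory-heavy, a lower worker count than the formula gives may be the safer starting point on small machines.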
 ## Backup and Restore
 ### Back up uploaded files
 ```bash
 tar -czf backup-uploads-$(date +%Y%m%d).tar.gz uploads/
 ```
 ### Back up the database (if one is used)
 ```bash
 # PostgreSQL
 pg_dump -U user db_name > backup.sql
 ```
 ## Updating the Application
 ```bash
 # Pull the latest code
 git pull origin main
 # Reinstall dependencies (if they changed); assumes uv is on PATH
 uv pip install --upgrade -r requirements.txt
 # Restart the service
 sudo systemctl restart lazy-fjh
 ```
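 ## Security Recommendations
 1. **API keys**: do not hard-code them; use environment variables
 2. **HTTPS**: use HTTPS in production and configure an SSL certificate
 3. **CORS**: restrict CORS origins to what is actually needed
 4. **Rate limiting**: consider adding API rate limiting
 5. **Authentication**: add authentication to sensitive endpoints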
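
As an illustration of the rate-limiting recommendation, here is a small in-memory sliding-window limiter. It is a sketch under stated assumptions, not something this project ships: `SlidingWindowLimiter` is a hypothetical class, the counters live in one process only (each Gunicorn worker would have its own), and a production setup would more likely use a shared store or a reverse proxy.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_seconds` for each client key."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
results = [limiter.allow("1.2.3.4") for _ in range(3)]  # third call is rejected
fake_now[0] = 61.0
results.append(limiter.allow("1.2.3.4"))  # the window has passed, allowed again
print(results)  # [True, True, False, True]
```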
 ## Support and Feedback
 For questions or suggestions, open an issue or contact technical support.
--- a/35
+++ b/35
@ -0,0 +1,35 @@
 FROM python:3.11-slim
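 # Install system dependencies (curl is required by the HEALTHCHECK below; the slim base image does not include it)
 RUN apt-get update && apt-get install -y \
    curl \
    fonts-wqy-microhei \
    fonts-noto-cjk \
    fonts-liberation \
    fonts-dejavu \
    libgomp1 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    && rm -rf /var/lib/apt/lists/*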
 # Set the working directory
 WORKDIR /app
 # Copy the project files
 COPY . .
 # Install Python dependencies
 RUN pip install --no-cache-dir -r requirements.txt
 # Create the required directories
 RUN mkdir -p uploads logs temp resource/fonts
 # Expose the port
 EXPOSE 60201
 # Health check
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:60201/health || exit 1
 # Start command
 CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "60201"]
--- a/README.md
+++ b/README.md
@ -0,0 +1,127 @@
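 # Lazy Stat (FastAPI time series analysis backend)
 A FastAPI-based time series analysis service: upload a CSV → run a range of statistical, time series, and multivariate analyses → get structured results back (including per-step `steps[]` details, which makes frontend rendering and debugging easier).
 ## Features
 - 15+ analysis steps: statistical overview, time series analysis, ACF/PACF, stationarity, normality, seasonal decomposition, spectral analysis, correlation, PCA, clustering, factor analysis, cointegration test, VAR, and more
 - Uniform output structure: every step contains a `summary` plus `data/columns` (or a dict result), and is guaranteed to be JSON-serializable
 - Plotting: the `generate_plots` flag is accepted, but the route handlers in this commit force it to `false`, so no images are generated
 ## Quick Start (local)
 One-step start (uses `uv` to manage the virtual environment and dependencies):
 ```bash
 bash run.sh
 ```
 Then visit:
 - Swagger: `http://localhost:60201/docs`
 - ReDoc: `http://localhost:60201/redoc`
 - Health: `http://localhost:60201/health`
 The service entry point is `app.main:app` (see [app/main.py](app/main.py)).
 ## Docker / Compose
 With Compose:
 ```bash
 docker compose up --build
 ```
 The Compose configuration is in [docker-compose.yml](docker-compose.yml).
 ## Environment Variables
 See [.env.example](.env.example) for a sample file. Commonly used variables:
 - `HOST` / `PORT`: listen address and port (default `0.0.0.0:60201`)
 - `ENV` / `DEBUG`: runtime environment
 - `MAX_MEMORY_MB`: memory threshold (exceeding it triggers garbage collection)
 - `ANALYSIS_TIMEOUT`: analysis timeout (where applicable)
 - `MY_API_KEY`: API key for the external LLM
 For development or smoke tests where you do not want to call the external LLM, set:
 ```bash
 export MY_API_KEY=simulation-mode
 ```
 To expose only the v2 (OSS URL) analysis endpoints and disable the v1 upload/file/image endpoints, set:
 ```bash
 export API_MODE=v2
 ```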
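 ## Using the API
 All APIs are mounted under the `/api` prefix.
 ### 1) Upload a CSV
 `POST /api/upload` (the current implementation accepts CSV only):
 ```bash
 curl -F "file=@test/comprehensive_test_data.csv" \
 		 -F "task_description=demo" \
 		 http://localhost:60201/api/upload
 ```
 The response contains `filename` (the name under which the server stored the file); use it in the analysis request.
 ### 2) Run an analysis
 `POST /api/analyze`:
 ```bash
 curl -H "Content-Type: application/json" \
 	-d '{
 		"filename": "<filename returned by upload>",
 		"task_description": "demo",
 		"language": "zh",
 		"generate_plots": false
 	}' \
 	http://localhost:60201/api/analyze
 ```
 Key parts of the response:
 - `meta`: filename, language, whether plots were generated, creation time, and so on
 - `analysis.<lang>.steps[]`: the structured result of each analysis step (`key/title/summary/data/columns/api_analysis`, etc.)
 - `images`: always an empty object in this commit, because the handler forces `generate_plots=false`; it is kept for compatibility with older frontends. `GET /api/image/{filename}` serves image files from the uploads directory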
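
To make the response layout concrete, the sketch below walks `analysis.<lang>.steps[]` and collects each step's `key` and `summary`. `summarize_steps` is an illustrative helper rather than part of the service, and the sample response is hand-written with made-up values that only mirror the field names listed above.

```python
from typing import Any, Dict, List, Tuple


def summarize_steps(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (key, summary) pairs from analysis.<lang>.steps[], using meta.language to pick the bucket."""
    lang = response.get("meta", {}).get("language", "zh")
    bucket = response.get("analysis", {}).get(lang, {})
    pairs = []
    for step in bucket.get("steps", []):
        if isinstance(step, dict):
            pairs.append((str(step.get("key", "")), str(step.get("summary", ""))))
    return pairs


# Hand-written sample shaped like the description above; the values are invented.
sample_response = {
    "success": True,
    "meta": {"filename": "upload_demo.csv", "language": "zh"},
    "analysis": {"zh": {"steps": [
        {"key": "statistical_overview", "title": "统计概览", "summary": "3 columns, 100 rows"},
        {"key": "stationarity_tests", "title": "平稳性检验", "summary": "ADF p=0.01"},
    ]}},
    "images": {},
}
for key, summary in summarize_steps(sample_response):
    print(f"{key}: {summary}")
```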
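 ### 2.1) v2: analyze from an OSS URL (recommended)
 `POST /api/v2/analyze`: pass an `oss_url`; the backend downloads it to a temporary file, analyzes it, and returns the structured `steps[]`. As in v1, the handler currently forces `generate_plots` to `false`, so no images are generated even if the request sets it to `true`.
 ```bash
 curl -H "Content-Type: application/json" \
 	-d '{
 		"oss_url": "https://<your-oss-presigned-url>",
 		"task_description": "demo",
 		"language": "zh",
 		"generate_plots": false
 	}' \
 	http://localhost:60201/api/v2/analyze
 ```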
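
The v2 settings in `.env.example` (`V2_ALLOWED_HOSTS`, `V2_ALLOW_HTTP`, `V2_ALLOW_PRIVATE_NETWORKS`) suggest that the URL is checked before it is downloaded. The actual checks live in `app/services/oss_csv_source.py`, which is not shown here, so the following is only a sketch of what such validation could look like; `check_source_url` and `UrlRejected` are hypothetical names. It inspects literal IP addresses only; a real implementation would presumably also resolve DNS names and test every resulting address.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlparse


class UrlRejected(ValueError):
    """Raised when a URL fails the illustrative checks below."""


def check_source_url(url: str, allowed_hosts: Iterable[str] = (),
                     allow_http: bool = False, allow_private: bool = False) -> str:
    """Validate scheme, host allowlist and literal-IP ranges; return the lowercase host."""
    parsed = urlparse(url)
    schemes = {"https", "http"} if allow_http else {"https"}
    if parsed.scheme not in schemes:
        raise UrlRejected(f"scheme not allowed: {parsed.scheme!r}")
    host = (parsed.hostname or "").lower()
    if not host:
        raise UrlRejected("missing host")
    allowed = {h.lower() for h in allowed_hosts}
    if allowed and host not in allowed:  # an empty allowlist means "not enforced"
        raise UrlRejected(f"host not in allowlist: {host}")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name; a real check would also resolve it
    if not allow_private:
        if host == "localhost":
            raise UrlRejected("loopback host")
        if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
            raise UrlRejected(f"private or loopback address: {host}")
    return host


print(check_source_url("https://oss.example.com/data.csv?signature=abc"))
```

Returning only the host also gives callers something safe to log, since a presigned URL carries its credentials in the query string.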
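 ### 3) Other endpoints
 - `GET /api/available_methods`: list the available analysis methods
 - `GET /api/list_uploads`: list the files in uploads
 - `GET /api/download/{filename}`: download a file
 ## Generating "full text output" (for debugging/acceptance)
 The script [run_analysis_on_test_data.py](run_analysis_on_test_data.py) runs the full pipeline on the test data and writes each step's `summary + details` to `test/results/*.txt`, which is handy for checking p-values, arrays, DataFrames, and other complete output:
 ```bash
 python3 run_analysis_on_test_data.py
 ```
 ## Deployment
 See [DEPLOYMENT.md](DEPLOYMENT.md) for production deployment instructions.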
--- a/app/__init__.py
+++ b/app/__init__.py
--- a/app/api/__init__.py
+++ b/app/api/__init__.py
--- a/app/api/routes/__init__.py
+++ b/app/api/routes/__init__.py
@ -0,0 +1,5 @@
 """
 路由包初始化
 """
 __all__ = ['upload', 'analysis', 'analysis_v2', 'files']
--- a/app/api/routes/analysis.py
+++ b/app/api/routes/analysis.py
@ -0,0 +1,189 @@
 """
 分析路由
 """
 import logging
 import json
 from datetime import datetime
 from typing import Optional, Dict, Any, List
 from fastapi import APIRouter, HTTPException, status, BackgroundTasks
 from pydantic import BaseModel
 import psutil
 import os
 import gc
 import shutil
 from app.core.config import settings
 from app.services.analysis import TimeSeriesAnalysisSystem
 logger = logging.getLogger(__name__)
 router = APIRouter()
 class AnalysisRequest(BaseModel):
    """分析请求模型"""
    filename: str
    file_type: str = "csv"
    task_description: str = "时间序列数据分析"
    data_background: Dict[str, Any] = {}
    original_image: Optional[str] = None
    language: str = "zh"
    generate_plots: bool = False
@router.get("/available_methods", summary="获取可用的分析方法")
 async def get_available_methods() -> dict:
    """获取所有可用的分析方法"""
    return {
        "success": True,
        "methods": {
            'statistical_overview': {'name': '统计概览', 'description': '生成数据的基本统计信息和分布图表'},
            'time_series_analysis': {'name': '时间序列分析', 'description': '分析变量随时间变化的趋势和模式'},
            'acf_pacf_analysis': {'name': '自相关分析', 'description': '生成自相关和偏自相关函数图'},
            'stationarity_tests': {'name': '平稳性检验', 'description': '执行ADF、KPSS等平稳性检验'},
            'normality_tests': {'name': '正态性检验', 'description': '执行Shapiro-Wilk、Jarque-Bera正态性检验'},
            'seasonal_decomposition': {'name': '季节性分解', 'description': '分解时间序列的趋势、季节和残差成分'},
            'spectral_analysis': {'name': '频谱分析', 'description': '分析时间序列的频域特征'},
            'correlation_analysis': {'name': '相关性分析', 'description': '计算变量间的相关性并生成热力图'},
            'pca_scree_plot': {'name': 'PCA碎石图', 'description': '显示主成分分析的解释方差'},
            'pca_analysis': {'name': '主成分分析', 'description': '降维分析，识别数据的主要变化方向'},
            'feature_importance': {'name': '特征重要性', 'description': '分析各变量对目标预测的重要性'},
            'clustering_analysis': {'name': '聚类分析', 'description': '将数据点分组为具有相似特征的簇'},
            'factor_analysis': {'name': '因子分析', 'description': '识别潜在的因子结构'},
            'cointegration_test': {'name': '协整检验', 'description': '检验时间序列变量间的长期均衡关系'},
            'var_analysis': {'name': '向量自回归', 'description': '多变量时间序列建模和预测'}
        }
    }
 def check_memory():
    """检查内存使用"""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    logger.info(f"当前内存使用: {memory_mb:.2f} MB")
    if memory_mb > settings.MAX_MEMORY_MB:
        logger.warning(f"内存使用超过阈值 ({settings.MAX_MEMORY_MB} MB)，执行垃圾回收")
        gc.collect()
@router.post("/analyze", summary="执行完整分析")
 async def analyze_data(request: AnalysisRequest, background_tasks: BackgroundTasks) -> dict:
    """
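    Run the full time series analysis.
    Steps:
    1. Load and preprocess the data
    2. Run the 15 analysis methods
    3. Call the AI API for deeper analysis
    4. Return structured JSON results (this route does not generate PDF/PPT/HTML reports; `pdf_filename` and `ppt_filename` are always None)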
    """
    try:
        logger.info("=" * 60)
        logger.info(f"开始分析: {request.filename}")
        logger.info(f"任务: {request.task_description}")
        logger.info(f"语言: {request.language}")
        logger.info("=" * 60)
        # 检查内存
        check_memory()
        # 检查文件存在
        file_path = settings.get_upload_path(request.filename)
        if not file_path.exists():
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail=f"文件未找到: {request.filename}"
            )
        # Language handling: zh/en are supported; any other value falls back to zh
        lang_key = request.language if request.language in {"zh", "en"} else "zh"
        # Chart mode: image generation is always disabled, even if the request sets generate_plots=true
        generate_plots = False
        if request.generate_plots:
            logger.info("generate_plots requested true, forcing false to skip image generation")
        # 创建分析器实例
        logger.info(f"初始化分析器 ({lang_key})...")
        analyzer = TimeSeriesAnalysisSystem(
            str(file_path),
            request.task_description,
            data_background=request.data_background,
            language=lang_key,
            generate_plots=generate_plots
        )
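        # Run the analysis (the *_zh variable names are historical; the language actually used is lang_key)
        logger.info("Running analysis...")
        results_zh, log_zh = analyzer.run_analysis()
        if results_zh is None:
            logger.error(f"Analysis failed ({lang_key})")
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="分析失败"
            )
        logger.info(f"Analysis finished ({lang_key})")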
        # 准备返回数据
        response_data = {
            "success": True,
            "meta": {
                "filename": request.filename,
                "task_description": request.task_description,
                "language": lang_key,
                "generate_plots": generate_plots,
                "created_at": datetime.now().isoformat(),
            },
            "analysis": {
                lang_key: {
                    "pdf_filename": None,
                    "ppt_filename": None,
                    "data_description": results_zh.get("data_description"),
                    "preprocessing_steps": results_zh.get("preprocessing_steps", []),
                    "api_analysis": results_zh.get("api_analysis", {}),
                    "steps": results_zh.get("steps", []),
                    "charts": results_zh.get("charts", {}),
                }
            },
            "images": {},
            "log": log_zh[-20:] if log_zh else [],
            "original_image": request.original_image if request.file_type == 'image' else None,
        }
        # 兼容旧前端：始终提供 analysis.zh
        if lang_key != "zh":
            response_data["analysis"]["zh"] = response_data["analysis"][lang_key]
        analysis_bucket = response_data["analysis"][lang_key]
        # 去除任何遗留的 image_path（兼容旧结构）
        steps = analysis_bucket.get("steps")
        if isinstance(steps, list):
            for step in steps:
                if isinstance(step, dict) and "image_path" in step:
                    step.pop("image_path", None)
        # images 保持为空兼容旧前端
        response_data["images"] = {}
        logger.info("分析完成")
        return response_data
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"分析异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
--- a/app/api/routes/analysis_v2.py
+++ b/app/api/routes/analysis_v2.py
@ -0,0 +1,191 @@
 """v2 analysis route: analyze CSV from OSS/URL.
 Design goals:
 - Keep v1 endpoints unchanged
 - Provide the same response shape as v1, but with URL as input
 - Avoid leaking server local paths
 """
 import gc
 import logging
 import os
 import shutil
 from datetime import datetime
 from typing import Any, Dict, Optional
 import psutil
 from fastapi import APIRouter, BackgroundTasks, HTTPException, status
 from pydantic import BaseModel
 from app.core.config import settings
 from app.services.analysis import TimeSeriesAnalysisSystem
 from app.services.oss_csv_source import UrlValidationError, download_csv_to_tempfile
 logger = logging.getLogger(__name__)
 router = APIRouter()
 class AnalysisV2Request(BaseModel):
    """v2 分析请求模型（输入为 OSS/URL）"""
    oss_url: str
    task_description: str = "时间序列数据分析"
    data_background: Dict[str, Any] = {}
    language: str = "zh"
    generate_plots: bool = False
    source_name: Optional[str] = None
@router.get("/available_methods", summary="获取可用的分析方法（v2）")
 async def get_available_methods_v2() -> dict:
    """v2 版本：返回与 v1 相同的可用分析方法列表。"""
    return {
        "success": True,
        "methods": {
            "statistical_overview": {"name": "统计概览", "description": "生成数据的基本统计信息和分布图表"},
            "time_series_analysis": {"name": "时间序列分析", "description": "分析变量随时间变化的趋势和模式"},
            "acf_pacf_analysis": {"name": "自相关分析", "description": "生成自相关和偏自相关函数图"},
            "stationarity_tests": {"name": "平稳性检验", "description": "执行ADF、KPSS等平稳性检验"},
            "normality_tests": {"name": "正态性检验", "description": "执行Shapiro-Wilk、Jarque-Bera正态性检验"},
            "seasonal_decomposition": {"name": "季节性分解", "description": "分解时间序列的趋势、季节和残差成分"},
            "spectral_analysis": {"name": "频谱分析", "description": "分析时间序列的频域特征"},
            "correlation_analysis": {"name": "相关性分析", "description": "计算变量间的相关性并生成热力图"},
            "pca_scree_plot": {"name": "PCA碎石图", "description": "显示主成分分析的解释方差"},
            "pca_analysis": {"name": "主成分分析", "description": "降维分析，识别数据的主要变化方向"},
            "feature_importance": {"name": "特征重要性", "description": "分析各变量对目标预测的重要性"},
            "clustering_analysis": {"name": "聚类分析", "description": "将数据点分组为具有相似特征的簇"},
            "factor_analysis": {"name": "因子分析", "description": "识别潜在的因子结构"},
            "cointegration_test": {"name": "协整检验", "description": "检验时间序列变量间的长期均衡关系"},
            "var_analysis": {"name": "向量自回归", "description": "多变量时间序列建模和预测"},
        },
    }
 def check_memory():
    """检查内存使用"""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    logger.info(f"当前内存使用: {memory_mb:.2f} MB")
    if memory_mb > settings.MAX_MEMORY_MB:
        logger.warning(f"内存使用超过阈值 ({settings.MAX_MEMORY_MB} MB)，执行垃圾回收")
        gc.collect()
@router.post("/analyze", summary="执行完整分析（v2：从 OSS URL 读取 CSV）")
 async def analyze_data_v2(request: AnalysisV2Request, background_tasks: BackgroundTasks) -> dict:
    """Analyze CSV from an OSS/URL, returning the same structure as v1."""
    downloaded = None
    try:
        logger.info("=" * 60)
        logger.info("开始分析 (v2)")
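        from urllib.parse import urlparse  # local import keeps this fix self-contained
        # Log only the host: a presigned URL carries its credentials in the query string
        logger.info(f"URL host: {urlparse(request.oss_url).hostname}")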
        logger.info(f"任务: {request.task_description}")
        logger.info(f"语言: {request.language}")
        logger.info("=" * 60)
        check_memory()
        # Language handling: zh/en are supported; any other value falls back to zh
        lang_key = request.language if request.language in {"zh", "en"} else "zh"
        # Chart mode: image generation is always disabled, even if the request sets generate_plots=true
        generate_plots = False
        if request.generate_plots:
            logger.info("generate_plots requested true, forcing false to skip image generation")
        # 下载到临时文件
        try:
            downloaded = download_csv_to_tempfile(request.oss_url, suffix=".csv")
        except UrlValidationError as e:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
        filename_for_meta = request.source_name or downloaded.source_name
        # 创建分析器实例（复用原有分析系统）
        analyzer = TimeSeriesAnalysisSystem(
            downloaded.local_path,
            request.task_description,
            data_background=request.data_background,
            language=lang_key,
            generate_plots=generate_plots,
        )
        # 运行分析
        logger.info("执行分析...")
        results, log_entries = analyzer.run_analysis()
        if results is None:
            logger.error("分析失败")
            raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="分析失败")
        # 准备返回数据（尽量与 v1 保持一致）
        response_data = {
            "success": True,
            "meta": {
                "filename": filename_for_meta,
                "task_description": request.task_description,
                "language": lang_key,
                "generate_plots": generate_plots,
                "created_at": datetime.now().isoformat(),
                "version": "v2",
                "source": {
                    "type": "oss_url",
                    "host": downloaded.source_host,
                    "name": filename_for_meta,
                    "etag": downloaded.etag,
                    "last_modified": downloaded.last_modified,
                },
            },
            "analysis": {
                lang_key: {
                    "pdf_filename": None,
                    "ppt_filename": None,
                    "data_description": results.get("data_description"),
                    "preprocessing_steps": results.get("preprocessing_steps", []),
                    "api_analysis": results.get("api_analysis", {}),
                    "steps": results.get("steps", []),
                    "charts": results.get("charts", {}),
                }
            },
            "images": {},
            "log": log_entries[-20:] if log_entries else [],
            "original_image": None,
        }
        # 兼容旧前端：始终提供 analysis.zh
        if lang_key != "zh":
            response_data["analysis"]["zh"] = response_data["analysis"][lang_key]
        analysis_bucket = response_data["analysis"][lang_key]
        # 确保不暴露本地路径，steps chart 引用即可
        steps = analysis_bucket.get("steps")
        if isinstance(steps, list):
            for step in steps:
                if isinstance(step, dict) and "image_path" in step:
                    step.pop("image_path", None)
        # images 保留为空兼容旧前端
        response_data["images"] = {}
        logger.info("分析完成 (v2)")
        return response_data
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"分析异常 (v2): {str(e)}", exc_info=True)
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))
    finally:
        # 清理临时文件
        if downloaded is not None:
            try:
                os.unlink(downloaded.local_path)
            except Exception:
                pass
--- a/app/api/routes/files.py
+++ b/app/api/routes/files.py
@ -0,0 +1,115 @@
 """
 文件服务路由 (图片、下载等)
 """
 import logging
 from pathlib import Path
 from fastapi import APIRouter, HTTPException, status
 from fastapi.responses import FileResponse
 from app.core.config import settings
 logger = logging.getLogger(__name__)
 router = APIRouter()
@router.get("/image/{filename}", summary="获取图片文件")
 async def serve_image(filename: str):
    """
    获取可视化图片文件
    """
    try:
        file_path = settings.get_upload_path(filename)
        if not file_path.exists():
            logger.error(f"图片未找到: {filename}")
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="图片未找到"
            )
        logger.info(f"提供图片: {filename}")
        return FileResponse(
            path=str(file_path),
            media_type='image/png',
            filename=filename
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"获取图片异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
@router.get("/download/{filename}", summary="下载文件")
 async def download_file(filename: str):
    """
    下载报告或其他文件
    """
    try:
        file_path = settings.get_upload_path(filename)
        if not file_path.exists():
            logger.error(f"文件未找到: {filename}")
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="文件未找到"
            )
        logger.info(f"下载文件: {filename}")
        return FileResponse(
            path=str(file_path),
            filename=filename,
            media_type='application/octet-stream'
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"下载文件异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
@router.get("/list_uploads", summary="列出上传的文件")
 async def list_uploads():
    """
    列出 uploads 目录中的文件
    """
    try:
        uploads_dir = settings.UPLOAD_DIR
        if not uploads_dir.exists():
            return {
                "success": True,
                "files": []
            }
        files = []
        for file_path in uploads_dir.iterdir():
            if file_path.is_file():
                files.append({
                    "name": file_path.name,
                    "size": file_path.stat().st_size,
                    "modified": file_path.stat().st_mtime
                })
        logger.info(f"列出 {len(files)} 个文件")
        return {
            "success": True,
            "files": sorted(files, key=lambda x: x['modified'], reverse=True)
        }
    except Exception as e:
        logger.error(f"列出文件异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
--- a/app/api/routes/upload.py
+++ b/app/api/routes/upload.py
@ -0,0 +1,124 @@
 """
 文件上传路由
 """
 import logging
 import os
 import shutil
 from datetime import datetime
 from pathlib import Path
 from typing import Optional
 from fastapi import APIRouter, UploadFile, File, Form, HTTPException, status
 from pydantic import BaseModel
 from app.core.config import settings
 logger = logging.getLogger(__name__)
 router = APIRouter()
 class UploadResponse(BaseModel):
    """上传响应模型"""
    success: bool
    filename: str
    file_type: str
    original_filename: str
    task_description: str
    message: Optional[str] = None
 class UploadImageResponse(BaseModel):
    """上传图片响应模型"""
    success: bool
    filename: str
    file_type: str
    original_filename: str
    original_image: str
    task_description: str
    message: str
 def allowed_file(filename: str) -> bool:
    """检查文件是否被允许"""
    if '.' not in filename:
        return False
    ext = filename.rsplit('.', 1)[1].lower()
    return ext in settings.ALLOWED_EXTENSIONS
@router.post("/upload", response_model=UploadResponse, summary="上传CSV或图片文件")
 async def upload_file(
    file: UploadFile = File(...),
    task_description: str = Form(default="时间序列数据分析")
 ) -> dict:
    """
    Upload a data file. Only CSV is accepted at present (see `ALLOWED_EXTENSIONS`).
    - **file**: CSV file
    - **task_description**: description of the analysis task
    """
    try:
        logger.info(f"=== 上传请求开始 ===")
        logger.info(f"文件名: {file.filename}")
        logger.info(f"任务描述: {task_description}")
        # 检查文件名
        if not file.filename:
            logger.error("文件名为空")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="没有选择文件"
            )
        # 检查文件类型
        if not allowed_file(file.filename):
            logger.error(f"不支持的文件类型: {file.filename}")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=f"不支持的文件类型。允许的类型: {', '.join(settings.ALLOWED_EXTENSIONS)}"
            )
        # 获取文件扩展名
        file_ext = file.filename.rsplit('.', 1)[1].lower()
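        # Build the stored filename from the basename only, so a client-supplied
        # name such as "../x.csv" cannot escape the upload directory
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        safe_name = Path(file.filename).name
        new_filename = f"upload_{timestamp}_{safe_name}"
        # Save the file
        file_path = settings.get_upload_path(new_filename)
        logger.info(f"保存文件到: {file_path}")
        content = await file.read()
        # Enforce the configured size limit before writing to disk
        if len(content) > settings.MAX_UPLOAD_SIZE:
            raise HTTPException(
                status_code=413,
                detail=f"File too large; the limit is {settings.MAX_UPLOAD_SIZE} bytes"
            )
        with open(file_path, 'wb') as f:
            f.write(content)
        logger.info(f"文件保存成功，大小: {len(content)} bytes")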
        # 处理不同的文件类型
        if file_ext == 'csv':
            logger.info("处理 CSV 文件")
            return {
                "success": True,
                "filename": new_filename,
                "file_type": "csv",
                "original_filename": file.filename,
                "task_description": task_description
            }
        else:
            logger.warning(f"不支持的文件类型: {file_ext}")
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail=f"目前只支持 CSV 文件。您上传的是: {file_ext}"
            )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"上传处理异常: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )
--- a/app/core/__init__.py
+++ b/app/core/__init__.py
--- a/app/core/config.py
+++ b/app/core/config.py
@ -0,0 +1,122 @@
 """
 FastAPI 应用配置管理
 支持环境变量配置，生产级配置管理
 """
 import os
 from pathlib import Path
 from typing import Optional
 import logging
 try:
    from dotenv import load_dotenv
 except Exception:  # pragma: no cover
    load_dotenv = None
 # 项目根目录
 BASE_DIR = Path(__file__).resolve().parent.parent.parent
 # 加载 .env（不覆盖已存在的系统环境变量）
 _dotenv_path = BASE_DIR / ".env"
 if load_dotenv is not None and _dotenv_path.exists():
    load_dotenv(dotenv_path=_dotenv_path, override=False)
 # 环境变量
 ENVIRONMENT = os.getenv('ENV', 'development')
 DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
 class Settings:
    """应用配置类"""
    # FastAPI 基础配置
    APP_TITLE = "时间序列数据分析系统"
    APP_DESCRIPTION = "支持多格式数据上传、AI增强分析、多语言报告生成"
    APP_VERSION = "2.0.0"
    # API 暴露模式
    # - full: 暴露 v1 + v2（默认）
    # - v2: 仅暴露 v2 分析接口 + 基础状态接口（禁用 v1 上传/文件/图片接口）
    API_MODE = os.getenv('API_MODE', 'full').strip().lower()
    # 服务器配置
    HOST = os.getenv('HOST', '0.0.0.0')
    PORT = int(os.getenv('PORT', 60201))
    RELOAD = DEBUG
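    # CORS configuration (comma-separated origins; surrounding whitespace is ignored)
    CORS_ORIGINS = [o.strip() for o in os.getenv('CORS_ORIGINS', '*').split(',') if o.strip()]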
    CORS_ALLOW_CREDENTIALS = True
    CORS_ALLOW_METHODS = ['*']
    CORS_ALLOW_HEADERS = ['*']
    # 文件上传配置
    UPLOAD_DIR = Path(os.getenv('UPLOAD_DIR', BASE_DIR / 'uploads'))
    UPLOAD_DIR.mkdir(exist_ok=True)
    MAX_UPLOAD_SIZE = int(os.getenv('MAX_UPLOAD_SIZE', 16 * 1024 * 1024))  # 16MB
    ALLOWED_EXTENSIONS = {'csv'}
    # 临时文件配置
    TEMP_DIR = Path(os.getenv('TEMP_DIR', BASE_DIR / 'temp'))
    TEMP_DIR.mkdir(exist_ok=True)
    # 字体配置
    FONTS_DIR = Path(os.getenv('FONTS_DIR', BASE_DIR / 'resource' / 'fonts'))
    FONTS_DIR.mkdir(parents=True, exist_ok=True)
    # API 配置 (阿里云千问)
    API_KEY = os.getenv('MY_API_KEY', '')
    API_BASE = os.getenv('MY_API_BASE', 'https://dashscope.aliyuncs.com/compatible-mode/v1')
    API_MODEL = os.getenv('MY_MODEL', 'qwen-turbo')
    API_TIMEOUT = int(os.getenv('API_TIMEOUT', 30))
    # 分析配置
    LANGUAGE_DEFAULT = os.getenv('LANGUAGE_DEFAULT', 'zh')
    ANALYSIS_TIMEOUT = int(os.getenv('ANALYSIS_TIMEOUT', 300))  # 5分钟
    # 日志配置
    LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO' if not DEBUG else 'DEBUG')
    LOG_DIR = Path(os.getenv('LOG_DIR', BASE_DIR / 'logs'))
    LOG_DIR.mkdir(exist_ok=True)
    # 内存管理
    MAX_MEMORY_MB = int(os.getenv('MAX_MEMORY_MB', 500))
    # v2 (OSS URL) 配置
    # 允许的域名白名单（逗号分隔）。为空时表示不启用域名白名单（仍会做私网/环回 IP 拦截）。
    V2_ALLOWED_HOSTS = [h.strip() for h in os.getenv('V2_ALLOWED_HOSTS', '').split(',') if h.strip()]
    # 是否允许 http（默认仅 https）
    V2_ALLOW_HTTP = os.getenv('V2_ALLOW_HTTP', 'False').lower() == 'true'
    # 是否允许私网/环回地址（仅用于本地开发/冒烟；生产建议保持 False）
    V2_ALLOW_PRIVATE_NETWORKS = os.getenv('V2_ALLOW_PRIVATE_NETWORKS', 'False').lower() == 'true'
    # 下载超时（秒）。requests 支持 (connect, read)，这里统一使用 read 超时。
    V2_DOWNLOAD_TIMEOUT_SECONDS = float(os.getenv('V2_DOWNLOAD_TIMEOUT_SECONDS', 30))
    V2_CONNECT_TIMEOUT_SECONDS = float(os.getenv('V2_CONNECT_TIMEOUT_SECONDS', 5))
    @classmethod
    def get_upload_path(cls, filename: str) -> Path:
        """Return the full path of an uploaded file (basename only, so "../" cannot escape UPLOAD_DIR)."""
        return cls.UPLOAD_DIR / Path(filename).name
    @classmethod
    def get_temp_path(cls, filename: str) -> Path:
        """获取临时文件的完整路径"""
        return cls.TEMP_DIR / filename
 # 日志配置
 def setup_logging():
    """设置日志系统"""
    logging.basicConfig(
        level=Settings.LOG_LEVEL,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler(Settings.LOG_DIR / 'app.log'),
            logging.StreamHandler()
        ]
    )
 # Create the global settings instance
 settings = Settings()
 # Enable logging
 setup_logging()
--- a/app/main.py
+++ b/app/main.py
@ -0,0 +1,124 @@
 """
 FastAPI application entry point.
 Time-series data analysis system, FastAPI version.
 """
 import logging
 from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.middleware.gzip import GZipMiddleware
 from app.core.config import settings, setup_logging, ENVIRONMENT, DEBUG
 from app.services.font_manager import setup_fonts_for_app
 from app.services.linux_adapter import init_linux_environment
 # Set up the module logger
 logger = logging.getLogger(__name__)
 # Application lifespan
@asynccontextmanager
 async def lifespan(app: FastAPI):
    """Manage the application lifespan."""
    # On startup
    logger.info("=" * 60)
    logger.info(f"应用启动: {settings.APP_TITLE}")
    logger.info(f"版本: {settings.APP_VERSION}")
    logger.info(f"环境: {ENVIRONMENT}")
    logger.info(f"调试: {DEBUG}")
    logger.info(f"监听: {settings.HOST}:{settings.PORT}")
    logger.info("=" * 60)
    # Initialize the Linux environment
    try:
        init_linux_environment()
    except Exception as e:
        logger.warning(f"Linux 环境初始化失败: {e}")
    # Initialize fonts
    try:
        fonts_config = setup_fonts_for_app(['zh', 'en'])
        logger.info(f"字体配置完成: {fonts_config}")
    except Exception as e:
        logger.error(f"字体配置失败: {e}")
    yield
    # On shutdown
    logger.info("应用关闭")
 # Create the FastAPI application
 app = FastAPI(
    title=settings.APP_TITLE,
    description=settings.APP_DESCRIPTION,
    version=settings.APP_VERSION,
    lifespan=lifespan
 )
 # Add middleware
 # CORS middleware
 app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.CORS_ORIGINS,
    allow_credentials=settings.CORS_ALLOW_CREDENTIALS,
    allow_methods=settings.CORS_ALLOW_METHODS,
    allow_headers=settings.CORS_ALLOW_HEADERS,
 )
 # Compression middleware
 app.add_middleware(GZipMiddleware, minimum_size=1000)
 # Import and register the routers
 from app.api.routes import upload, analysis, analysis_v2, files
 # v2 mode: expose only the v2 analysis endpoints plus the basic status endpoints
 if settings.API_MODE == "v2":
    logger.info("API_MODE=v2: 禁用 v1 上传/文件接口，仅启用 /api/v2")
    app.include_router(analysis_v2.router, prefix="/api/v2", tags=["analysis-v2"])
 else:
    app.include_router(upload.router, prefix="/api", tags=["upload"])
    app.include_router(analysis.router, prefix="/api", tags=["analysis"])
    app.include_router(analysis_v2.router, prefix="/api/v2", tags=["analysis-v2"])
    app.include_router(files.router, prefix="/api", tags=["files"])
 # Root route
@app.get("/")
 async def root():
    """Root endpoint."""
    return {
        "message": "Lazy Stat Backend API",
        "version": settings.APP_VERSION,
        "docs": "/docs"
    }
@app.get("/health")
 async def health():
    """Health check."""
    return {
        "status": "healthy",
        "app": settings.APP_TITLE,
        "version": settings.APP_VERSION
    }
@app.get("/api/config")
 async def get_config():
    """Return the application configuration."""
    return {
        "title": settings.APP_TITLE,
        "version": settings.APP_VERSION,
        "max_upload_size": settings.MAX_UPLOAD_SIZE,
        "allowed_extensions": list(settings.ALLOWED_EXTENSIONS),
        "language_default": settings.LANGUAGE_DEFAULT
    }
 if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        "app.main:app",
        host=settings.HOST,
        port=settings.PORT,
        reload=settings.RELOAD,
        log_level=settings.LOG_LEVEL.lower()
    )
--- a/app/services/__init__.py
+++ b/app/services/__init__.py
--- a/app/services/analysis/__init__.py
+++ b/app/services/analysis/__init__.py
@ -0,0 +1,32 @@
 """Analysis package.
 This package contains the refactored analysis modules.
 Notes:
 - The legacy entrypoint remains `app.services.analysis_system.TimeSeriesAnalysisSystem`.
 - Importing `app.services.analysis_system` eagerly here would create a circular import because
  `analysis_system` imports `app.services.analysis.modules.*`.
 """
 from __future__ import annotations
 from typing import Any, TYPE_CHECKING
 __all__ = ["TimeSeriesAnalysisSystem"]
 if TYPE_CHECKING:
 	from app.services.analysis_system import TimeSeriesAnalysisSystem as TimeSeriesAnalysisSystem
 def __getattr__(name: str) -> Any:
 	if name == "TimeSeriesAnalysisSystem":
 		from app.services.analysis_system import TimeSeriesAnalysisSystem
 		return TimeSeriesAnalysisSystem
 	raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
 def __dir__() -> list[str]:
 	return sorted(list(globals().keys()) + __all__)
--- a/app/services/analysis/modules/__init__.py
+++ b/app/services/analysis/modules/__init__.py
@ -0,0 +1,4 @@
 """Implementation modules for analysis methods.
 Each file contains one or a small group of closely-related analysis methods.
 """
--- a/app/services/analysis/modules/basic.py
+++ b/app/services/analysis/modules/basic.py
@ -0,0 +1,180 @@
 import gc
 import os
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 from scipy import stats
 def generate_statistical_overview(self):
    """Generate a statistical overview (memory-optimized version)."""
    fig = None
    try:
        self._log_step("Generating statistical overview...")
        # Validate the data
        if not hasattr(self, 'data') or self.data is None or len(self.data) == 0:
            self._log_step("No data available for statistical overview", "warning")
            return None, "No data available", None
        # Compute descriptive statistics
        numeric_cols = self.data.select_dtypes(include=[np.number]).columns
        stats_df = self.data[numeric_cols].describe().T.reset_index().rename(columns={'index': 'variable'})
        summary = f"Generated statistical overview for {len(numeric_cols)} variables"
        if not self.generate_plots:
            self._log_step("Statistical overview generated (data only)", "success")
            return None, summary, stats_df
        # Use a smaller figure size and DPI to save memory
        fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=100)
        fig.suptitle('Statistical Overview', fontsize=14)
        # Basic overview panels
        # Only handle the first 4 columns to save memory
        num_vars = min(4, len(self.data.columns))
        for i in range(num_vars):
            row = i // 2
            col = i % 2
            col_name = self.data.columns[i]
            try:
                # Time-series plot
                axes[row, col].plot(self.data.index, self.data[col_name], linewidth=1)
                axes[row, col].set_title(f'{col_name}')
                axes[row, col].tick_params(axis='x', rotation=45)
                axes[row, col].grid(True, alpha=0.3)
            except Exception as e:
                self._log_step(f"Plotting {col_name} failed: {e}", "warning")
                axes[row, col].text(
                    0.5,
                    0.5,
                    f'Error: {str(e)[:30]}',
                    ha='center',
                    va='center',
                    transform=axes[row, col].transAxes,
                )
        plt.tight_layout()
        # Save the image (at a lower DPI)
        img_path = os.path.join(self.temp_dir.name, 'stats_overview.png')
        try:
            plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
            if not os.path.exists(img_path):
                self._log_step("Failed to save statistical overview image", "error")
                return None, "Failed to save image", stats_df
        except Exception as save_error:
            self._log_step(f"Failed to save figure: {save_error}", "error")
            return None, f"Save error: {str(save_error)[:100]}", stats_df
        finally:
            plt.close(fig)  # explicitly close the figure to free memory
            gc.collect()
        self._log_step("Statistical overview generated", "success")
        return img_path, summary, stats_df
    except Exception as e:
        self._log_step(f"Statistical overview failed: {str(e)[:100]}", "error")
        if fig is not None:
            try:
                plt.close(fig)
                gc.collect()
            except Exception:
                pass
        return None, f"Statistical overview failed: {str(e)[:100]}", None
 def perform_normality_tests(self):
    """Run normality tests."""
    try:
        self._log_step("Performing normality tests...")
        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            results = {}
            for col in numeric_cols[:3]:  # test only the first 3 variables
                series = self.data[col].dropna()
                col_results = {}
                # Histogram binning (binning is done on the backend)
                hist_counts, bin_edges = np.histogram(series, bins=20)
                histogram = []
                for i in range(len(hist_counts)):
                    histogram.append({
                        'range_start': float(bin_edges[i]),
                        'range_end': float(bin_edges[i + 1]),
                        'count': int(hist_counts[i])
                    })
                col_results['histogram'] = histogram
                # Shapiro-Wilk test (run only for sample sizes between 3 and 5000)
                if 3 <= len(series) <= 5000:
                    shapiro_result = stats.shapiro(series)
                    col_results['Shapiro-Wilk'] = {
                        'statistic': float(shapiro_result[0]),
                        'p_value': float(shapiro_result[1]),
                        'normal': bool(shapiro_result[1] > 0.05),
                    }
                # Jarque-Bera test
                jb_result = stats.jarque_bera(series)
                # SciPy result typing varies by version; keep runtime behavior and silence stub mismatch.
                jb_stat = float(jb_result[0])  # type: ignore[index,arg-type]
                jb_p = float(jb_result[1])  # type: ignore[index,arg-type]
                col_results['Jarque-Bera'] = {
                    'statistic': jb_stat,
                    'p_value': jb_p,
                    'normal': bool(jb_p > 0.05),
                }
                results[col] = col_results
            summary = f"正态性检验完成，测试了 {len(results)} 个变量"
            if not self.generate_plots:
                self._log_step("Normality tests completed (data only)", "success")
                return None, summary, results
            # Create the normality-test visualization
            n_cols = min(3, len(numeric_cols))
            fig, axes = plt.subplots(n_cols, 2, figsize=(12, 4 * n_cols))
            fig.suptitle('正态性检验结果', fontsize=16)
            if n_cols == 1:
                axes = axes.reshape(1, -1)
            for i, col in enumerate(numeric_cols[:n_cols]):
                series = self.data[col].dropna()
                # Histogram with fitted normal curve
                axes[i, 0].hist(series, bins=20, density=True, alpha=0.7, color='skyblue')
                xmin, xmax = axes[i, 0].get_xlim()
                x = np.linspace(xmin, xmax, 100)
                p = stats.norm.pdf(x, series.mean(), series.std())
                axes[i, 0].plot(x, p, 'k', linewidth=2)
                axes[i, 0].set_title(f'{col} - 分布直方图')
                # Q-Q plot
                stats.probplot(series, dist="norm", plot=axes[i, 1])
                axes[i, 1].set_title(f'{col} - Q-Q图')
            plt.tight_layout()
            img_path = os.path.join(self.temp_dir.name, 'normality_tests.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()
            self._log_step("Normality tests completed", "success")
            return img_path, summary, results
        self._log_step("No data available for normality tests", "warning")
        return None, "数据不足，无法进行正态性检验", None
    except Exception as e:
        self._log_step(f"Normality tests failed: {e}", "error")
        return None, f"正态性检验失败: {e}", None
--- a/app/services/analysis/modules/modeling.py
+++ b/app/services/analysis/modules/modeling.py
@ -0,0 +1,112 @@
 import os
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 from sklearn.ensemble import RandomForestRegressor
 def analyze_feature_importance(self):
    """Analyze feature importance."""
    try:
        self._log_step("Analyzing feature importance...")
        if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
            self._log_step("Not enough data for feature importance analysis", "warning")
            return None, "Not enough data for feature importance analysis", None
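        # Use the first column as the target and the remaining columns as features;
        # keeping the target inside X would leak it and dominate the importances.
        y = self.data.iloc[:, 0]
        X = self.data.iloc[:, 1:]
        model = RandomForestRegressor(n_estimators=50, random_state=42)  # fewer trees to reduce cost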
        model.fit(X, y)
        feature_importance = pd.Series(model.feature_importances_, index=X.columns)
        feature_importance = feature_importance.sort_values(ascending=False)
        fi_df = feature_importance.reset_index()
        fi_df.columns = ['feature', 'importance']
        summary = f"Feature importance analysis completed, top feature: {fi_df.iloc[0]['feature']}"
        if not self.generate_plots:
            self._log_step("Feature importance analysis completed (data only)", "success")
            return None, summary, fi_df
        plt.figure(figsize=(8, 6))
        feature_importance.head(10).plot(kind='bar')
        plt.title('Feature Importance Analysis')
        plt.ylabel('Importance Score')
        plt.tight_layout()
        img_path = os.path.join(self.temp_dir.name, 'feature_importance.png')
        plt.savefig(img_path, dpi=150, bbox_inches='tight')
        plt.close()
        self._log_step("Feature importance analysis completed", "success")
        return img_path, summary, fi_df
    except Exception as e:
        self._log_step(f"Feature importance analysis failed: {e}", "error")
        return None, f"Feature importance analysis failed: {e}", None
 def perform_var_analysis(self):
    """Run a vector autoregression (VAR) analysis."""
    try:
        self._log_step("Performing VAR analysis...")
        if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
            self._log_step("Not enough data for VAR analysis", "warning")
            return None, "数据不足，无法进行VAR分析", None
        from statsmodels.tsa.api import VAR
        numeric_data = self.data.select_dtypes(include=[np.number])
        if len(numeric_data.columns) < 2:
            self._log_step("Not enough numeric columns for VAR analysis", "warning")
            return None, "数值变量不足，无法进行VAR分析", None
        var_data = numeric_data.iloc[:, : min(3, len(numeric_data.columns))]
        model = VAR(var_data)
        results = model.fit(maxlags=2, ic='aic')
        lag_order = results.k_ar
        forecast = results.forecast(var_data.values[-lag_order:], steps=10)
        forecast_df = pd.DataFrame(data=forecast, columns=[f"{col}_forecast" for col in var_data.columns])
        summary = f"VAR分析完成，使用滞后阶数: {results.k_ar}，生成了10期预测"
        if not self.generate_plots:
            self._log_step("VAR analysis completed (data only)", "success")
            return None, summary, forecast_df
        plt.figure(figsize=(12, 8))
        for i, col in enumerate(var_data.columns):
            plt.plot(range(len(var_data)), var_data[col].values, label=f'{col} (actual)', alpha=0.7)
            plt.plot(
                range(len(var_data), len(var_data) + 10),
                forecast[:, i],
                label=f'{col} (forecast)',
                linestyle='--',
            )
        plt.axvline(x=len(var_data), color='red', linestyle=':', alpha=0.7, label='Forecast Start')
        plt.xlabel('Time')
        plt.ylabel('Value')
        plt.title('Vector Autoregression (VAR) Forecast')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        img_path = os.path.join(self.temp_dir.name, 'var_analysis.png')
        plt.savefig(img_path, dpi=150, bbox_inches='tight')
        plt.close()
        self._log_step("VAR analysis completed", "success")
        return img_path, summary, forecast_df
    except Exception as e:
        self._log_step(f"VAR analysis failed: {e}", "error")
        return None, f"VAR分析失败: {e}", None
--- a/app/services/analysis/modules/multivariate.py
+++ b/app/services/analysis/modules/multivariate.py
@ -0,0 +1,301 @@
 import gc
 import os
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 import seaborn as sns
 from sklearn.decomposition import PCA
 from sklearn.cluster import KMeans
 def generate_correlation_heatmap(self):
    """Generate a correlation heatmap."""
    fig = None
    try:
        self._log_step("Generating correlation heatmap...")
        if not hasattr(self, 'data') or self.data is None or len(self.data.columns) <= 1:
            self._log_step("Not enough data for correlation analysis", "warning")
            return None, "Not enough data", None
        # Compute the correlation matrix
        numeric_cols = self.data.select_dtypes(include=[np.number]).columns
        corr_matrix = self.data[numeric_cols].corr()
        summary = "Correlation matrix calculated"
        if not self.generate_plots:
            self._log_step("Correlation analysis completed (data only)", "success")
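            # Replace NaN with None for JSON compatibility (cast to object so None is not coerced back to NaN)
            return None, summary, corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
        # Create the heatmap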
        fig = plt.figure(figsize=(8, 6), dpi=100)
        sns.heatmap(
            corr_matrix,
            annot=True,
            fmt=".2f",
            cmap='coolwarm',
            center=0,
            square=True,
            cbar_kws={"shrink": 0.8},
        )
        plt.title('Correlation Heatmap')
        plt.tight_layout()
        # Save the image
        img_path = os.path.join(self.temp_dir.name, 'correlation_heatmap.png')
        try:
            plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
        except Exception as save_err:
            self._log_step(f"Save error: {save_err}", "error")
            return None, f"Save error: {str(save_err)[:100]}", corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
        finally:
            plt.close(fig)
            gc.collect()
        self._log_step("Correlation heatmap generated", "success")
        return img_path, summary, corr_matrix.astype(object).where(pd.notnull(corr_matrix), None)
    except Exception as e:
        self._log_step(f"Correlation heatmap failed: {str(e)[:100]}", "error")
        if fig is not None:
            try:
                plt.close(fig)
            except Exception:
                pass
        return None, f"Correlation heatmap failed: {str(e)[:100]}", None
 def generate_pca_scree_plot(self):
    """Generate a PCA scree plot."""
    try:
        self._log_step("Generating PCA scree plot...")
        if hasattr(self, 'scaled_data') and self.scaled_data is not None:
            pca = PCA()
            pca.fit(self.scaled_data)
            explained_variance = pca.explained_variance_ratio_
            cumulative_variance = np.cumsum(explained_variance)
            # Prepare the data
            scree_data = pd.DataFrame({
                'component': range(1, len(explained_variance) + 1),
                'explained_variance': explained_variance,
                'cumulative_variance': cumulative_variance,
            })
            summary = (
                "PCA碎石图生成完成，前2个主成分解释 "
                f"{cumulative_variance[min(1, len(cumulative_variance) - 1)]:.2%} 方差"
            )
            if not self.generate_plots:
                self._log_step("PCA scree data generated", "success")
                return None, summary, scree_data
            # Create the figure
            plt.figure(figsize=(10, 6))
            # Draw the scree plot
            plt.subplot(1, 2, 1)
            plt.plot(range(1, len(explained_variance) + 1), explained_variance, 'bo-')
            plt.title('PCA碎石图')
            plt.xlabel('主成分')
            plt.ylabel('解释方差比例')
            plt.grid(True, alpha=0.3)
            # Draw the cumulative explained-variance plot
            plt.subplot(1, 2, 2)
            plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, 'ro-')
            plt.title('累积解释方差')
            plt.xlabel('主成分数量')
            plt.ylabel('累积方差比例')
            plt.axhline(y=0.85, color='g', linestyle='--', label='85% 方差')
            plt.legend()
            plt.grid(True, alpha=0.3)
            plt.tight_layout()
            img_path = os.path.join(self.temp_dir.name, 'pca_scree_plot.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()
            self._log_step("PCA scree plot generated", "success")
            return img_path, summary, scree_data
        self._log_step("No scaled data available for PCA scree plot", "warning")
        return None, "没有标准化数据可用于PCA碎石图", None
    except Exception as e:
        self._log_step(f"PCA scree plot failed: {e}", "error")
        return None, f"PCA碎石图生成失败: {e}", None
 def perform_pca_analysis(self):
    """Run principal component analysis (PCA)."""
    try:
        self._log_step("Performing PCA analysis...")
        if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
            pca = PCA(n_components=2)
            principal_components = pca.fit_transform(self.scaled_data)
            summary = (
                "PCA analysis completed, explained variance: "
                f"{pca.explained_variance_ratio_[0]:.2%} + {pca.explained_variance_ratio_[1]:.2%}"
            )
            # Build the result frame once; both the data-only and plotting paths return it
            pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
            pca_df['timestamp'] = self.data.index.astype(str)
            if not self.generate_plots:
                self._log_step("PCA analysis completed (data only)", "success")
                return None, summary, pca_df
            # Create the PCA scatter plot
            plt.figure(figsize=(8, 6))
            plt.scatter(principal_components[:, 0], principal_components[:, 1], alpha=0.7)
            plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%})')
            plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%})')
            plt.title('Principal Component Analysis (PCA)')
            plt.grid(True, alpha=0.3)
            plt.tight_layout()
            # Save the image
            img_path = os.path.join(self.temp_dir.name, 'pca_analysis.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()
            self._log_step("PCA analysis completed", "success")
            return img_path, summary, pca_df
        self._log_step("Not enough data for PCA analysis", "warning")
        return None, "Not enough data for PCA analysis", None
    except Exception as e:
        self._log_step(f"PCA analysis failed: {e}", "error")
        return None, f"PCA analysis failed: {e}", None
 def perform_clustering_analysis(self):
    """Run a clustering analysis."""
    try:
        self._log_step("Performing clustering analysis...")
        if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
            kmeans = KMeans(n_clusters=3, random_state=42)
            clusters = kmeans.fit_predict(self.scaled_data)
            summary = f"Clustering analysis completed, found {len(np.unique(clusters))} clusters"
            # Build the result frame once; both the data-only and plotting paths return it
            cluster_df = pd.DataFrame({'cluster': clusters})
            cluster_df['timestamp'] = self.data.index.astype(str)
            if not self.generate_plots:
                self._log_step("Clustering analysis completed (data only)", "success")
                return None, summary, cluster_df
            # If the data has exactly two dimensions, plot the clusters directly
            if len(self.scaled_data.columns) == 2:
                plt.figure(figsize=(8, 6))
                plt.scatter(
                    self.scaled_data.iloc[:, 0],
                    self.scaled_data.iloc[:, 1],
                    c=clusters,
                    cmap='viridis',
                    alpha=0.7,
                )
                plt.xlabel(self.scaled_data.columns[0])
                plt.ylabel(self.scaled_data.columns[1])
                plt.title('Clustering Analysis')
                plt.colorbar(label='Cluster')
                plt.tight_layout()
            else:
                # For higher-dimensional data, reduce with PCA before plotting
                pca = PCA(n_components=2)
                reduced_data = pca.fit_transform(self.scaled_data)
                plt.figure(figsize=(8, 6))
                plt.scatter(reduced_data[:, 0], reduced_data[:, 1], c=clusters, cmap='viridis', alpha=0.7)
                plt.xlabel('PC1')
                plt.ylabel('PC2')
                plt.title('Clustering Analysis (PCA Reduced)')
                plt.colorbar(label='Cluster')
                plt.tight_layout()
            # Save the image
            img_path = os.path.join(self.temp_dir.name, 'clustering_analysis.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()
            self._log_step("Clustering analysis completed", "success")
            return img_path, summary, cluster_df
        self._log_step("Not enough data for clustering analysis", "warning")
        return None, "Not enough data for clustering analysis", None
    except Exception as e:
        self._log_step(f"Clustering analysis failed: {e}", "error")
        return None, f"Clustering analysis failed: {e}", None
 def perform_factor_analysis(self):
    """Run a factor analysis."""
    try:
        self._log_step("Performing factor analysis...")
        if hasattr(self, 'scaled_data') and self.scaled_data is not None and len(self.scaled_data.columns) > 1:
            from sklearn.decomposition import FactorAnalysis
            fa = FactorAnalysis(n_components=2, random_state=42)
            factors = fa.fit_transform(self.scaled_data)
            summary = "因子分析完成，提取了2个主要因子"
            # Build the result frame once; both the data-only and plotting paths return it
            factor_df = pd.DataFrame(data=factors, columns=['Factor1', 'Factor2'])
            factor_df['timestamp'] = self.data.index.astype(str)
            if not self.generate_plots:
                self._log_step("Factor analysis completed (data only)", "success")
                return None, summary, factor_df
            # Create the factor-analysis plot
            plt.figure(figsize=(10, 8))
            plt.scatter(factors[:, 0], factors[:, 1], alpha=0.7)
            plt.xlabel('Factor 1')
            plt.ylabel('Factor 2')
            plt.title('Factor Analysis')
            plt.grid(True, alpha=0.3)
            # Annotate points with their sample index (these are scores, not factor loadings)
            for i, (x, y) in enumerate(factors[:10]):  # label only the first 10 points
                plt.annotate(str(i), (x, y), xytext=(5, 5), textcoords='offset points', fontsize=8)
            plt.tight_layout()
            # Save the image
            img_path = os.path.join(self.temp_dir.name, 'factor_analysis.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()
            self._log_step("Factor analysis completed", "success")
            return img_path, summary, factor_df
        self._log_step("Not enough data for factor analysis", "warning")
        return None, "数据不足，无法进行因子分析", None
    except Exception as e:
        self._log_step(f"Factor analysis failed: {e}", "error")
        return None, f"因子分析失败: {e}", None
--- a/app/services/analysis/modules/stationarity.py
+++ b/app/services/analysis/modules/stationarity.py
@ -0,0 +1,169 @@
 import os
 import numpy as np
 import matplotlib.pyplot as plt
 from statsmodels.tsa.stattools import adfuller, kpss
 def perform_stationarity_tests(self):
    """Run stationarity tests (ADF and KPSS)."""
    try:
        self._log_step("Performing stationarity tests...")
        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            results = {}
            for col in numeric_cols[:3]:  # test only the first 3 variables
                series = self.data[col].dropna()
                col_results = {}
                # ADF test
                adf_result = adfuller(series)
                adf_crit = adf_result[4]  # type: ignore[index]
                if isinstance(adf_crit, dict):
                    adf_crit = {str(k): float(v) for k, v in adf_crit.items()}
                col_results['ADF'] = {
                    'statistic': float(adf_result[0]),
                    'p_value': float(adf_result[1]),
                    'critical_values': adf_crit,
                    'stationary': bool(adf_result[1] < 0.05),
                }
                # KPSS test
                try:
                    kpss_result = kpss(series, regression='c')
                    kpss_crit = kpss_result[3]
                    if isinstance(kpss_crit, dict):
                        kpss_crit = {str(k): float(v) for k, v in kpss_crit.items()}
                    col_results['KPSS'] = {
                        'statistic': float(kpss_result[0]),
                        'p_value': float(kpss_result[1]),
                        'critical_values': kpss_crit,
                        'stationary': bool(kpss_result[1] > 0.05),
                    }
                except Exception:
                    col_results['KPSS'] = '检验失败'
                results[col] = col_results
            summary = f"平稳性检验完成，测试了 {len(results)} 个变量"
            if not self.generate_plots:
                self._log_step("Stationarity tests completed (data only)", "success")
                return None, summary, results
            # Create the stationarity-test visualization
            fig, axes = plt.subplots(2, 2, figsize=(15, 10))
            fig.suptitle('平稳性检验结果', fontsize=16)
            # Plot the time series
            for i, col in enumerate(numeric_cols[:2]):
                axes[0, i].plot(self.data.index, self.data[col])
                axes[0, i].set_title(f'{col} - 时间序列')
                axes[0, i].tick_params(axis='x', rotation=45)
                axes[0, i].grid(True, alpha=0.3)
            # Plot the ADF test results
            test_stats = [results[col]['ADF']['statistic'] for col in list(results.keys())[:2]]
            p_values = [results[col]['ADF']['p_value'] for col in list(results.keys())[:2]]
            x_pos = np.arange(len(test_stats))
            axes[1, 0].bar(x_pos - 0.2, test_stats, 0.4, label='检验统计量', alpha=0.7)
            axes[1, 0].bar(x_pos + 0.2, p_values, 0.4, label='p值', alpha=0.7)
            axes[1, 0].set_title('ADF检验结果')
            axes[1, 0].set_xticks(x_pos)
            axes[1, 0].set_xticklabels(list(results.keys())[:2])
            # Draw the significance line before building the legend so its label is included
            axes[1, 0].axhline(y=0.05, color='r', linestyle='--', label='显著性水平 (0.05)')
            axes[1, 0].legend()
            # Plot the conclusions
            stationary_status = [
                '平稳' if results[col]['ADF']['stationary'] else '非平稳' for col in list(results.keys())[:2]
            ]
            colors = ['green' if status == '平稳' else 'red' for status in stationary_status]
            axes[1, 1].bar(x_pos, [1] * len(stationary_status), color=colors, alpha=0.7)
            axes[1, 1].set_title('平稳性结论')
            axes[1, 1].set_xticks(x_pos)
            axes[1, 1].set_xticklabels(list(results.keys())[:2])
            for i, status in enumerate(stationary_status):
                axes[1, 1].text(i, 0.5, status, ha='center', va='center', fontweight='bold')
            plt.tight_layout()
            img_path = os.path.join(self.temp_dir.name, 'stationarity_tests.png')
            plt.savefig(img_path, dpi=150, bbox_inches='tight')
            plt.close()
            self._log_step("Stationarity tests completed", "success")
            return img_path, summary, results
        self._log_step("No data available for stationarity tests", "warning")
        return None, "数据不足，无法进行平稳性检验", None
    except Exception as e:
        self._log_step(f"Stationarity tests failed: {e}", "error")
        return None, f"平稳性检验失败: {e}", None
 def perform_cointegration_test(self):
    """Run a cointegration test."""
    try:
        self._log_step("Performing cointegration test...")
        if not (hasattr(self, 'data') and self.data is not None and len(self.data.columns) > 1):
            self._log_step("Not enough data for cointegration test", "warning")
            return None, "数据不足，无法进行协整检验", None
        from statsmodels.tsa.vector_ar.vecm import coint_johansen
        numeric_data = self.data.select_dtypes(include=[np.number])
        if len(numeric_data.columns) < 2:
            self._log_step("Not enough numeric columns for cointegration test", "warning")
            return None, "数值变量不足，无法进行协整检验", None
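        # Drop rows with missing values; the Johansen procedure expects complete observations.
        numeric_data = numeric_data.dropna()
        result = coint_johansen(numeric_data, det_order=0, k_ar_diff=1)
        summary = (
            f"协整检验完成，迹统计量: {result.trace_stat[0]:.3f}, "
            f"临界值(95%): {result.trace_stat_crit_vals[0, 1]:.3f}"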
        )
        coint_data = {
            'trace_stat': result.trace_stat.tolist(),
            'trace_stat_crit_vals': result.trace_stat_crit_vals.tolist(),
            'eigen_vals': result.eig.tolist(),
        }
        if not self.generate_plots:
            self._log_step("Cointegration test completed (data only)", "success")
            return None, summary, coint_data
        plt.figure(figsize=(10, 6))
        positions = np.arange(len(result.trace_stat))
        plt.bar(positions - 0.2, result.trace_stat, width=0.4, label='Trace Statistic', alpha=0.7)
        plt.bar(
            positions + 0.2,
            result.trace_stat_crit_vals[:, 1],
            width=0.4,
            label='Critical Value (95%)',
            alpha=0.7,
        )
        plt.xlabel('Number of Cointegrating Relations')
        plt.ylabel('Test Statistic')
        plt.title('Johansen Cointegration Test Results')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        img_path = os.path.join(self.temp_dir.name, 'cointegration_test.png')
        plt.savefig(img_path, dpi=150, bbox_inches='tight')
        plt.close()
        self._log_step("Cointegration test completed", "success")
        return img_path, summary, coint_data
    except Exception as e:
        self._log_step(f"Cointegration test failed: {e}", "error")
        return None, f"协整检验失败: {e}", None
--- a/app/services/analysis/modules/time_series.py
+++ b/app/services/analysis/modules/time_series.py
@ -0,0 +1,242 @@
 import gc
 import os
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 from statsmodels.tsa.stattools import acf, pacf
 from statsmodels.tsa.seasonal import seasonal_decompose
 from scipy.signal import spectrogram, periodogram
 def generate_time_series_plots(self):
    """Generate time series plots."""
    try:
        self._log_step("Generating time series plots...")
        if not hasattr(self, 'data') or self.data is None or len(self.data.columns) == 0:
            self._log_step("No data available for time series plots", "warning")
            return None, "No data available", None
        # Prepare the data
        n_plots = min(4, len(self.data.columns))
        plot_data = self.data.iloc[:, :n_plots].reset_index()
        # Convert timestamps to strings so the result is JSON-serializable
        if 'timestamp' in plot_data.columns:
            plot_data['timestamp'] = plot_data['timestamp'].astype(str)
        summary = f"Generated {n_plots} time series charts"
        # Charts mode: return data only, without rendering an image; the plotting version is kept below as comments
        self._log_step("Time series data prepared", "success")
        return None, summary, plot_data
        # --- Plotting version kept for reference ---
        # fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=100)
        # fig.suptitle('Time Series Analysis', fontsize=14)
        # axes = axes.flatten()
        # for i in range(n_plots):
        #     try:
        #         col = self.data.columns[i]
        #         axes[i].plot(self.data.index, self.data[col], linewidth=1)
        #         axes[i].set_title(f'{col}')
        #         axes[i].tick_params(axis='x', rotation=45)
        #         axes[i].grid(True, alpha=0.3)
        #     except Exception as plot_err:
        #         self._log_step(f"Plot {col} error: {plot_err}", "warning")
        # for i in range(n_plots, len(axes)):
        #     fig.delaxes(axes[i])
        # plt.tight_layout()
        # img_path = os.path.join(self.temp_dir.name, 'time_series.png')
        # plt.savefig(img_path, dpi=100, bbox_inches='tight', format='png')
        # plt.close(fig)
        # self._log_step("Time series plots generated", "success")
        # return img_path, summary, plot_data
    except Exception as e:
        self._log_step(f"Time series plots failed: {str(e)[:100]}", "error")
        return None, f"Error: {e}", None
 def generate_acf_pacf_plots(self):
    """Generate autocorrelation (ACF) and partial autocorrelation (PACF) plots."""
    try:
        self._log_step("Generating ACF and PACF plots...")
        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            n_cols = min(3, len(numeric_cols))
            # Compute ACF and PACF data
            acf_pacf_results = {}
            for col in numeric_cols[:n_cols]:
                series = self.data[col].dropna()
                try:
                    acf_vals = np.asarray(acf(series, nlags=min(40, len(series) // 4)))
                    pacf_vals = np.asarray(pacf(series, nlags=min(20, len(series) // 5)))
                    acf_pacf_results[col] = {
                        'acf': acf_vals.tolist(),
                        'pacf': pacf_vals.tolist(),
                    }
                except Exception as e:
                    self._log_step(f"Error calculating ACF/PACF for {col}: {e}", "warning")
            # Count only the variables whose ACF/PACF were actually computed
            summary = f"生成 {len(acf_pacf_results)} 个变量的ACF和PACF数据"
            self._log_step("ACF and PACF data generated", "success")
            return None, summary, acf_pacf_results
            # --- Plotting version kept for reference ---
            # fig, axes = plt.subplots(n_cols, 2, figsize=(12, 4 * n_cols))
            # fig.suptitle('自相关和偏自相关分析', fontsize=16)
            # if n_cols == 1:
            #     axes = axes.reshape(1, -1)
            # for i, col in enumerate(numeric_cols[:n_cols]):
            #     series = self.data[col].dropna()
            #     plot_acf(series, ax=axes[i, 0], lags=min(40, len(series) // 4))
            #     axes[i, 0].set_title(f'{col} - 自相关函数 (ACF)')
            #     plot_pacf(series, ax=axes[i, 1], lags=min(20, len(series) // 5))
            #     axes[i, 1].set_title(f'{col} - 偏自相关函数 (PACF)')
            # plt.tight_layout()
            # img_path = os.path.join(self.temp_dir.name, 'acf_pacf_plots.png')
            # plt.savefig(img_path, dpi=150, bbox_inches='tight')
            # plt.close()
            # self._log_step("ACF and PACF plots generated", "success")
            # return img_path, f"生成 {n_cols} 个变量的ACF和PACF图", acf_pacf_results
        self._log_step("No data available for ACF/PACF plots", "warning")
        return None, "数据不足，无法生成ACF/PACF图", None
    except Exception as e:
        self._log_step(f"ACF/PACF plots failed: {e}", "error")
        return None, f"ACF/PACF图生成失败: {e}", None
 def perform_seasonal_decomposition(self):
    """Run a seasonal decomposition."""
    try:
        self._log_step("Performing seasonal decomposition...")
        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            # Decompose the first numeric column
            if len(numeric_cols) > 0:
                col = numeric_cols[0]
                series = self.data[col].dropna()
                # Seasonal decomposition
                result = seasonal_decompose(series, model='additive', period=min(24, len(series) // 2))
                decomposition_data = pd.DataFrame({
                    'observed': result.observed,
                    'trend': result.trend,
                    'seasonal': result.seasonal,
                    'resid': result.resid,
                })
                # Replace NaN with None so the result is JSON-serializable
                decomposition_data = decomposition_data.astype(object).where(
                    pd.notnull(decomposition_data),
                    None,  # type: ignore[arg-type]
                )
                summary = f"季节性分解完成，变量: {col}"
                self._log_step("Seasonal decomposition completed (data only)", "success")
                return None, summary, decomposition_data
                # --- Plotting version kept for reference ---
                # fig, axes = plt.subplots(4, 1, figsize=(12, 10))
                # fig.suptitle(f'{col} - 季节性分解', fontsize=16)
                # result.observed.plot(ax=axes[0], title='原始序列')
                # result.trend.plot(ax=axes[1], title='趋势成分')
                # result.seasonal.plot(ax=axes[2], title='季节成分')
                # result.resid.plot(ax=axes[3], title='残差成分')
                # for ax in axes:
                #     ax.tick_params(axis='x', rotation=45)
                #     ax.grid(True, alpha=0.3)
                # plt.tight_layout()
                # img_path = os.path.join(self.temp_dir.name, 'seasonal_decomposition.png')
                # plt.savefig(img_path, dpi=150, bbox_inches='tight')
                # plt.close()
                # self._log_step("Seasonal decomposition completed", "success")
                # return img_path, summary, decomposition_data
            self._log_step("No numeric columns for decomposition", "warning")
            return None, "没有数值列可用于季节性分解", None
        self._log_step("No data available for seasonal decomposition", "warning")
        return None, "数据不足，无法进行季节性分解", None
    except Exception as e:
        self._log_step(f"Seasonal decomposition failed: {e}", "error")
        return None, f"季节性分解失败: {e}", None
 def perform_spectral_analysis(self):
    """Run a spectral analysis."""
    try:
        self._log_step("Performing spectral analysis...")
        if hasattr(self, 'data') and self.data is not None:
            numeric_cols = self.data.select_dtypes(include=[np.number]).columns
            # Compute spectral data (output is reduced to keep the payload small)
            spectral_results = {}
            for col in numeric_cols[:2]:
                try:
                    series = self.data[col].dropna().values
                    f, t, Sxx = spectrogram(series, fs=1.0, nperseg=min(256, len(series) // 4))
                    f_p, Pxx_den = periodogram(series, fs=1.0)
                    # Keep only the mean and shape of the spectrogram instead of returning the full matrix
                    Sxx_log = 10 * np.log10(Sxx + 1e-12)
                    spectral_results[col] = {
                        'spectrogram': {
                            'f': f.tolist(),
                            't': t.tolist(),
                            'Sxx_log10_mean': float(np.mean(Sxx_log)),
                            'Sxx_shape': Sxx.shape,
                        },
                        'periodogram': {
                            'f': f_p.tolist()[:20],
                            'Pxx_den': Pxx_den.tolist()[:20],
                        },
                    }
                except Exception as e:
                    self._log_step(f"Spectral calc failed for {col}: {e}", "warning")
            summary = "Spectral analysis completed"
            self._log_step("Spectral analysis completed (data only)", "success")
            return None, summary, spectral_results
            # --- Plotting version kept for reference ---
            # n_cols = min(2, len(numeric_cols))
            # fig, axes = plt.subplots(n_cols, 2, figsize=(15, 5 * n_cols))
            # fig.suptitle('频谱分析', fontsize=16)
            # if n_cols == 1:
            #     axes = axes.reshape(1, -1)
            # for i, col in enumerate(numeric_cols[:n_cols]):
            #     series = self.data[col].dropna().values
            #     f, t, Sxx = spectrogram(series, fs=1.0, nperseg=min(256, len(series) // 4))
            #     axes[i, 0].pcolormesh(t, f, 10 * np.log10(Sxx), shading='gouraud')
            #     axes[i, 0].set_title(f'{col} - 频谱图')
            #     axes[i, 0].set_ylabel('频率 [Hz]')
            #     axes[i, 0].set_xlabel('时间')
            #     f, Pxx_den = periodogram(series, fs=1.0)
            #     axes[i, 1].semilogy(f, Pxx_den)
            #     axes[i, 1].set_title(f'{col} - 周期图')
            #     axes[i, 1].set_xlabel('频率 [Hz]')
            #     axes[i, 1].set_ylabel('PSD [V**2/Hz]')
            #     axes[i, 1].grid(True, alpha=0.3)
            # plt.tight_layout()
            # img_path = os.path.join(self.temp_dir.name, 'spectral_analysis.png')
            # plt.savefig(img_path, dpi=150, bbox_inches='tight')
            # plt.close()
            # self._log_step("Spectral analysis completed", "success")
            # return img_path, f"频谱分析完成，分析了 {n_cols} 个变量", spectral_results
        self._log_step("No data available for spectral analysis", "warning")
        return None, "数据不足，无法进行频谱分析", None
    except Exception as e:
        self._log_step(f"Spectral analysis failed: {e}", "error")
        return None, f"频谱分析失败: {e}", None
--- a/app/services/analysis_system.py
+++ b/app/services/analysis_system.py
--- a/app/services/font_manager.py
+++ b/app/services/font_manager.py
@ -0,0 +1,249 @@
 """
 Font management module: cross-platform font detection and configuration.
 Supports Linux, macOS, and Windows.
 """
 import os
 import sys
 import logging
 from pathlib import Path
 from typing import Optional, List, Dict
 import matplotlib.pyplot as plt
 import matplotlib.font_manager as fm
 from app.core.config import settings
 logger = logging.getLogger(__name__)
 class FontManager:
    """Font manager: handles cross-platform font detection and configuration."""
    # Supported font paths (in priority order)
    FONT_PATHS = {
        'zh': {  # Chinese fonts
            'linux': [
                '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc',
                '/usr/share/fonts/truetype/wqy/wqy-microhei.ttc',
                '/usr/share/fonts/truetype/liberation/LiberationSerif-Regular.ttf',
                '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
            ],
            'darwin': [  # macOS
                '/Library/Fonts/SimHei.ttf',
                '/System/Library/Fonts/STHeiti Light.ttc',
                '/Applications/Microsoft Office/Library/Fonts/SimSun.ttf',
                '/Library/Fonts/Arial.ttf',
            ],
            'win32': [
                'C:\\Windows\\Fonts\\simhei.ttf',
                'C:\\Windows\\Fonts\\simsun.ttc',
                'C:\\Windows\\Fonts\\msyh.ttc',
                'C:\\Windows\\Fonts\\arial.ttf',
            ]
        },
        'en': {  # English fonts
            'linux': [
                '/usr/share/fonts/truetype/liberation/LiberationSerif-Regular.ttf',
                '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',
                '/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf',
            ],
            'darwin': [
                '/Library/Fonts/Times New Roman.ttf',
                '/Library/Fonts/Arial.ttf',
                '/System/Library/Fonts/Helvetica.ttc',
            ],
            'win32': [
                'C:\\Windows\\Fonts\\times.ttf',
                'C:\\Windows\\Fonts\\arial.ttf',
                'C:\\Windows\\Fonts\\georgia.ttf',
            ]
        }
    }
    # Fonts bundled with the project
    PROJECT_FONTS = {
        'zh_regular': 'SubsetOTF/CN/SourceHanSansCN-Regular.otf',
        'zh_bold': 'SubsetOTF/CN/SourceHanSansCN-Bold.otf',
        'en_regular': None,  # English uses system fonts
    }
    def __init__(self, fonts_dir: Optional[Path] = None):
        """
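        Initialize the font manager.
        Args:
            fonts_dir: Path to the project font directory
        """
        # Coerce to Path so the `/` joins below also work if settings.FONTS_DIR is a plain string
        self.fonts_dir = Path(fonts_dir or settings.FONTS_DIR)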
        self.platform = sys.platform
        self.available_fonts = {}
        self._init_fonts()
    def _init_fonts(self):
        """Initialize the font system."""
        logger.info(f"初始化字体系统 (平台: {self.platform})")
        # Scan system fonts, then register project fonts
        self._scan_system_fonts()
        self._register_project_fonts()
    def _scan_system_fonts(self):
        """Scan for available system fonts."""
        logger.info("扫描系统字体...")
        for lang, fonts in self.FONT_PATHS.items():
            paths = fonts.get(self.platform, [])
            for font_path in paths:
                if os.path.exists(font_path):
                    self.available_fonts[lang] = font_path
                    logger.info(f"找到{lang}字体: {font_path}")
                    break
            if lang not in self.available_fonts:
                logger.warning(f"未找到系统{lang}字体")
    def _register_project_fonts(self):
        """Register the fonts bundled with the project."""
        logger.info(f"扫描项目字体目录: {self.fonts_dir}")
        # Register the Chinese font
        zh_font_path = self.fonts_dir / self.PROJECT_FONTS['zh_regular']
        if zh_font_path.exists():
            try:
                self.available_fonts['zh'] = str(zh_font_path)
                logger.info(f"注册项目中文字体: {zh_font_path}")
            except Exception as e:
                logger.warning(f"注册项目中文字体失败: {e}")
    def get_font(self, language: str = 'zh') -> str:
        """
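        Return the path of an available font.
        Args:
            language: Language code ('zh' or 'en')
        Returns:
            Path to a font file, or a Matplotlib font family name if no file was found
        """
        if language in self.available_fonts:
            return self.available_fonts[language]
        logger.warning(f"未找到{language}字体，使用默认字体")
        # Fall back to a family name; Matplotlib's bundled font is named 'DejaVu Sans'
        return 'DejaVu Sans' if language == 'en' else 'Arial'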
    def setup_matplotlib_font(self, language: str = 'zh'):
        """
        Configure the font used by Matplotlib.
        Args:
            language: Language code ('zh' or 'en')
        """
        try:
            font_path = self.get_font(language)
            if os.path.isfile(font_path):
                # Register the font file with Matplotlib and look up its family name
                fm.fontManager.addfont(font_path)
                font_name = fm.FontProperties(fname=font_path).get_name()
            else:
                # get_font() returned a family name rather than a file path
                font_name = font_path
            # Put this font first but keep the existing entries as fallbacks,
            # so configuring a second language does not discard the first one
            current = [f for f in plt.rcParams['font.sans-serif'] if f != font_name]
            plt.rcParams['font.sans-serif'] = [font_name] + current
            # Render the minus sign correctly with CJK fonts
            plt.rcParams['axes.unicode_minus'] = False
            logger.info(f"Matplotlib 字体配置为: {font_path}")
        except Exception as e:
            logger.error(f"配置 Matplotlib 字体失败: {e}")
    def get_font_installation_command(self) -> str:
        """
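        Return the recommended font installation command for the current system.
        Returns:
            The installation command as a string
        """
        if self.platform == 'linux':
            # fonts-noto-cjk provides the NotoSansCJK-Regular.ttc file listed in FONT_PATHS
            return "apt-get install fonts-wqy-microhei fonts-noto-cjk -y"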
        elif self.platform == 'darwin':
            return "brew install --cask font-noto-sans-cjk"
        else:
            return "请从 https://www.noto-fonts.cn 下载并安装 Noto Sans CJK 字体"
    def suggest_font_installation(self) -> bool:
        """
        Check for a Chinese font and suggest installing one if it is missing.
        Returns:
            Whether installing a font is recommended
        """
        if 'zh' not in self.available_fonts:
            logger.warning("=" * 60)
            logger.warning("⚠️  警告: 未找到中文字体！")
            logger.warning("推荐的安装命令:")
            logger.warning(self.get_font_installation_command())
            logger.warning("=" * 60)
            return True
        return False
    @staticmethod
    def check_font_available(font_name: str) -> bool:
        """
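        Check whether the given font is available.
        Args:
            font_name: Font family name
        Returns:
            Whether the font is available
        """
        try:
            # Without fallback_to_default=False, findfont returns the default font instead of failing
            fm.findfont(fm.FontProperties(family=font_name), fallback_to_default=False)
            return True
        except ValueError:
            return False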
 # Global font manager instance
 _font_manager: Optional[FontManager] = None
 def get_font_manager(fonts_dir: Optional[Path] = None) -> FontManager:
    """Return the global font manager instance."""
    global _font_manager
    if _font_manager is None:
        _font_manager = FontManager(fonts_dir)
    return _font_manager
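 def setup_fonts_for_app(languages: Optional[List[str]] = None) -> Dict[str, str]:
    """
    Set up fonts for the application (one-time initialization).
    Args:
        languages: Languages to support (defaults to ['zh', 'en'])
    Returns:
        Mapping from language code to the font path (or family name) in use
    """
    if languages is None:
        languages = ['zh', 'en']
    font_manager = get_font_manager()
    # Prompt the user to install fonts if needed
    font_manager.suggest_font_installation()
    # Configure Matplotlib for each language. Apply them in reverse order so that
    # the first language listed is configured last and takes precedence.
    fonts_config = {}
    for lang in reversed(languages):
        try:
            font_manager.setup_matplotlib_font(lang)
            fonts_config[lang] = font_manager.get_font(lang)
            logger.info(f"✓ {lang} 语言字体配置完成")
        except Exception as e:
            logger.error(f"配置 {lang} 语言字体失败: {e}")
    return fonts_config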
--- a/app/services/linux_adapter.py
+++ b/app/services/linux_adapter.py
@ -0,0 +1,210 @@
 """
 Linux adaptation module.
 Handles Linux-specific paths, permissions, environment variables, and related concerns.
 """
 import os
 import sys
 import logging
 from pathlib import Path
 from typing import Optional
 logger = logging.getLogger(__name__)
 class LinuxAdapter:
    """Linux system adapter."""
    @staticmethod
    def is_linux() -> bool:
        """Return whether the process is running on Linux."""
        return sys.platform.startswith('linux')
    @staticmethod
    def normalize_path(path: str) -> Path:
        """
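        Normalize a path so it works across operating systems.
        Args:
            path: Path string (may mix different separators)
        Returns:
            The normalized Path object
        """
        # Replace backslashes with forward slashes
        path = path.replace('\\', '/')
        # Path converts separators for the current system automatically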
        return Path(path).resolve()
    @staticmethod
    def ensure_directory_writable(dir_path: Path) -> bool:
        """
        Ensure the directory exists and is writable.
        Args:
            dir_path: Directory path
        Returns:
            True if the directory is writable, False otherwise
        """
        try:
            dir_path = Path(dir_path)
            dir_path.mkdir(parents=True, exist_ok=True)
            # Check write permission by creating and deleting a probe file
            test_file = dir_path / '.test_write'
            test_file.touch()
            test_file.unlink()
            logger.info(f"✓ 目录可写: {dir_path}")
            return True
        except PermissionError:
            logger.error(f"✗ 没有写入权限: {dir_path}")
            logger.error(f"  建议: sudo chmod 755 {dir_path}")
            return False
        except Exception as e:
            logger.error(f"✗ 目录检查失败: {dir_path} - {e}")
            return False
    @staticmethod
    def get_recommended_upload_dir() -> Path:
        """
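        Return the recommended upload directory on Linux.
        Returns:
            Path of the recommended upload directory
        """
        # Priority:
        # 1. Directory specified by the environment variable
        # 2. Path relative to the project
        # 3. /tmp (temporary directory)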
        if upload_dir := os.getenv('UPLOAD_DIR'):
            return Path(upload_dir)
        project_upload = Path(__file__).parent.parent.parent / 'uploads'
        if project_upload.exists() and os.access(project_upload, os.W_OK):
            return project_upload
        logger.warning("使用系统临时目录进行上传存储")
        return Path('/tmp/lazy_fjh_uploads')
    @staticmethod
    def setup_signal_handlers():
        """
        Register Linux signal handlers
        to ensure a graceful shutdown.
        """
        import signal
        def signal_handler(sig, frame):
            logger.info(f"收到信号 {sig}，开始优雅关闭...")
            sys.exit(0)
        if LinuxAdapter.is_linux():
            signal.signal(signal.SIGTERM, signal_handler)
            signal.signal(signal.SIGINT, signal_handler)
            logger.info("✓ Linux 信号处理器已注册")
    @staticmethod
    def get_process_info() -> dict:
        """
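        Return information about the current process.
        Returns:
            Dictionary of process information
        """
        import getpass
        import psutil
        process = psutil.Process(os.getpid())
        # os.getlogin() raises OSError when there is no controlling terminal
        # (for example under Docker or systemd), so use getpass.getuser() instead
        try:
            user = getpass.getuser()
        except Exception:
            user = 'unknown'
        return {
            'pid': os.getpid(),
            'user': user,
            'memory_mb': process.memory_info().rss / 1024 / 1024,
            'cpu_percent': process.cpu_percent(interval=1),
            'num_threads': process.num_threads()
        }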
    @staticmethod
    def check_system_resources() -> dict:
        """
        Check system resources.
        Returns:
            System resource information
        """
        import psutil
        return {
            'cpu_count': psutil.cpu_count(),
            'total_memory_gb': psutil.virtual_memory().total / (1024**3),
            'available_memory_gb': psutil.virtual_memory().available / (1024**3),
            'disk_usage_percent': psutil.disk_usage('/').percent
        }
    @staticmethod
    def optimize_for_linux():
        """
        Apply Linux-specific optimizations.
        """
        if not LinuxAdapter.is_linux():
            return
        logger.info("应用 Linux 系统优化...")
        # 1. Raise the file descriptor limit
        try:
            import resource
            soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
            logger.info(f"✓ 设置文件描述符限制: {hard}")
        except Exception as e:
            logger.warning(f"⚠ 无法设置文件描述符: {e}")
        # 2. Optional memory limit (skipped by default to avoid MemoryError in WSL/containers when RLIMIT_AS is too low)
        mem_limit_env = os.getenv('LINUX_RLIMIT_AS_MB') or os.getenv('LINUX_MEMORY_LIMIT_MB')
        if mem_limit_env:
            try:
                import resource
                limit_mb = int(mem_limit_env)
                limit_bytes = limit_mb * 1024**2
                resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
                logger.info(f"✓ 设置虚拟内存限制: {limit_mb}MB")
            except Exception as e:
                logger.warning(f"⚠ 无法设置虚拟内存限制: {e}")
        else:
            logger.info("跳过虚拟内存限制 (未设置 LINUX_RLIMIT_AS_MB / LINUX_MEMORY_LIMIT_MB)")
        # 3. Register signal handlers
        LinuxAdapter.setup_signal_handlers()
        logger.info("Linux 系统优化完成")
 def init_linux_environment():
    """
    Initialize the Linux environment.
    Call this when the application starts.
    """
    if not LinuxAdapter.is_linux():
        logger.info("非 Linux 系统，跳过 Linux 特定初始化")
        return
    logger.info("=" * 60)
    logger.info("初始化 Linux 环境...")
    logger.info("=" * 60)
    # Apply optimizations
    LinuxAdapter.optimize_for_linux()
    # Check system resources
    resources = LinuxAdapter.check_system_resources()
    logger.info(f"系统资源: {resources}")
    # Check the upload directory
    upload_dir = LinuxAdapter.get_recommended_upload_dir()
    if not LinuxAdapter.ensure_directory_writable(upload_dir):
        logger.warning(f"上传目录 {upload_dir} 可能不可写")
    logger.info("Linux 环境初始化完成")
--- a/app/services/oss_csv_source.py
+++ b/app/services/oss_csv_source.py
@ -0,0 +1,159 @@
 """OSS/URL CSV source (v2).
 - Validates incoming URL to reduce SSRF risk (allowlist + IP checks)
 - Downloads CSV to a local temporary file for analysis
 This module is intentionally small and dependency-light.
 """
 from __future__ import annotations
 import ipaddress
 import logging
 import os
 import socket
 import tempfile
 from dataclasses import dataclass
 from typing import Optional
 from urllib.parse import urlsplit
 import requests
 from app.core.config import settings
 logger = logging.getLogger(__name__)
 @dataclass(frozen=True)
 class DownloadedCsv:
    local_path: str
    source_host: str
    source_name: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None
 class UrlValidationError(ValueError):
    pass
 def _is_ip_allowed(ip_str: str) -> bool:
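    ip = ipaddress.ip_address(ip_str)
    # Unwrap IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1) so the checks below see the IPv4 address
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if settings.V2_ALLOW_PRIVATE_NETWORKS:
        return True
    # Block loopback/link-local/private/multicast/unspecified/reserved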
    if (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_unspecified
        or ip.is_reserved
    ):
        return False
    return True
 def validate_source_url(source_url: str) -> tuple[str, str]:
    """Validate URL and return (host, source_name)."""
    if not source_url or not isinstance(source_url, str):
        raise UrlValidationError("source_url 不能为空")
    parts = urlsplit(source_url)
    if parts.scheme not in {"https", "http"}:
        raise UrlValidationError("仅支持 http/https URL")
    if parts.scheme == "http" and not settings.V2_ALLOW_HTTP:
        raise UrlValidationError("不允许 http；请使用 https 或开启 V2_ALLOW_HTTP")
    if not parts.netloc:
        raise UrlValidationError("URL 缺少 host")
    # Disallow URLs with userinfo
    if "@" in parts.netloc:
        raise UrlValidationError("URL 不允许包含用户名/密码")
    host = parts.hostname
    if not host:
        raise UrlValidationError("无法解析 URL host")
    # Optional allowlist
    if settings.V2_ALLOWED_HOSTS:
        allowed = {h.lower() for h in settings.V2_ALLOWED_HOSTS}
        if host.lower() not in allowed:
            raise UrlValidationError(f"host 不在白名单: {host}")
    # Resolve host -> IP and block private/loopback, unless explicitly allowed.
    try:
        addr_info = socket.getaddrinfo(host, None)
    except socket.gaierror as e:
        raise UrlValidationError(f"DNS 解析失败: {host} ({e})") from e
    for family, _type, _proto, _canonname, sockaddr in addr_info:
        # For both AF_INET and AF_INET6, the address string is the first element of sockaddr
        if family not in (socket.AF_INET, socket.AF_INET6):
            continue
        ip_str = str(sockaddr[0])
        if not _is_ip_allowed(ip_str):
            raise UrlValidationError(f"host 解析到不允许的 IP: {ip_str}")
    source_name = os.path.basename(parts.path) or "data.csv"
    return host, source_name
 def download_csv_to_tempfile(source_url: str, *, suffix: str = ".csv") -> DownloadedCsv:
    """Download URL content to a temp file and return local path + meta."""
    host, source_name = validate_source_url(source_url)
    # Create temp file inside configured TEMP_DIR for easier ops/observability
    settings.TEMP_DIR.mkdir(parents=True, exist_ok=True)
    tmp = tempfile.NamedTemporaryFile(
        mode="wb",
        suffix=suffix,
        dir=str(settings.TEMP_DIR),
        delete=False,
    )
    try:
        timeout = (settings.V2_CONNECT_TIMEOUT_SECONDS, settings.V2_DOWNLOAD_TIMEOUT_SECONDS)
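        # Do not follow redirects: a redirect target would bypass the checks in validate_source_url
        with requests.get(source_url, stream=True, timeout=timeout, allow_redirects=False) as resp:
            if 300 <= resp.status_code < 400:
                raise UrlValidationError("不允许 URL 重定向")
            resp.raise_for_status()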
            etag = resp.headers.get("ETag")
            last_modified = resp.headers.get("Last-Modified")
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                if not chunk:
                    continue
                tmp.write(chunk)
        tmp.flush()
        tmp.close()
        if os.path.getsize(tmp.name) <= 0:
            raise UrlValidationError("下载内容为空")
        return DownloadedCsv(
            local_path=tmp.name,
            source_host=host,
            source_name=source_name,
            etag=etag,
            last_modified=last_modified,
        )
    except Exception:
        try:
            tmp.close()
        except Exception:
            pass
        try:
            os.unlink(tmp.name)
        except Exception:
            pass
        raise
--- a/complex_test.csv
+++ b/complex_test.csv
@ -0,0 +1,201 @@
 date,sales,ad_cost,temperature
 2023-01-01,100.99342830602247,52.28565095475265,25.216717023616898
 2023-01-02,107.81824957442487,56.71304741905361,28.151623674857277
 2023-01-03,111.52881013862651,61.17966128518964,29.915228586591738
 2023-01-04,108.02879336255101,59.283406941450025,29.990188012450005
 2023-01-05,96.05238491579978,41.13784561811443,28.44879856043664
 2023-01-06,90.99563724567052,40.80869342325965,31.617293515635463
 2023-01-07,97.00157358481341,51.075963128450006,26.495631174163773
 2023-01-08,103.5773501810333,54.357604845077695,29.22110275096627
 2023-01-09,109.08744194176155,57.118959402411015,29.95887684488444
 2023-01-10,113.00806452823355,75.76768971739038,31.091055195643584
 2023-01-11,105.55591950042415,55.63241230367791,31.632332071452602
 2023-01-12,97.09626710914998,54.22596175547798,26.073309905390914
 2023-01-13,93.65306223164116,51.59653993328659,24.79464241241625
 2023-01-14,91.96068299744252,49.237297755250246,33.179764134037235
 2023-01-15,100.63489742855835,48.74110249107745,30.293424447999076
 2023-01-16,110.82697548909167,59.20833384701217,27.000771546109235
 2023-01-17,111.57901650425853,51.92538217944141,33.849435826065054
 2023-01-18,108.60807500637348,53.1199444694867,29.492752546094657
 2023-01-19,97.7225257041962,46.43444511295258,32.63336893912616
 2023-01-20,92.05768224050016,46.43821181718169,29.247781574883593
 2023-01-21,100.66865680262455,61.90762123467982,35.17721864901782
 2023-01-22,105.67497634353768,43.5011622088101,34.21074614542007
 2023-01-23,114.00518017281185,60.43389103827849,28.14757991637182
 2023-01-24,112.42920163028982,48.151021459196656,31.758933958390703
 2023-01-25,108.38465543599729,51.83266838905148,30.730097698001675
 2023-01-26,101.27527814467474,56.082392057174204,32.84961326556187
 2023-01-27,94.30141660796275,47.47210839945869,25.798696954943104
 2023-01-28,100.44473732127179,44.833644770989366,30.701370460397328
 2023-01-29,106.96636309508162,49.90666300124097,31.768238284669366
 2023-01-30,115.19859424833119,60.99728586883897,23.266495108569856
 2023-01-31,115.74165722884301,54.218995455835824,24.94268677356046
 2023-02-01,114.6690461592577,58.41681602753873,22.32451452199608
 2023-02-02,102.54550585069765,51.50061212486789,27.583739295661303
 2023-02-03,96.21708851966602,44.85054252180392,30.494335310101455
 2023-02-04,103.30012064507876,62.36978076916601,32.79852844272025
 2023-02-05,107.76615313512,57.05267167915006,28.463490371410078
 2023-02-06,118.1047284831108,48.92665130826737,33.07680141058322
 2023-02-07,114.68453024304476,58.27453669536952,24.000399142943266
 2023-02-08,109.79663046257247,51.589382907444296,22.980304943241066
 2023-02-09,104.48969129818198,56.50701232307211,27.87355790833527
 2023-02-10,101.54656654412297,46.81067957989798,29.142146095561642
 2023-02-11,103.96500111652492,51.408818350927966,27.841614248180033
 2023-02-12,112.01560574417775,58.53273926699116,21.687120936061277
 2023-02-13,118.9828573044494,63.820204623075306,27.57183586136113
 2023-02-14,117.29813462397733,52.64758527670978,23.87553622210353
 2023-02-15,112.4994883398013,54.577237990695906,29.7477111138268
 2023-02-16,104.70275042588428,49.97664865713736,28.78823694934582
 2023-02-17,103.92903467881405,48.69787117653846,24.818551595791803
 2023-02-18,106.28211128125639,61.968326842033676,26.046338946482383
 2023-02-19,110.75852018598077,57.40416864779516,24.3600478765442
 2023-02-20,122.12422779023196,54.75769412344075,27.299399894110135
 2023-02-21,121.12891675425766,65.15376811240272,30.302612891151956
 2023-02-22,114.06938782186724,67.64547489599678,24.4297565343602
 2023-02-23,108.38021603048709,59.35243431799928,28.848822963638963
 2023-02-24,105.62999092642217,45.21814563344102,25.695659305686696
 2023-02-25,109.43524939303815,52.29645433218782,24.85756240773558
 2023-02-26,114.6422754496737,63.65569347076997,26.864838568377532
 2023-02-27,122.74145559432745,57.83238046906982,24.02995142470168
 2023-02-28,124.1981979657437,64.31819612360299,25.424479219636847
 2023-03-01,118.85647916163501,63.30140984796419,23.44154220163044
 2023-03-02,107.73630868962128,49.233501986920224,32.879100021864744
 2023-03-03,104.9579248974653,52.18133566842365,27.040464022749358
 2023-03-04,107.34286166502932,37.4650941321693,24.785245586575005
 2023-03-05,115.96259583523435,52.859359710945725,27.47611058647402
 2023-03-06,126.86147718486289,62.167897835465645,26.44693544891746
 2023-03-07,127.87752639288169,57.699847286616595,26.070759543108874
 2023-03-08,118.24185044528706,67.2829817423017,28.525917185557415
 2023-03-09,112.24465279387695,48.971619507135316,28.905688959287644
 2023-03-10,107.82181320681322,51.71068416992169,24.991411130032738
 2023-03-11,110.2529822211921,55.78019399702651,24.805208594648875
 2023-03-12,121.11006570384127,67.76139929725122,25.657256968846575
 2023-03-13,130.1816737833995,57.91152613580255,19.526397309813348
 2023-03-14,126.71565911008061,69.1736483158151,21.836336361143037
 2023-03-15,122.99418850292842,61.548259556562144,30.432281093790863
 2023-03-16,106.54633650145081,48.365624995485646,31.21631017567973
 2023-03-17,110.51968335255634,57.57035904759452,25.484047660225336
 2023-03-18,113.70966939484823,57.85013317529147,27.910575411780364
 2023-03-19,121.81927189435518,57.90855156138362,27.06440372996227
 2023-03-20,129.15083837967413,64.92442961478716,35.317044435415966
 2023-03-21,124.42743772967184,60.287150880527115,29.388875488072575
 2023-03-22,120.90336183411937,61.01926764331592,25.596146723045138
 2023-03-23,114.05377092727564,60.33753883624305,23.063026919404756
 2023-03-24,113.61702757529353,64.73859786837353,21.060058024151907
 2023-03-25,114.49586375810298,51.05885438491725,26.439536636244885
 2023-03-26,122.827839869664,72.07908680811333,23.5098422365089
 2023-03-27,129.81797713472835,55.148549569751665,21.461882087287382
 2023-03-28,131.84175921611333,65.16195413287875,23.738673307071416
 2023-03-29,123.47701234402419,64.68009220443497,22.383496692674402
 2023-03-30,113.83938888817369,58.324653782762006,30.639314352453876
 2023-03-31,113.48113806388008,53.62707143283707,28.172557461803127
 2023-04-01,117.72767203318188,57.823224764804564,25.453469010723516
 2023-04-02,128.40696958856444,61.73848012098806,29.866968095062035
 2023-04-03,131.26394087272118,62.685146651639535,25.60898934505341
 2023-04-04,130.95724623585764,69.72663360303395,22.742780561844356
 2023-04-05,123.51132541540858,63.540740137529525,29.845754141356707
 2023-04-06,113.5370478780282,53.30397596271083,26.842860784320308
 2023-04-07,114.84818220676686,61.922090480549684,22.064140934005557
 2023-04-08,120.06082896926581,61.56691208901595,24.554612106452694
 2023-04-09,128.50185694254756,68.31523906546857,22.44852212426784
 2023-04-10,134.03773943222305,70.16701392572959,20.876726435247697
 2023-04-11,130.37680723958826,61.04342856518377,27.753407014454222
 2023-04-12,124.92973736296452,59.66396348049741,30.652873036988282
 2023-04-13,117.34977804457614,62.41135704790438,20.67866913783906
 2023-04-14,114.46066606640413,60.28218436036939,26.51302831308679
 2023-04-15,121.2252388983924,60.508111479375465,22.821941639368188
 2023-04-16,131.31856822823775,66.24592103066279,23.262241939158173
 2023-04-17,140.11039848581893,76.44352372185159,22.89618506145425
 2023-04-18,135.1451770527603,64.61473158220099,22.03114326885
 2023-04-19,127.76129781335801,66.6161358125292,24.718429205442522
 2023-04-20,119.46355643368209,58.720814954671575,22.029762716093522
 2023-04-21,114.0448569861147,55.93402247692124,25.28373228638474
 2023-04-22,123.50756339829785,67.24766595908487,24.271396224416407
 2023-04-23,132.64644016382874,70.45030182685451,23.655015155883184
 2023-04-24,143.0878120259834,75.61145419299488,21.598917054076214
 2023-04-25,135.9934038087475,74.5240959401454,22.5410427922146
 2023-04-26,129.32437177871165,64.76720509751962,26.48727920511546
 2023-04-27,121.12652393740836,63.973026825179005,25.673605834229928
 2023-04-28,117.37007501648283,57.13370372527413,21.187937280679726
 2023-04-29,127.86250209122531,65.55208280805486,24.368348675081645
 2023-04-30,136.04182913809416,67.37019929720866,26.27426187262793
 2023-05-01,141.55882902128073,71.26439433560395,18.96163340286704
 2023-05-02,136.13522897297034,71.04339961366973,25.549678567089554
 2023-05-03,133.0020851904919,62.409939179078584,21.881475456830803
 2023-05-04,119.98214368927341,70.453008223064,25.530891483166414
 2023-05-05,122.71397649985045,56.326901342426716,21.479066751477976
 2023-05-06,131.97731026240027,59.917712067261476,18.303946662830565
 2023-05-07,134.56514004748436,73.07312439124252,18.785714394893226
 2023-05-08,140.65169760802368,74.28416227382652,23.762345292245453
 2023-05-09,139.72310582696744,72.9821519987445,24.347006701144345
 2023-05-10,130.66513240891535,68.47429375077908,20.80463806438527
 2023-05-11,121.28095236867787,60.57924232010436,25.38311405974921
 2023-05-12,123.51795943208879,57.272707858615234,18.4325252403288
 2023-05-13,127.494401479568,64.12622353075263,23.16859477491232
 2023-05-14,139.49771268252908,66.36304778370399,19.683534315285495
 2023-05-15,141.74502392454568,75.74811062936159,21.31082333488498
 2023-05-16,144.1875443211837,71.35848525308116,23.358276415959292
 2023-05-17,131.58175992050872,61.6633939762918,20.584589049876787
 2023-05-18,125.34125538337241,61.06369848342124,21.961911256757762
 2023-05-19,126.85611321109774,65.49271387692698,26.084205060809154
 2023-05-20,129.18274516107348,61.77274981651687,21.284399768314977
 2023-05-21,141.00563068175677,66.39171336304622,25.47190045679844
 2023-05-22,147.98975578319556,75.21331394905734,19.525452300348753
 2023-05-23,139.4308115573891,70.94023863423816,24.453734141786047
 2023-05-24,134.99454013588115,64.96255419108493,27.138776213732495
 2023-05-25,128.11503428381528,61.70232561381602,15.34888559509552
 2023-05-26,128.6485719341678,65.48453565387207,20.32288207278455
 2023-05-27,131.19867759470725,58.358917089867006,24.394532964456197
 2023-05-28,139.90565414086922,62.915508198551834,22.003929168504186
 2023-05-29,148.20294343801243,70.50925061274404,23.676251690465687
 2023-05-30,144.79224153467328,71.3288850087774,20.700607253922893
 2023-05-31,136.6043139822718,69.85669481912592,22.72208092020764
 2023-06-01,129.90496634749854,72.32926425849703,21.945028595331298
 2023-06-02,127.58824815952354,68.08242219577187,25.865155230205552
 2023-06-03,136.16761518850672,67.28411494443623,23.074820318848364
 2023-06-04,145.1240516092851,72.4669447651291,23.274114518888926
 2023-06-05,147.5059196939697,68.7403130237958,20.97542437801451
 2023-06-06,149.47687307825956,74.64587085916783,20.697985347883023
 2023-06-07,138.53032591367202,67.82186976223531,20.812878200360235
 2023-06-08,128.45328941709883,65.84023751023986,23.243657934672576
 2023-06-09,132.13221605087253,61.92995330767465,20.747096808795494
 2023-06-10,135.78647805542306,70.48997159891739,22.829123565664112
 2023-06-11,148.0987107210833,81.71304992555454,28.135750134629784
 2023-06-12,153.0193346048766,75.96586656015401,24.47267059270714
 2023-06-13,145.6457394335731,74.83142832728126,20.83097462962713
 2023-06-14,140.9902449994371,73.94584245827411,25.36243573634108
 2023-06-15,133.2924186394212,64.64010696028141,20.484316594503184
 2023-06-16,134.34139003578426,68.29115742694421,15.54391785175287
 2023-06-17,143.56414528086364,71.8450346443408,18.583781268252814
 2023-06-18,148.01551228867737,74.49613663708284,15.945413181646051
 2023-06-19,150.95414370388056,71.61202293266295,20.452997236318286
 2023-06-20,147.0447580558963,73.64492989924287,21.51254156972946
 2023-06-21,138.91442946215307,71.94720618730378,26.436347112705246
 2023-06-22,133.9508505306588,74.23114330430461,22.337566040890476
 2023-06-23,135.26498811523834,72.42884818804521,20.64923107688999
 2023-06-24,142.36042104862796,81.94612281187176,23.744498150585642
 2023-06-25,152.13733471736649,72.231929544243,14.572624223730113
 2023-06-26,154.23904352473193,81.48112494596936,21.86262256879806
 2023-06-27,153.26261953185252,77.54801979461801,23.418123219851857
 2023-06-28,141.501240633759,81.69963498296786,16.619517644570024
 2023-06-29,141.19092331014159,66.55397022829503,24.436287255248928
 2023-06-30,137.72658495620402,64.66468326719813,21.970263091829977
 2023-07-01,142.13074316454697,68.06840835505338,19.65865887136292
 2023-07-02,150.31262022756084,64.53683149223139,22.752616955102773
 2023-07-03,156.92136533513548,75.83190755916394,27.6160986739157
 2023-07-04,151.43565463277307,71.92216400861804,21.29936760939659
 2023-07-05,144.94522732135877,73.22458259306042,21.448179346840707
 2023-07-06,138.3500162874156,70.88378802259359,19.27518363303756
 2023-07-07,138.2292024966561,78.49545544440747,18.05348196698251
 2023-07-08,144.19080360135797,76.84752099160923,23.04377126872821
 2023-07-09,151.39073391880174,72.81084868108886,17.934261085087467
 2023-07-10,156.7987408262441,73.90729705638027,20.66696001819084
 2023-07-11,155.11785753935226,80.01852462720866,18.969037709955906
 2023-07-12,145.4344726620079,66.11607029590074,21.788698271209025
 2023-07-13,136.57252976954146,77.4435587140425,21.30249385354929
 2023-07-14,140.6277628521662,76.21108202968954,23.363876114180734
 2023-07-15,148.69544667183354,72.00184507539325,18.670955828561386
 2023-07-16,154.61315597254045,68.74090534081584,19.34112896296411
 2023-07-17,159.72655950094892,86.63264162130153,17.16421136521589
 2023-07-18,155.0395982759213,76.94709991169756,18.71737147605307
 2023-07-19,144.212007347891,78.2950852338128,21.131901479134555
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -0,0 +1,38 @@
 version: '3.8'
 services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: lazy-fjh-app
    ports:
      - "60201:60201"
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
      - ./temp:/app/temp
    env_file:
      - .env
    environment:
      - ENV=production
      - DEBUG=False
      - HOST=0.0.0.0
      - PORT=60201
      - LOG_LEVEL=info
      - LANGUAGE_DEFAULT=zh
      - ANALYSIS_TIMEOUT=300
      - MAX_MEMORY_MB=500
    restart: always
    networks:
      - app-network
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:60201/health" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
 networks:
  app-network:
    driver: bridge
--- a/docs/V1.1.md
+++ b/docs/V1.1.md
@ -0,0 +1,26 @@
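 # Session Summary (V1.1)
 ## Recent Issues and Fixes
 - **Memory limit**: uvicorn raised `MemoryError` during startup because `RLIMIT_AS` was capped at 2GB. The memory limit is now optional and controlled by an environment variable (no limit by default).
 - **pandas frequency-alias case**: pandas 3.x requires lowercase frequency aliases, so `freq='H'/'S'` was changed to `freq='h'/'s'`, which resolves the `Invalid frequency` error.
 ## Endpoint Status (Brief)
 - v1 `/api`: `upload`, `analyze`, `available_methods`, `image/{filename}`, `download/{filename}`, and `list_uploads` are all implemented.
 - v2 `/api/v2`: `analyze` (OSS/URL input) and `available_methods` are implemented; `API_MODE=v2` disables the v1 upload/image endpoints.
 ## Design Decision: Frontend Rendering, Data Mode
 - PNGs are no longer sent. The backend returns only structured data and the frontend renders it with ECharts. See `docs/charts-data-mode-plan.md` for the full plan.
 - Unified sanitization: `to_echarts_safe` handles NaN/Inf/pd.NA, converts Timestamp to ISO8601, converts numpy/Decimal values to native types, and guards against circular references.
 - Data format conventions:
  - Time series / multi-series: prefer `dataset`; matrices (correlation, etc.) are flattened in advance to `[i,j,value]`.
  - Histogram: the backend bins with `np.histogram` and returns `[range_start, range_end, count]`.
  - Style decoupling: the backend does not return colors or line styles.
 - Algorithms stay unchanged; changes are limited to result packaging and sanitization. Confidence intervals (CI) or anomaly annotations, if needed, are extra packaging and do not touch the core algorithms.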
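The histogram convention above (`[range_start, range_end, count]` per bin) can be illustrated with a small standard-library sketch. This is not the project's code: the notes say the backend uses `np.histogram`, whereas the hypothetical helper `histogram_bins` below swaps in plain equal-width binning in pure Python so the example has no dependencies. Edge handling only roughly mirrors `np.histogram` (the last bin is closed on the right).

```python
import math


def histogram_bins(values, bins=10):
    """Equal-width binning; returns [[range_start, range_end, count], ...].

    Hypothetical helper for illustration; the real backend is described as
    using np.histogram instead.
    """
    # Drop None and non-finite values so they never reach the JSON payload.
    clean = [float(v) for v in values if v is not None and math.isfinite(float(v))]
    if not clean:
        return []
    lo, hi = min(clean), max(clean)
    if lo == hi:
        # Degenerate range: a single bin holding every value.
        return [[lo, hi, len(clean)]]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in clean:
        # Clamp so the maximum value falls into the last bin (right-closed).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [[lo + i * width, lo + (i + 1) * width, counts[i]] for i in range(bins)]
```

Each returned row maps directly onto one bar, so the frontend does not need to bin again.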
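 ## To Do (Not Yet Implemented)
 - Implement the charts data mode: implement `to_echarts_safe`, add the `charts` field, and define the return path when image saving is disabled.
 - Return histogram bin data, anomaly annotations (if needed), and forecast upper/lower bounds (if needed) from the packaging layer.
 ## References
 - Detailed plan: `docs/charts-data-mode-plan.md`
 - Endpoint list: `docs/api-endpoints-status.md`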
--- a/docs/api-endpoints-status.md
+++ b/docs/api-endpoints-status.md
@ -0,0 +1,21 @@
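 # API Endpoint List and Completion Status
 ## v1 Routes (prefix `/api`)
 | Endpoint | Method | Description | Status | Notes |
 | --- | --- | --- | --- | --- |
 | `/api/upload` | POST | Upload a CSV file | ✅ Implemented | Accepts only `settings.ALLOWED_EXTENSIONS` (default: csv). Returns the saved file name. |
 | `/api/analyze` | POST | Run the full analysis on an uploaded CSV | ✅ Implemented | Returns the analysis result; already switched to charts data mode (`analysis.<lang>.charts`), with `images` kept empty for compatibility with the old frontend. |
 | `/api/available_methods` | GET | List supported analysis methods | ✅ Implemented | Static list. |
 | `/api/image/{filename}` | GET | Fetch an image file | ✅ Implemented | Reads from `uploads/`. |
 | `/api/download/{filename}` | GET | Download a file | ✅ Implemented | Reads from `uploads/`. |
 | `/api/list_uploads` | GET | List uploaded files | ✅ Implemented | Returns file name, size, and modification time. |
 ## v2 Routes (prefix `/api/v2`)
 | Endpoint | Method | Description | Status | Notes |
 | --- | --- | --- | --- | --- |
 | `/api/v2/analyze` | POST | Download a CSV from OSS/URL and analyze it | ✅ Implemented | Reuses the v1 analyzer; returns charts data mode with `images` empty. Images remain disabled under `API_MODE=v2`. |
 | `/api/v2/available_methods` | GET | List supported analysis methods | ✅ Implemented | Same as v1. |
 ## Known Gaps / To Do (Not Yet Implemented)
 - **Forecast confidence intervals**: VAR currently returns point forecasts only; confidence intervals would require switching to `forecast_interval` (same algorithm, just also taking the upper and lower bounds).
 - **Anomaly annotations**: no annotation output yet; if needed, it must be computed separately in the packaging layer (max/min or anomaly detection).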
--- a/docs/charts-data-mode-plan.md
+++ b/docs/charts-data-mode-plan.md
@ -0,0 +1,71 @@
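 # charts Data Mode (Current Version)
 > The old version is archived as `docs/旧的charts-data-mode-plan.md`.
 ## Goals
 - The backend returns structured chart data and the frontend renders it with ECharts; images are no longer generated or transmitted.
 - Sanitization is unified so that NaN/Inf/non-serializable objects cannot crash the API.
 - The response structure is defined by `analysis.<lang>.charts`; `images` is empty and exists only for old-frontend compatibility.
 ## Serialization Rules (`to_echarts_safe` is implemented)
 - NaN/Inf/pd.NA → null; numpy scalars are converted to native types; Decimal is converted to float.
 - Timestamp/datetime → ISO8601 string.
 - ndarray/DataFrame/Series → list or records; sanitized recursively with a circular-reference guard.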
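As a rough illustration of the rules above, here is a minimal sketch of a recursive sanitizer. It is not the project's `to_echarts_safe` (which lives in `app/services/analysis_system.py` and is not shown here); the name `to_echarts_safe_sketch` is hypothetical. To stay dependency-free it covers only standard-library types, so the numpy and pandas branches the real function is described as having are deliberately left out.

```python
import json
import math
from datetime import date, datetime
from decimal import Decimal


def to_echarts_safe_sketch(obj, _seen=None):
    """Recursively convert obj into JSON-safe values (stdlib types only)."""
    if _seen is None:
        _seen = set()
    if obj is None or isinstance(obj, (bool, int, str)):
        return obj
    if isinstance(obj, Decimal):
        obj = float(obj)  # Decimal -> float, then checked for NaN/Inf below
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else None  # NaN/Inf -> null
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()  # ISO8601 string
    if isinstance(obj, (dict, list, tuple, set)):
        if id(obj) in _seen:
            return None  # container is already on the current path: cut the cycle
        _seen.add(id(obj))
        try:
            if isinstance(obj, dict):
                return {str(k): to_echarts_safe_sketch(v, _seen) for k, v in obj.items()}
            return [to_echarts_safe_sketch(v, _seen) for v in obj]
        finally:
            _seen.discard(id(obj))
    return str(obj)  # last resort for unknown objects


payload = {"value": float("nan"), "at": datetime(2023, 1, 1, 12, 0, 0)}
# allow_nan=False makes json.dumps raise if any NaN/Inf slipped through.
encoded = json.dumps(to_echarts_safe_sketch(payload), allow_nan=False)
```

Tracking only the containers on the current recursion path (rather than every container ever visited) means shared, non-cyclic references are still serialized in full.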
 ## Response Skeleton (As Actually Deployed)
 ```json
 {
  "success": true,
  "meta": { ... },
  "analysis": {
    "zh": {
      "data_description": "...",
      "preprocessing_steps": [...],
      "api_analysis": { ... },
      "steps": [
        { "key": "ts_img", "title": "Time Series Analysis", "chart": "ts", "summary": "..." },
        ...
      ],
      "charts": {
        "ts": { ... },
        "acf_pacf": { ... },
        ...
      }
    }
  },
  "images": {},
  "log": [...]
 }
 ```
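 - `charts` is no longer returned separately at the top level; the frontend should read `analysis.<lang>.charts`. If a top-level alias is needed, a mapping can be added in the route layer.
 - `steps[].chart` points to a key in `charts` and drives the display order on the frontend.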
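A consumer can resolve the `steps[].chart` references roughly as follows. This is a sketch only: the helper name `ordered_charts` is hypothetical, and it assumes the response has already been parsed into a Python dict with the skeleton shown above.

```python
def ordered_charts(response, lang="zh"):
    """Return [(step_key, chart_payload), ...] in the order given by steps.

    Hypothetical helper; steps whose `chart` key has no entry in `charts`
    are skipped rather than treated as errors.
    """
    section = response.get("analysis", {}).get(lang, {})
    charts = section.get("charts", {})
    ordered = []
    for step in section.get("steps", []):
        chart_key = step.get("chart")
        if chart_key in charts:
            ordered.append((step.get("key"), charts[chart_key]))
    return ordered
```

Driving the order from `steps` rather than from the `charts` mapping keeps rendering order independent of how the JSON object happens to be serialized.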
 ## Chart Data Formats (As Currently Implemented)
 - **Time series `ts`**: `type: line`, `dataset = [[col...], [...]]`, with timestamps as strings.
 - **ACF/PACF `acf_pacf`**: `series: [{name: col, acf:[{lag,value}], pacf:[{lag,value}]}]`; each column is packed into a single item.
 - **Stationarity `stationarity`**: `records: [{column, ADF:{...}, KPSS:{...}}]`.
 - **Normality `normality`**: `records: [{column, histogram:[{range_start,range_end,count},...], Shapiro-Wilk:{...}, Jarque-Bera:{...}}]`.
 - **Seasonal decomposition `seasonal`**: `type: line`; `dataset` contains observed/trend/seasonal/resid, with missing values as null.
 - **Spectral `spectral` (summarized to limit payload size)**:
  - `spectrogram`: `f`, `t`, `Sxx_log10_mean`, `Sxx_shape`; the full matrix is not returned.
  - `periodogram`: only the first 20 points of `f` and `Pxx_den`.
 - **Correlation `heatmap`**: `type: heatmap`; `data` is a flat `[i,j,value]` list, with xLabels/yLabels.
 - **PCA scree `pca_scree`**: `type: bar`; `dataset` holds component / explained variance / cumulative value.
 - **PCA scatter `pca_scatter`**: `type: scatter`; `records` contain PC1/PC2/timestamp.
 - **Feature importance `feature_importance`**: `type: bar`; `records` contain feature/importance.
 - **Clustering `cluster`**: `type: scatter`; `records` contain cluster and timestamp.
 - **Factor analysis `factor`**: `type: scatter`; `records` contain Factor1/Factor2 and timestamp.
 - **Cointegration `cointegration`**: `type: table`; `meta` directly carries trace_stat/crit_vals/eigen_vals.
 - **VAR forecast `var_forecast`**: `type: line`; `dataset` contains step and each forecast column.
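The flat `[i,j,value]` heatmap form listed above can be produced along these lines. This is a sketch under assumptions: `flatten_matrix` is a hypothetical name, the matrix is a plain list of lists, and `i` is taken to be the row index and `j` the column index (the document does not say which axis is which; for a symmetric correlation matrix the choice does not change the picture).

```python
def flatten_matrix(matrix, labels):
    """Flatten a square matrix into a heatmap payload of [i, j, value] triples.

    Hypothetical helper for illustration only.
    """
    data = [
        [i, j, matrix[i][j]]
        for i in range(len(matrix))
        for j in range(len(matrix[i]))
    ]
    return {
        "type": "heatmap",
        "data": data,
        "xLabels": list(labels),
        "yLabels": list(labels),
    }
```

For example, `flatten_matrix([[1.0, 0.5], [0.5, 1.0]], ["sales", "ad_cost"])` yields four triples, one per cell.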
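 ## Compatibility and Notes
 - `images` is an empty object; any leftover `image_path` has been removed.
 - The spectral output is currently a summary; restoring the full matrix requires changing `perform_spectral_analysis`.
 - The ACF/PACF structure differs from the old document, so the frontend must decode the current form; to split the series, adjust `_build_chart_payload` in the backend.
 - Normality histograms are already binned by the backend; the frontend does not need to bin again.
 ## Known Optional Improvements
 1) Add a top-level `charts` alias in the route layer to ease frontend migration.
 2) Split ACF/PACF into two series per column (acf/pacf), matching the old example.
 3) Add a `mode=full|summary` switch for spectral so the frontend can choose full or summarized output.
 4) Add an optional sampling/truncation strategy for large datasets.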
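One possible shape for the sampling/truncation idea in item 4 is stride-based downsampling of a `dataset` (header row plus data rows). This is purely a sketch of an improvement that the document lists as not yet implemented; `downsample_dataset` and its `max_rows` parameter are hypothetical.

```python
def downsample_dataset(dataset, max_rows):
    """Keep the header row and at most max_rows evenly spaced data rows.

    Hypothetical helper; not part of the current backend.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if not dataset:
        return []
    header, rows = dataset[0], dataset[1:]
    if len(rows) <= max_rows:
        return [header] + rows
    # Ceiling division: the smallest stride that yields <= max_rows rows.
    stride = -(-len(rows) // max_rows)
    return [header] + rows[::stride]
```

Stride sampling preserves the overall time span but can miss short spikes, so it suits overview charts better than anomaly inspection.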
--- a/docs/关于charts模式的实现.md
+++ b/docs/关于charts模式的实现.md
@ -0,0 +1,32 @@
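 # charts Mode Implementation Notes (Current Version)
 > The old implementation notes are in `docs/旧的关于charts模式的实现.md`.
 ## Current State Overview
 - The backend forces `generate_plots=False`; every step produces data only, collected under `analysis.<lang>.charts`.
 - `images` is an empty object kept for compatibility; `steps[].chart` binds to the corresponding chart key.
 - The sanitization function `to_echarts_safe` recursively handles NaN/Inf/Timestamp/numpy/Decimal to keep the output JSON-safe.
 ## Key Structures
 - Response: `analysis.<lang>.charts` (charts is not exposed at the top level).
 - Time series, seasonal decomposition, VAR, etc. use dataset; correlation uses a flat heatmap; PCA/clustering/factor use records.
 - ACF/PACF: each column carries both an `acf` and a `pacf` sequence (unlike the old document, which split them into two series).
 - Normality: each column includes histogram bins (backend `np.histogram`) plus the Shapiro/JB results.
 - Spectral: currently a summary (spectrogram gives only the mean and shape; periodogram gives only the first 20 points).
 ## File and Code Mapping
 - Sanitization and aggregation: `app/services/analysis_system.py` (`to_echarts_safe`, `_build_chart_payload`, `run_analysis`).
 - Time-series data: `app/services/analysis/modules/time_series.py` (data only, summarized spectral output).
 - Normality binning: `app/services/analysis/modules/basic.py`.
 - Route responses: `app/api/routes/analysis.py`, `app/api/routes/analysis_v2.py` (`charts` lives under `analysis.<lang>`).
 ## Differences from the Old Version
 - Images are no longer generated; there is no top-level `charts` field.
 - The ACF/PACF structure changed; spectral switched from the full matrix to a summary.
 - Normality histogram entries are dictionaries with named fields rather than a 2D array.
 ## Optional Follow-up Improvements
 1) Add a top-level `charts` alias in the route layer so the frontend can migrate transparently.
 2) ACF/PACF output could be split into separate series (matching the old example).
 3) Provide a `full/summary` switch for spectral to return either the full matrix or the summary.
 4) Add a sampling/truncation strategy for large datasets to prevent oversized payloads.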
--- a/docs/旧的charts-data-mode-plan.md
+++ b/docs/旧的charts-data-mode-plan.md
@ -0,0 +1,125 @@
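 # Visualization Data Response Plan for a Decoupled Frontend and Backend (ECharts)
 ## Goals
 - The backend no longer generates or transmits images and returns only chart data; the frontend renders with ECharts.
 - A unified data structure reduces frontend adapter code and eliminates API crashes caused by NaN/Infinity/non-serializable objects.
 - Stay compatible with the existing API (`images` may be left empty) while migrating gradually to the `charts` data mode.
 ## Serialization and Sanitization Rules (Mandatory)
 - **NaN / Infinity / pd.NA**: recursively sanitized to `null` (JSON `null`). The string "NaN" must never be returned.
 - **Timestamps**: always ISO8601 strings (e.g. `2023-01-01T12:00:00`).
 - **Arrays/matrices**: all converted to native Python lists before JSON serialization.
 - **DataFrame**: prefer `to_dict(orient="records")` or assemble the dataset form (see below).
 - **Numeric types**: numpy scalars are converted to native `int/float/bool`; `nan/inf` values are sanitized first.
 > Recommendation: implement a generic function `to_echarts_safe(obj)` that recursively applies the sanitization and type conversions above, and route all outgoing response data through it.
 ## Response Skeleton (adds `charts`, keeps old fields)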
 ```json
 {
  "success": true,
  "meta": { ... },
  "analysis": {
    "zh": {
      "data_description": "...",
      "preprocessing_steps": [ ... ],
      "api_analysis": { ... },
      "steps": [
        {"key": "ts", "title": "Time Series", "summary": "...", "chart": "ts"},
        ...
      ]
    }
  },
  "charts": {
    "ts": { "type": "line", "dataset": [...], "meta": {...} },
    "acf_pacf": { "type": "bar", "series": [...], "meta": {...} },
    "heatmap": { "type": "heatmap", "data": [...], "xLabels": [...], "yLabels": [...], "meta": {...} },
    ...
  },
  "images": {},
  "log": [...]
 }
 ```
 - `steps[].chart` points to a key in `charts`, so the frontend can render in step order.
 - `images` is kept but empty, for compatibility with the old frontend.
 ## Suggested Data Format per Chart (Aligned with ECharts)
 - **Time series (ts)**: `dataset` form
  - 2D array with the header in the first row, e.g. `[ ["timestamp","sales","ad_cost"], ["2023-01-01T00:00:00", 10, 5], ... ]`
  - Frontend: `dataset.source = dataset`, `series: [{type:'line', encode:{x:'timestamp', y:'sales'}}, ...]`
 - **ACF / PACF (acf_pacf)**:
  - `{ series: [{name:'acf', data:[{lag:0, value:1.0}, ...]}, {name:'pacf', data:[...] }], meta:{column:'sales'} }`
 - **Stationarity tests (stationarity)**:
  - `{ adf: {statistic:..., p_value:..., critical_values:{...}}, kpss:{...}, meta:{column:'sales'} }`
  - The frontend can render this as a bar chart or a table.
 - **Normality tests (normality)**:
  - `{ columns: [{name:'col', shapiro_p:..., jb_p:..., shapiro_stat:..., jb_stat:...}], meta:{} }`
 - **Seasonal decomposition (seasonal)**:
  - `dataset` form: `[["timestamp","observed","trend","seasonal","resid"], [...]]`
 - **Spectral analysis (spectral)**:
  - `periodogram`: `{ f: [...], psd: [...] }` (may be truncated to the first N points)
  - `spectrogram`: `{ f: [...], t: [...], values: [[i,j,val], ...] }` (may return log10 values only and then compress)
 - **Correlation heatmap (heatmap)**:
  - `{ data: [[i,j,value], ...], xLabels:[...], yLabels:[...], meta:{} }` (the backend flattens the N×N matrix in advance)
 - **PCA scree plot (pca_scree)**:
  - `dataset`: `[["component","explained","cumulative"], [1,0.4,0.4], ...]`
 - **PCA scatter (pca_scatter)**:
  - `records`: `[{pc1:..., pc2:..., timestamp:"..."}, ...]`
 - **Feature importance (feature_importance)**:
  - `records`: `[{feature:"...", importance:0.12}, ...]`
 - **Clustering (cluster)**:
  - `records`: `[{timestamp:"...", cluster:0, x:<optional>, y:<optional>}...]`
 - **Factor analysis (factor)**:
  - Similar to clustering: `[{timestamp:"...", factor1:..., factor2:...}]`
 - **Cointegration test (cointegration)**:
  - `{ trace_stat:[...], crit_95:[...], eigen_vals:[...], meta:{} }`
 - **VAR forecast (var_forecast)**:
  - `dataset`: `[["step","var1_forecast","var2_forecast"], [1, ...], ...]`
 > Principle: use `dataset` wherever possible, with multiple lines selected on the frontend via `encode`; flatten anything that needs a matrix in advance; use records for everything else.
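The `dataset` form preferred above (header row first, then one row per observation) can be assembled from named columns roughly like this. It is a minimal sketch: `build_dataset` is a hypothetical helper that takes plain Python lists, whereas the real backend would start from a DataFrame.

```python
def build_dataset(columns):
    """Turn {name: [values...]} into [[names...], [row0...], [row1...], ...].

    Hypothetical helper; all columns must have the same length so rows stay
    aligned across series.
    """
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    rows = [list(row) for row in zip(*(columns[name] for name in names))]
    return [names] + rows
```

Because every series shares the same rows, the frontend can pick lines purely through `encode` without re-aligning timestamps.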
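 ## Styling and Theming
 - The backend does not return colors, line styles, or other visual styling; it returns only semantic fields (series names, metric meanings). The frontend chooses colors and style according to its theme.
 ## Implementation Steps (Suggested)
 1) Add the `to_echarts_safe` sanitization function to convert NaN/Infinity/Timestamp/DataFrame into JSON-safe values in one place.
 2) In each analysis function: keep the computation logic and assemble chart data (dataset/records/flatten) instead of generating PNGs; the `generate_plots` logic can stay as a switch but defaults to False.
 3) When `run_analysis` aggregates results, put each step's data into `charts` and write the `chart` key (a reference to the chart) into `steps`.
 4) The route layer returns the `charts` field, leaves `images` empty, and still returns `steps`.
 5) The frontend integrates ECharts according to the `charts` protocol and drops its dependency on `images`.
 ## Compatibility and Fallback
 - Old frontend: still receives `analysis.steps` and `images` (empty).
 - New frontend: uses `charts`. If a step fails, return `{error:"..."}` with a short summary to avoid a 500.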
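The per-step fallback described above (a failed step yields `{error: "..."}` plus a short summary instead of a 500) could look roughly like this. The wrapper `run_step_safely` and its arguments are hypothetical; the actual `run_analysis` code is not shown in this document.

```python
def run_step_safely(chart_key, step_fn, charts):
    """Run one analysis step and record either its chart data or an error.

    Hypothetical wrapper: `step_fn` is any zero-argument callable returning
    chart data; `charts` is the dict being assembled for the response.
    """
    try:
        charts[chart_key] = step_fn()
        return {"chart": chart_key, "summary": "ok"}
    except Exception as exc:  # deliberately broad: one bad step must not fail the request
        charts[chart_key] = {"error": str(exc)}
        return {"chart": chart_key, "summary": f"{chart_key} failed: {exc}"}
```

Isolating failures per step lets the remaining charts render even when one analysis cannot run on a given dataset.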
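 ## Performance Notes
 - With no plotting on the backend, CPU and I/O usage drop significantly; for further optimization, the frontend can pass a `methods` list to choose which steps run.
 ## Do the Algorithms Need to Change?
 - The core statistical/time-series algorithms (ADF/KPSS, ACF/PACF, PCA, VAR, seasonal decomposition, correlation matrix, clustering, etc.) stay unchanged; changes are confined to the result-packaging layer.
 - Only the output wrapping needs adjusting:
  - Convert the intermediate results currently used for plotting (DataFrame/ndarray/statsmodels results) into ECharts-friendly JSON structures, all passed through `to_echarts_safe` (NaN/Inf/Timestamp).
  - Matrix results (such as correlation) are flattened by the backend into `[i,j,value]` lists; the dataset form is preferred for multi-series line/bar charts.
  - Truncate or summarize as needed to limit payload size (e.g. take the first N periodogram points; take the mean of the spectrogram or downsample it).
  - Add metadata (column names/units/variable names) so the frontend can build legends and tooltips.
 - What does not change:
  - Preprocessing, the standardization pipeline, and the mathematical implementation and parameter choices of the algorithms (lag order, decomposition period, number of PCA components, etc.) stay as they are.
 - If data volume or performance becomes a bottleneck later, individual steps can be sampled or truncated without affecting algorithm correctness.
 ## Additional Conventions (Still No Algorithm Changes, Only Result Packaging)
 - **Histogram binning**: for normality/distribution analysis, the backend is responsible for binning (`np.histogram`) and returns `[["range_start","range_end","count"], ...]`. The frontend does no binning.
 - **`to_echarts_safe` extensions**: beyond NaN/Inf/Timestamp, explicitly handle the numpy numeric types and Decimal, and add a "visited set" to guard against circular references where needed. Always output JSON-safe, ECharts-friendly structures.
 - **Matrix / multi-series formats**: matrices (correlation, etc.) continue to be flattened to `[i,j,value]`; multi-series / multi-column data prefers dataset + encode to guarantee alignment.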
--- a/docs/旧的关于charts模式的实现.md
+++ b/docs/旧的关于charts模式的实现.md
@ -0,0 +1,45 @@
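 # About the charts Mode Implementation
 ## Goals and Scope
 - Following `docs/charts-data-mode-plan.md`, change the backend to return structured chart data (ECharts-friendly) and stop generating or returning images.
 - Keep the algorithms and analysis flow unchanged, adjusting only the packaging and response structure; the old frontend stays compatible through the empty `images` field.
 ## Core Implementation
 - **Unified sanitization function**: `to_echarts_safe` was added to `app/services/analysis_system.py`. It recursively handles NaN/Inf/pd.NA, numpy scalars/arrays, Timestamp/datetime, and Decimal, includes circular-reference protection, and outputs JSON-safe structures.
 - **Analysis flow changes**:
  - `run_analysis` forces `generate_plots=False` and collects each step's result in `charts`; `steps[].chart` points to the corresponding key.
  - Each step gets a new `chart_key` that maps into `charts`:
    - `stats` (statistical overview as a dataset table), `ts` (time-series dataset), `acf_pacf` (acf/pacf sequences), `stationarity`, `normality` (table), `seasonal`, `spectral`, `heatmap` (flattened correlation matrix), `pca_scree`, `pca_scatter`, `feature_importance`, `cluster`, `factor` (records), `cointegration` (table meta), `var_forecast` (forecast dataset with a step column).
  - `_build_chart_payload` assembles the ECharts-friendly dataset/records/flatten structure for each chart_key and sanitizes it through `to_echarts_safe`.
  - Fallback image generation was removed; only the text fallback analysis remains.
 - **Data-layer changes**:
  - The normality test in `app/services/analysis/modules/basic.py` now includes histogram binning: `np.histogram` results are returned as a list of `[range_start, range_end, count]` so the frontend can render them directly.
 ## Route Response Changes
 - v1 `POST /api/analyze` and v2 `POST /api/v2/analyze`:
  - `analysis.<lang>.charts` returns the data for each chart; `steps` keeps its order and summaries and carries the `chart` reference.
  - `images` is always an empty object, kept only for old-frontend compatibility; the old image copy/save logic was deleted and the `image_path` leak was removed.
 ## Compatibility and Notes
 - The core algorithms, preprocessing, and API analysis calls are unchanged; only the output packaging differs.
 - A frontend still on the old version must switch to reading `analysis.<lang>.charts` and `steps[].chart`. The old field (`images`) is empty and does not cause errors.
 - Memory usage still needs attention for large datasets; for further compression, truncation/sampling can be added in `_build_chart_payload`.
 ## Related Files
 - Implementation details: `app/services/analysis_system.py`
 - Histogram binning: `app/services/analysis/modules/basic.py`
 - Route responses: `app/api/routes/analysis.py`, `app/api/routes/analysis_v2.py`
 - Design notes: `docs/charts-data-mode-plan.md`
 ## 2026-01-29 Addendum
 - ACF/PACF output changed to `lag/value` records so the frontend can map them directly to bar/line charts.
 - Spectral output:
  - `spectrogram` adds downsampling and returns `values: [i,j,val]`, along with the `f` and `t` lists.
  - `periodogram` returns the dataset form `["f","psd"]` (truncated to the first 200 points).
 - `docs/api-endpoints-status.md` was updated to mark charts mode as shipped, with `images` empty for compatibility only.
 ## 2026-01-29 Follow-up Adjustment
 - Spectral downsampling was removed on request: `spectrogram` now returns the full `f/t` and all `values[i,j,val]`, and `periodogram` returns the full-frequency dataset (this may increase the payload significantly; to limit size again, reintroduce a cap or sampling).
 ## 2026-01-29 Further Update
 - The time_series module is back to "return data only, generate no images": time series, ACF/PACF, seasonal decomposition, and spectral no longer plot and directly return the data that charts needs; spectral is still not downsampled and returns all values.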
--- a/generate_openapi.py
+++ b/generate_openapi.py
@ -0,0 +1,28 @@
 import os
 import sys
 import json
 from pathlib import Path
 # Add project root to path to ensure imports work
 sys.path.append(os.path.dirname(os.path.abspath(__file__)))
 try:
    from app.main import app
    print("Successfully imported FastAPI app.")
 except ImportError as e:
    print(f"Error importing app: {e}")
    sys.exit(1)
 def generate_openapi_json():
    openapi_schema = app.openapi()
    output_path = Path("openapi.json")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(openapi_schema, f, indent=2, ensure_ascii=False)
    print(f"OpenAPI documentation generated at: {output_path.absolute()}")
 if __name__ == "__main__":
    generate_openapi_json()
--- a/generate_test_data.py
+++ b/generate_test_data.py
@ -0,0 +1,29 @@
 import pandas as pd
 import numpy as np
 # Set the random seed
 np.random.seed(42)
 # Generate a 200-day time series
 dates = pd.date_range(start='2023-01-01', periods=200, freq='D')
 # Build the data
 trend = np.linspace(0, 50, 200)
 seasonality = 10 * np.sin(np.linspace(0, 3.14 * 2 * (200/7), 200))
 noise = np.random.normal(0, 2, 200)
 sales = 100 + trend + seasonality + noise
 ad_cost = sales * 0.5 + np.random.normal(0, 5, 200)
 temperature = 30 - trend * 0.2 + np.random.normal(0, 3, 200)
 # Create the DataFrame
 df = pd.DataFrame({
    'date': dates,
    'sales': sales,
    'ad_cost': ad_cost,
    'temperature': temperature
 })
 # Save
 df.to_csv('complex_test.csv', index=False)
 print("✅ Test file generated: complex_test.csv")
--- a/openapi.json
+++ b/openapi.json
@ -0,0 +1,552 @@
 {
  "openapi": "3.1.0",
  "info": {
    "title": "时间序列数据分析系统",
    "description": "支持多格式数据上传、AI增强分析、多语言报告生成",
    "version": "2.0.0"
  },
  "paths": {
    "/api/upload": {
      "post": {
        "tags": [
          "upload"
        ],
        "summary": "上传CSV或图片文件",
        "description": "上传数据文件（CSV 或图片）\n\n- **file**: CSV 或图片文件 (PNG, JPG, BMP, TIFF)\n- **task_description**: 分析任务描述",
        "operationId": "upload_file_api_upload_post",
        "requestBody": {
          "content": {
            "multipart/form-data": {
              "schema": {
                "$ref": "#/components/schemas/Body_upload_file_api_upload_post"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/UploadResponse"
                }
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/available_methods": {
      "get": {
        "tags": [
          "analysis"
        ],
        "summary": "获取可用的分析方法",
        "description": "获取所有可用的分析方法",
        "operationId": "get_available_methods_api_available_methods_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Get Available Methods Api Available Methods Get"
                }
              }
            }
          }
        }
      }
    },
    "/api/analyze": {
      "post": {
        "tags": [
          "analysis"
        ],
        "summary": "执行完整分析",
        "description": "执行完整的时间序列分析\n\n流程:\n1. 加载并预处理数据\n2. 执行15种分析方法\n3. 调用AI API 进行深度分析\n4. 生成PDF/PPT/HTML报告",
        "operationId": "analyze_data_api_analyze_post",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/AnalysisRequest"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Analyze Data Api Analyze Post"
                }
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/v2/available_methods": {
      "get": {
        "tags": [
          "analysis-v2"
        ],
        "summary": "获取可用的分析方法（v2）",
        "description": "v2 版本：返回与 v1 相同的可用分析方法列表。",
        "operationId": "get_available_methods_v2_api_v2_available_methods_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Get Available Methods V2 Api V2 Available Methods Get"
                }
              }
            }
          }
        }
      }
    },
    "/api/v2/analyze": {
      "post": {
        "tags": [
          "analysis-v2"
        ],
        "summary": "执行完整分析（v2：从 OSS URL 读取 CSV）",
        "description": "Analyze CSV from an OSS/URL, returning the same structure as v1.",
        "operationId": "analyze_data_v2_api_v2_analyze_post",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/AnalysisV2Request"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {
                  "additionalProperties": true,
                  "type": "object",
                  "title": "Response Analyze Data V2 Api V2 Analyze Post"
                }
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/image/{filename}": {
      "get": {
        "tags": [
          "files"
        ],
        "summary": "获取图片文件",
        "description": "获取可视化图片文件",
        "operationId": "serve_image_api_image__filename__get",
        "parameters": [
          {
            "name": "filename",
            "in": "path",
            "required": true,
            "schema": {
              "type": "string",
              "title": "Filename"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/download/{filename}": {
      "get": {
        "tags": [
          "files"
        ],
        "summary": "Download file",
        "description": "Download a report or other file",
        "operationId": "download_file_api_download__filename__get",
        "parameters": [
          {
            "name": "filename",
            "in": "path",
            "required": true,
            "schema": {
              "type": "string",
              "title": "Filename"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          },
          "422": {
            "description": "Validation Error",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/HTTPValidationError"
                }
              }
            }
          }
        }
      }
    },
    "/api/list_uploads": {
      "get": {
        "tags": [
          "files"
        ],
        "summary": "List uploaded files",
        "description": "List files in the uploads directory",
        "operationId": "list_uploads_api_list_uploads_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/": {
      "get": {
        "summary": "Root",
        "description": "Root path",
        "operationId": "root__get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/health": {
      "get": {
        "summary": "Health",
        "description": "Health check",
        "operationId": "health_health_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    },
    "/api/config": {
      "get": {
        "summary": "Get Config",
        "description": "Get application configuration",
        "operationId": "get_config_api_config_get",
        "responses": {
          "200": {
            "description": "Successful Response",
            "content": {
              "application/json": {
                "schema": {}
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "AnalysisRequest": {
        "properties": {
          "filename": {
            "type": "string",
            "title": "Filename"
          },
          "file_type": {
            "type": "string",
            "title": "File Type",
            "default": "csv"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description",
            "default": "时间序列数据分析"
          },
          "data_background": {
            "additionalProperties": true,
            "type": "object",
            "title": "Data Background",
            "default": {}
          },
          "original_image": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Original Image"
          },
          "language": {
            "type": "string",
            "title": "Language",
            "default": "zh"
          },
          "generate_plots": {
            "type": "boolean",
            "title": "Generate Plots",
            "default": false
          }
        },
        "type": "object",
        "required": [
          "filename"
        ],
        "title": "AnalysisRequest",
        "description": "Analysis request model"
      },
      "AnalysisV2Request": {
        "properties": {
          "oss_url": {
            "type": "string",
            "title": "Oss Url"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description",
            "default": "时间序列数据分析"
          },
          "data_background": {
            "additionalProperties": true,
            "type": "object",
            "title": "Data Background",
            "default": {}
          },
          "language": {
            "type": "string",
            "title": "Language",
            "default": "zh"
          },
          "generate_plots": {
            "type": "boolean",
            "title": "Generate Plots",
            "default": false
          },
          "source_name": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Source Name"
          }
        },
        "type": "object",
        "required": [
          "oss_url"
        ],
        "title": "AnalysisV2Request",
        "description": "v2 analysis request model (input is an OSS/URL)"
      },
      "Body_upload_file_api_upload_post": {
        "properties": {
          "file": {
            "type": "string",
            "format": "binary",
            "title": "File"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description",
            "default": "时间序列数据分析"
          }
        },
        "type": "object",
        "required": [
          "file"
        ],
        "title": "Body_upload_file_api_upload_post"
      },
      "HTTPValidationError": {
        "properties": {
          "detail": {
            "items": {
              "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
          }
        },
        "type": "object",
        "title": "HTTPValidationError"
      },
      "UploadResponse": {
        "properties": {
          "success": {
            "type": "boolean",
            "title": "Success"
          },
          "filename": {
            "type": "string",
            "title": "Filename"
          },
          "file_type": {
            "type": "string",
            "title": "File Type"
          },
          "original_filename": {
            "type": "string",
            "title": "Original Filename"
          },
          "task_description": {
            "type": "string",
            "title": "Task Description"
          },
          "message": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Message"
          }
        },
        "type": "object",
        "required": [
          "success",
          "filename",
          "file_type",
          "original_filename",
          "task_description"
        ],
        "title": "UploadResponse",
        "description": "Upload response model"
      },
      "ValidationError": {
        "properties": {
          "loc": {
            "items": {
              "anyOf": [
                {
                  "type": "string"
                },
                {
                  "type": "integer"
                }
              ]
            },
            "type": "array",
            "title": "Location"
          },
          "msg": {
            "type": "string",
            "title": "Message"
          },
          "type": {
            "type": "string",
            "title": "Error Type"
          }
        },
        "type": "object",
        "required": [
          "loc",
          "msg",
          "type"
        ],
        "title": "ValidationError"
      }
    }
  }
 }
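The `AnalysisV2Request` schema above requires only `oss_url` and gives every other field a default. A minimal client sketch, assuming the service is reachable at a base URL such as `http://localhost:60201` (the port from `.env.example`); the helper names `build_v2_payload` and `post_analyze` are hypothetical and not part of the repository:

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_v2_payload(
    oss_url: str,
    task_description: str = "时间序列数据分析",  # default taken from the schema
    data_background: Optional[Dict[str, Any]] = None,
    language: str = "zh",
    generate_plots: bool = False,
    source_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON body shaped like the AnalysisV2Request schema."""
    if not oss_url:
        raise ValueError("oss_url is required by the schema")
    return {
        "oss_url": oss_url,
        "task_description": task_description,
        "data_background": data_background if data_background is not None else {},
        "language": language,
        "generate_plots": generate_plots,
        "source_name": source_name,
    }


def post_analyze(base_url: str, payload: Dict[str, Any], timeout: float = 300.0) -> Dict[str, Any]:
    """POST the payload to /api/v2/analyze and decode the JSON response.

    The 300-second timeout mirrors ANALYSIS_TIMEOUT in .env.example; this is
    an assumption about a sensible client-side limit, not a server guarantee.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `post_analyze("http://localhost:60201", build_v2_payload("https://oss.example.com/data.csv"))`. A body that fails validation is answered with the 422 `HTTPValidationError` shape defined in the same spec.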
--- a/pyproject.toml
+++ b/pyproject.toml
@ -0,0 +1,100 @@
 [project]
 name = "lazy-fjh"
 version = "2.0.0"
 description = "Time-series data analysis system - FastAPI version"
 readme = "README.md"
 requires-python = ">=3.10"
 authors = [{ name = "Your Name", email = "your.email@example.com" }]
 keywords = ["time-series", "data-analysis", "fastapi", "statistical-analysis"]
 classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Operating System :: OS Independent",
 ]
 dependencies = [
    # FastAPI and web framework
    "fastapi>=0.104.1",
    "uvicorn[standard]>=0.24.0",
    "python-multipart>=0.0.6",
    "python-dotenv>=1.0.0",
    # Data processing
    "pandas>=2.2.2",
    "numpy>=1.26.4",
    # Statistics and scientific computing
    "scipy>=1.13.0",
    "scikit-learn>=1.3.0",
    "statsmodels>=0.14.0",
    # Visualization
    "matplotlib>=3.7.2",
    "seaborn>=0.12.2",
    # Report generation
    "reportlab>=4.0.4",
    "python-docx>=0.8.11",
    "python-pptx>=0.6.21",
    # API and data
    "openai>=1.3.0",
    "gradio_client>=0.9.0",
    "beautifulsoup4>=4.12.2",
    "requests>=2.31.0",
    # System and imaging
    "psutil>=5.9.5",
    "Pillow>=10.0.0",
    "opencv-python>=4.8.1.78",
 ]
 [project.optional-dependencies]
 dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.0.0",
 ]
 prod = ["gunicorn>=21.2.0", "supervisor>=4.2.5"]
 [build-system]
 requires = ["flit_core >=3.2,<4"]
 build-backend = "flit_core.buildapi"
 [tool.uv]
 dev-dependencies = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
 ]
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 python_files = "test_*.py"
 python_classes = "Test*"
 python_functions = "test_*"
 [tool.black]
 line-length = 100
 target-version = ["py310", "py311", "py312"]
 [tool.ruff]
 line-length = 100
 target-version = "py310"
 select = ["E", "F", "W", "I"]
 ignore = ["E501"]
 [tool.mypy]
 python_version = "3.10"
 warn_return_any = true
 warn_unused_configs = true
 disallow_untyped_defs = false
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1,29 @@
 # FastAPI and web framework
 fastapi==0.104.1
 uvicorn[standard]==0.24.0
 python-multipart==0.0.6
 python-dotenv==1.0.0
 # Data processing
 pandas==2.2.2
 numpy==1.26.4
 # Statistics and scientific computing
 scipy==1.13.0
 scikit-learn==1.3.0
 statsmodels==0.14.0
 # Visualization
 matplotlib==3.7.2
 seaborn==0.12.2
 # API and data
 openai==1.3.0
 gradio_client>=0.9.0
 requests==2.31.0
 # System and imaging
 psutil==5.9.5
 # Production deployment
 gunicorn==21.2.0
--- a/resource/fonts/LICENSE.txt
+++ b/resource/fonts/LICENSE.txt
@ -0,0 +1,96 @@
 Copyright 2014-2021 Adobe (http://www.adobe.com/), with Reserved Font
 Name 'Source'. Source is a trademark of Adobe in the United States
 and/or other countries.
 This Font Software is licensed under the SIL Open Font License,
 Version 1.1.
 This license is copied below, and is also available with a FAQ at:
 http://scripts.sil.org/OFL
 -----------------------------------------------------------
 SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
 -----------------------------------------------------------
 PREAMBLE
 The goals of the Open Font License (OFL) are to stimulate worldwide
 development of collaborative font projects, to support the font
 creation efforts of academic and linguistic communities, and to
 provide a free and open framework in which fonts may be shared and
 improved in partnership with others.
 The OFL allows the licensed fonts to be used, studied, modified and
 redistributed freely as long as they are not sold by themselves. The
 fonts, including any derivative works, can be bundled, embedded,
 redistributed and/or sold with any software provided that any reserved
 names are not used by derivative works. The fonts and derivatives,
 however, cannot be released under any other type of license. The
 requirement for fonts to remain under this license does not apply to
 any document created using the fonts or their derivatives.
 DEFINITIONS
 "Font Software" refers to the set of files released by the Copyright
 Holder(s) under this license and clearly marked as such. This may
 include source files, build scripts and documentation.
 "Reserved Font Name" refers to any names specified as such after the
 copyright statement(s).
 "Original Version" refers to the collection of Font Software
 components as distributed by the Copyright Holder(s).
 "Modified Version" refers to any derivative made by adding to,
 deleting, or substituting -- in part or in whole -- any of the
 components of the Original Version, by changing formats or by porting
 the Font Software to a new environment.
 "Author" refers to any designer, engineer, programmer, technical
 writer or other person who contributed to the Font Software.
 PERMISSION & CONDITIONS
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of the Font Software, to use, study, copy, merge, embed,
 modify, redistribute, and sell modified and unmodified copies of the
 Font Software, subject to the following conditions:
 1) Neither the Font Software nor any of its individual components, in
 Original or Modified Versions, may be sold by itself.
 2) Original or Modified Versions of the Font Software may be bundled,
 redistributed and/or sold with any software, provided that each copy
 contains the above copyright notice and this license. These can be
 included either as stand-alone text files, human-readable headers or
 in the appropriate machine-readable metadata fields within text or
 binary files as long as those fields can be easily viewed by the user.
 3) No Modified Version of the Font Software may use the Reserved Font
 Name(s) unless explicit written permission is granted by the
 corresponding Copyright Holder. This restriction only applies to the
 primary font name as presented to the users.
 4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
 Software shall not be used to promote, endorse or advertise any
 Modified Version, except to acknowledge the contribution(s) of the
 Copyright Holder(s) and the Author(s) or with their explicit written
 permission.
 5) The Font Software, modified or unmodified, in part or in whole,
 must be distributed entirely under this license, and must not be
 distributed under any other license. The requirement for fonts to
 remain under this license does not apply to any document created using
 the Font Software.
 TERMINATION
 This license becomes null and void if any of the above conditions are
 not met.
 DISCLAIMER
 THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
 OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
 COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
 DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
 OTHER DEALINGS IN THE FONT SOFTWARE.
--- a/resource/fonts/SourceHanSansCN.zip
+++ b/resource/fonts/SourceHanSansCN.zip
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Bold.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Bold.otf
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-ExtraLight.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-ExtraLight.otf
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Heavy.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Heavy.otf
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Light.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Light.otf
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Medium.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Medium.otf
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Normal.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Normal.otf
--- a/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Regular.otf
+++ b/resource/fonts/SubsetOTF/CN/SourceHanSansCN-Regular.otf
--- a/run.sh
+++ b/run.sh
@ -0,0 +1,126 @@
 #!/bin/bash
 # FastAPI application startup script (uses uv for package management)
 set -e
 echo "========================================"
 echo "Starting FastAPI time-series analysis system v2.0"
 echo "========================================"
 echo ""
 # Check for uv
 if ! command -v /home/syy/.local/bin/uv &> /dev/null; then
    echo "Error: uv not found"
    exit 1
 fi
 echo "✓ uv is installed"
 echo ""
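 # Check for the virtual environment and create it if it does not exist
 if [ ! -d ".venv" ]; then
    echo "Creating virtual environment..."
    /home/syy/.local/bin/uv venv --python 3.10
 fi
 # Activate the virtual environment
 echo "Activating virtual environment..."
 source .venv/bin/activate
 # Load .env (without overriding environment variables that already exist)
 if [ -f ".env" ]; then
    echo "Loading .env..."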
    while IFS=$'\t' read -r key quoted_value; do
        [ -z "$key" ] && continue
        # Only allow valid environment variable names
        if [[ ! "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
            continue
        fi
        # Variables already set explicitly in the environment take precedence
        if [ -z "${!key+x}" ]; then
            eval "export ${key}=${quoted_value}"
        fi
    done < <(python - <<'PY'
 import os
 import shlex
 try:
    from dotenv import dotenv_values
 except Exception:
    dotenv_values = None
 if dotenv_values is None:
    raise SystemExit(0)
 values = dotenv_values('.env')
 for k, v in values.items():
    if k is None or v is None:
        continue
    # Output: KEY<TAB>shell_quoted_value
    print(f"{k}\t{shlex.quote(str(v))}")
 PY
 )
    echo "✓ .env loaded"
    echo ""
 fi
 # Check for dependencies and install them if missing
 echo "Checking dependencies..."
 python -c "import fastapi; import uvicorn; import pandas; import numpy" 2>/dev/null || {
    echo "Installing dependencies..."
    /home/syy/.local/bin/uv pip install \
        'fastapi>=0.104.1' \
        'uvicorn[standard]>=0.24.0' \
        'python-multipart>=0.0.6' \
        'python-dotenv>=1.0.0' \
        'pandas>=2.2.2' \
        'numpy>=1.26.4' \
        'scipy>=1.13.0' \
        'scikit-learn>=1.3.0' \
        'statsmodels>=0.14.0' \
        'matplotlib>=3.7.2' \
        'seaborn>=0.12.2' \
        'openai>=1.3.0' \
        'gradio_client>=0.9.0' \
        'requests>=2.31.0' \
        'psutil>=5.9.5'
 }
 echo "✓ Dependency check complete"
 echo ""
 # Create the required directories
 mkdir -p uploads logs temp resource/fonts
 # Set environment variables (if not already set)
 export ENV=${ENV:-"development"}
 export DEBUG=${DEBUG:-"False"}
 export HOST=${HOST:-"0.0.0.0"}
 export PORT=${PORT:-"60201"}
 export LOG_LEVEL=${LOG_LEVEL:-"INFO"}
 echo "Environment configuration:"
 echo "  ENV=$ENV"
 echo "  DEBUG=$DEBUG"
 echo "  HOST=$HOST"
 echo "  PORT=$PORT"
 echo "  LOG_LEVEL=$LOG_LEVEL"
 echo ""
 # Start the application
 echo "Starting application..."
 echo ""
 echo "=================================="
 echo "✓ URL: http://localhost:$PORT"
 echo "✓ API docs: http://localhost:$PORT/docs"
 echo "✓ ReDoc: http://localhost:$PORT/redoc"
 echo "=================================="
 echo ""
 echo "Press Ctrl+C to stop the application"
 echo ""
 # Run with uvicorn (variables quoted to avoid word splitting)
 python -m uvicorn app.main:app \
    --host "$HOST" \
    --port "$PORT" \
    --log-level "$(echo "$LOG_LEVEL" | tr '[:upper:]' '[:lower:]')"
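The `.env` loading loop in `run.sh` applies three rules: skip entries with no value, skip names that are not valid shell identifiers, and never override a variable that is already set. The same rules can be sketched in pure Python. The function name `apply_env_defaults` is hypothetical, and the input is assumed to be a mapping like the one `dotenv_values('.env')` returns in the script:

```python
import os
import re
from typing import List, Mapping, MutableMapping, Optional

# Same pattern the shell loop uses to accept a variable name.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def apply_env_defaults(
    values: Mapping[Optional[str], Optional[str]],
    environ: Optional[MutableMapping[str, str]] = None,
) -> List[str]:
    """Copy values into environ without overriding existing keys.

    Returns the list of keys that were actually applied.
    """
    if environ is None:
        environ = os.environ
    applied: List[str] = []
    for key, value in values.items():
        if key is None or value is None:
            continue  # entry without a name or value
        if not _VALID_NAME.fullmatch(key):
            continue  # not a valid environment variable name
        if key in environ:
            continue  # explicitly set variables take precedence
        environ[key] = str(value)
        applied.append(key)
    return applied
```

As in the shell version, a variable that is set but empty still counts as set and is left untouched.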
--- a/run_analysis_on_test_data.py
+++ b/run_analysis_on_test_data.py
@ -0,0 +1,158 @@
 import os
 import sys
 import shutil
 import json
 from pathlib import Path
 import pandas as pd
 import numpy as np
 # Add project root to path
 sys.path.append(os.path.dirname(os.path.abspath(__file__)))
 from app.services.analysis_system import TimeSeriesAnalysisSystem
 from app.core.config import settings
 class NpEncoder(json.JSONEncoder):
    """
    JSON encoder that handles NumPy types
    """
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, pd.Timestamp):
            return str(obj)
        return super().default(obj)
 def format_details(details):
    """
    Format detailed results for text output
    """
    if details is None:
        return ""
    # Handle pandas Series/DataFrame
    if isinstance(details, (pd.DataFrame, pd.Series)):
        try:
            return details.to_markdown() if hasattr(details, 'to_markdown') else details.to_string()
        except ImportError:
            return details.to_string()
    # Handle Dict/List (JSON-like)
    if isinstance(details, (dict, list)):
        try:
            return json.dumps(details, cls=NpEncoder, indent=2, ensure_ascii=False)
        except Exception as e:
            return f"JSON Serialization Error: {e}\nRaw: {str(details)}"
    return str(details)
 def run_all_analyses():
    # Setup paths
    base_dir = Path(__file__).parent
    test_dir = base_dir / "test"
    csv_filename = "comprehensive_test_data.csv"
    csv_path = test_dir / csv_filename
    if not csv_path.exists():
        print(f"Error: Test file not found at {csv_path}")
        return
    output_dir = test_dir / "results"
    output_dir.mkdir(exist_ok=True)
    print(f"Starting analysis on {csv_path}")
    print(f"Results will be saved to {output_dir}")
    # Initialize System
    # generate_plots=False skips image generation but still returns the full result details
    system = TimeSeriesAnalysisSystem(
        str(csv_path),
        task_description="Test Suite Analysis",
        language="zh",
        generate_plots=False,
    )
    if not system.load_and_preprocess_data():
        print("Failed to load data")
        return
    # Define methods to run
    methods = [
        ('statistical_overview', system.generate_statistical_overview),
        ('time_series_analysis', system.generate_time_series_plots),
        ('acf_pacf_analysis', system.generate_acf_pacf_plots),
        ('stationarity_tests', system.perform_stationarity_tests),
        ('normality_tests', system.perform_normality_tests),
        ('seasonal_decomposition', system.perform_seasonal_decomposition),
        ('spectral_analysis', system.perform_spectral_analysis),
        ('correlation_analysis', system.generate_correlation_heatmap),
        ('pca_scree_plot', system.generate_pca_scree_plot),
        ('pca_analysis', system.perform_pca_analysis),
        ('feature_importance', system.analyze_feature_importance),
        ('clustering_analysis', system.perform_clustering_analysis),
        ('factor_analysis', system.perform_factor_analysis),
        ('cointegration_test', system.perform_cointegration_test),
        ('var_analysis', system.perform_var_analysis)
    ]
    for name, method in methods:
        print(f"\nrunning {name}...")
        try:
            result = method()
            img_path = None
            summary = ""
            details = None
            # Parse result
            if isinstance(result, tuple):
                if len(result) == 3:
                    img_path, summary, details = result
                elif len(result) == 2:
                    img_path, summary = result
            else:
                summary = str(result)
            # Save Output
            base_output_name = f"{name}_output"
            # 1. Save Summary & Details
            txt_path = output_dir / f"{base_output_name}.txt"
            with open(txt_path, "w", encoding="utf-8") as f:
                f.write(f"Method: {name}\n")
                f.write("-" * 50 + "\n")
                f.write("Summary:\n")
                f.write(str(summary))
                f.write("\n\n")
                if details is not None:
                    f.write("Detailed Results:\n")
                    f.write("-" * 50 + "\n")
                    formatted_details = format_details(details)
                    f.write(formatted_details)
                    f.write("\n")
            print(f"  Saved full details to {txt_path.name}")
            # 2. Save Image (if any)
            if img_path and os.path.exists(img_path):
                ext = os.path.splitext(img_path)[1]
                target_img_path = output_dir / f"{base_output_name}{ext}"
                shutil.copy2(img_path, target_img_path)
                print(f"  Saved image to {target_img_path.name}")
            # No image is expected if generate_plots=False
        except Exception as e:
            print(f"  Error running {name}: {e}")
            import traceback
            traceback.print_exc()
 if __name__ == "__main__":
    run_all_analyses()
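The result-parsing branch inside `run_all_analyses` accepts three shapes: a 3-tuple `(img_path, summary, details)`, a 2-tuple `(img_path, summary)`, or any other value that is stringified into the summary. A small sketch of that normalization as a standalone helper follows; the name `unpack_result` is hypothetical, and the fallback for tuples of any other length mirrors the script's defaults:

```python
from typing import Any, Optional, Tuple


def unpack_result(result: Any) -> Tuple[Optional[str], str, Any]:
    """Normalize an analysis result to (img_path, summary, details)."""
    if isinstance(result, tuple):
        if len(result) == 3:
            img_path, summary, details = result
            return img_path, str(summary), details
        if len(result) == 2:
            img_path, summary = result
            return img_path, str(summary), None
        # Any other tuple length keeps the script's defaults.
        return None, "", None
    # Non-tuple results are treated as a plain summary.
    return None, str(result), None
```

Pulling this logic into one function makes each accepted shape testable on its own, without loading data or running any analysis method.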
--- a/uv.lock
+++ b/uv.lock