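API Documentation
This document describes the API of the OCR evaluation framework.
Core module
OCR Evaluation Framework
A framework for evaluating and comparing OCR models, with support for accuracy testing and performance analysis across multiple OCR models.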
- class ocr_evaluation.Config(config_file: Path | None = None)
Bases: object
Configuration management class.
- DEFAULT_CONFIG = {'evaluation': {'accuracy_threshold': 0.95, 'case_sensitive': True, 'use_levenshtein': True}, 'logging': {'file': None, 'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s', 'level': 'INFO'}, 'models': {'paddleocr': {'lang': 'en', 'use_doc_orientation_classify': False, 'use_doc_unwarping': False, 'use_gpu': False, 'use_textline_orientation': False}, 'qwen_vl': {'lmstudio_url': 'ws://localhost:1234', 'max_tokens': 50, 'model_name': 'qwen/qwen2.5-vl-7b', 'temperature': 0.1}}, 'output': {'report_format': 'markdown', 'reports_dir': 'data/reports', 'results_dir': 'data/outputs'}}
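This reference does not show how Config combines a user-supplied file with DEFAULT_CONFIG. One common approach is a recursive merge, sketched below under that assumption. The helper deep_merge is hypothetical and not part of ocr_evaluation, and DEFAULTS is a trimmed copy of the real defaults.

```python
from copy import deepcopy
from typing import Any, Dict

# Trimmed copy of DEFAULT_CONFIG, for illustration only.
DEFAULTS: Dict[str, Any] = {
    "evaluation": {"accuracy_threshold": 0.95, "case_sensitive": True},
    "models": {"paddleocr": {"lang": "en", "use_gpu": False}},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` replace those in `base`.

    Nested dicts are merged key by key; every other value is replaced whole.
    Hypothetical helper: the framework's actual merge logic may differ.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Override a single nested option and keep every other default.
user_config = {"models": {"paddleocr": {"lang": "ch"}}}
config = deep_merge(DEFAULTS, user_config)
print(config["models"]["paddleocr"])
```

Merging into a deep copy leaves the defaults untouched, so the same DEFAULTS dict can safely back several configurations.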
- class ocr_evaluation.ModelTypes
Bases: object
Supported model types.
- PADDLEOCR = 'paddleocr'
- QWEN_VL = 'qwen_vl'
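- class ocr_evaluation.BaseEvaluator(config: Dict[str, Any] | None = None)
Bases: ABC
Base class for OCR evaluators.
Defines the interface that every OCR model evaluator must implement.
- calculate_accuracy(ground_truth: str, predicted: str) → float
Compute the accuracy of a prediction.
- Parameters:
ground_truth – the reference text
predicted – the predicted text
- Returns:
Accuracy, from 0.0 to 1.0
- Return type:
float
- evaluate_directory(directory: Path) → DirectoryResult | None
Evaluate a single directory.
- Parameters:
directory – directory containing the images and Label.txt
- Returns:
The evaluation result for the directory, or None on failure
- Return type:
DirectoryResult | None
- evaluate_dataset(images_dir: Path) → TestSummary | None
Evaluate a complete dataset.
- Parameters:
images_dir – root directory containing all test image directories
- Returns:
Summary of the complete test run
- Return type:
TestSummary | None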
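The reference does not state the formula behind calculate_accuracy. The use_levenshtein and case_sensitive options in DEFAULT_CONFIG suggest an edit-distance similarity, so the sketch below assumes accuracy = 1 - distance / length of the longer string. The function names are illustrative and the framework's actual computation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def accuracy(ground_truth: str, predicted: str, case_sensitive: bool = True) -> float:
    """Similarity in [0.0, 1.0]: 1 - distance / length of the longer string."""
    if not case_sensitive:
        ground_truth, predicted = ground_truth.lower(), predicted.lower()
    longest = max(len(ground_truth), len(predicted))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(ground_truth, predicted) / longest


print(accuracy("ABC123", "ABC124"))
```

Normalizing by the longer string keeps the score within 0.0 to 1.0 even when the prediction is much longer than the reference.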
- class ocr_evaluation.EvaluationResult(image_path: Path, ground_truth: str, predicted: str, accuracy: float, exact_match: bool, metadata: Dict[str, Any] | None = None)
Bases: object
Evaluation result for a single sample.
- class ocr_evaluation.DirectoryResult(directory: Path, total_images: int, average_accuracy: float, exact_match_count: int, exact_match_rate: float, results: List[EvaluationResult], metadata: Dict[str, Any] | None = None)
Bases: object
Evaluation result for a directory.
- results: List[EvaluationResult]
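The DirectoryResult fields suggest how per-image results roll up into directory-level statistics. The sketch below shows one plausible aggregation. Sample is a hypothetical stand-in that mirrors only the two EvaluationResult fields it needs, and summarize is not a framework function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """Stand-in for EvaluationResult with only the fields used here."""
    accuracy: float
    exact_match: bool


def summarize(results: List[Sample]) -> Tuple[int, float, int, float]:
    """Return (total_images, average_accuracy, exact_match_count, exact_match_rate)."""
    total = len(results)
    if total == 0:
        return 0, 0.0, 0, 0.0  # avoid dividing by zero for an empty directory
    exact = sum(1 for result in results if result.exact_match)
    average = sum(result.accuracy for result in results) / total
    return total, average, exact, exact / total


samples = [Sample(1.0, True), Sample(0.5, False), Sample(1.0, True), Sample(0.5, False)]
print(summarize(samples))
```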
- class ocr_evaluation.TestSummary(model_name: str, test_timestamp: str, total_images: int, overall_accuracy: float, overall_exact_match_rate: float, directory_results: List[DirectoryResult], technical_details: Dict[str, Any], metadata: Dict[str, Any] | None = None)
Bases: object
Summary of a complete test run.
- directory_results: List[DirectoryResult]
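- class ocr_evaluation.PaddleOCREvaluator(config: Dict[str, Any] | None = None)
PaddleOCR evaluator.
- recognize_image(image_path: Path) → str
Recognize a single image with PaddleOCR.
- Parameters:
image_path – path to the image
- Returns:
The recognized text
- Return type:
str
- classmethod create_with_optimal_config() → PaddleOCREvaluator
Create an evaluator instance with the optimal configuration.
- Returns:
An evaluator instance that uses the optimized configuration
- Return type:
PaddleOCREvaluator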
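- class ocr_evaluation.QwenVLEvaluator(config: Dict[str, Any] | None = None)
Evaluator for the Qwen2.5-VL multimodal model.
- recognize_image(image_path: Path) → str
Recognize a single image with Qwen2.5-VL.
- Parameters:
image_path – path to the image
- Returns:
The recognized text
- Return type:
str
- classmethod create_with_default_config() → QwenVLEvaluator
Create an evaluator instance with the default configuration.
- Returns:
An evaluator instance that uses the default configuration
- Return type:
QwenVLEvaluator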
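- ocr_evaluation.create_evaluator(model_type: str, config: dict | None = None) → BaseEvaluator
Create an evaluator of the given type.
- Parameters:
model_type – name of the model type
config – model configuration parameters
- Returns:
An evaluator instance
- Return type:
BaseEvaluator
- Raises:
ValueError – the model type is not supported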
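A factory with this contract is often built on a name-to-class registry. The sketch below illustrates that pattern with stub classes so that it runs without any OCR backend installed. StubEvaluator, REGISTRY, and create_evaluator_sketch are hypothetical names, and the real create_evaluator may be implemented differently.

```python
from typing import Any, Dict, Optional, Type


class StubEvaluator:
    """Minimal stand-in for BaseEvaluator; it only stores its config."""

    def __init__(self, config: Optional[Dict[str, Any]] = None) -> None:
        self.config = config or {}


class StubPaddleEvaluator(StubEvaluator):
    """Stand-in for PaddleOCREvaluator."""


class StubQwenEvaluator(StubEvaluator):
    """Stand-in for QwenVLEvaluator."""


# Keys match the ModelTypes constants.
REGISTRY: Dict[str, Type[StubEvaluator]] = {
    "paddleocr": StubPaddleEvaluator,
    "qwen_vl": StubQwenEvaluator,
}


def create_evaluator_sketch(
    model_type: str, config: Optional[Dict[str, Any]] = None
) -> StubEvaluator:
    """Look up the evaluator class by name and instantiate it."""
    try:
        evaluator_class = REGISTRY[model_type]
    except KeyError:
        supported = ", ".join(sorted(REGISTRY))
        raise ValueError(
            f"Unsupported model type {model_type!r}; expected one of: {supported}"
        ) from None
    return evaluator_class(config)


evaluator = create_evaluator_sketch("paddleocr", {"lang": "ch"})
print(type(evaluator).__name__, evaluator.config)
```

Listing the supported names in the error message makes a mistyped model type easy to correct.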
- class ocr_evaluation.ReportGenerator(output_dir: Path | None = None)
Bases: object
Report generator.
- generate_markdown_report(summary: TestSummary) → str
Generate a report in Markdown format.
- Parameters:
summary – summary of the test run
- Returns:
The report content in Markdown format
- Return type:
str
- ocr_evaluation.get_logger(name: str | None = None) → Logger
Get the global logger.
- Parameters:
name – name of the child logger
- Returns:
A logger instance
- Return type:
Logger
- ocr_evaluation.evaluate_model(model_type: str, images_dir: str, config: dict | None = None) → TestSummary
Convenience function for evaluating a model.
- Parameters:
model_type – model type ('paddleocr' or 'qwen_vl')
images_dir – path to the image directory
config – optional model configuration
- Returns:
Summary of the evaluation results
- Return type:
TestSummary
Example
>>> from ocr_evaluation import evaluate_model
>>> summary = evaluate_model('paddleocr', './images')
>>> print(f"Accuracy: {summary.overall_accuracy:.2%}")
- ocr_evaluation.generate_report(summary: TestSummary, output_dir: str = './reports', format: str = 'markdown') → str
Convenience function for generating a report.
- Parameters:
summary – summary of the evaluation results
output_dir – output directory
format – report format ('markdown' or 'json')
- Returns:
Path of the generated report file
- Return type:
str
Example
>>> from ocr_evaluation import evaluate_model, generate_report
>>> summary = evaluate_model('paddleocr', './images')
>>> report_path = generate_report(summary)
>>> print(f"Report saved to: {report_path}")
Configuration module
Configuration module of the OCR evaluation framework.
- class ocr_evaluation.config.Config(config_file: Path | None = None)
Bases: object
Configuration management class.
- DEFAULT_CONFIG = {'evaluation': {'accuracy_threshold': 0.95, 'case_sensitive': True, 'use_levenshtein': True}, 'logging': {'file': None, 'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s', 'level': 'INFO'}, 'models': {'paddleocr': {'lang': 'en', 'use_doc_orientation_classify': False, 'use_doc_unwarping': False, 'use_gpu': False, 'use_textline_orientation': False}, 'qwen_vl': {'lmstudio_url': 'ws://localhost:1234', 'max_tokens': 50, 'model_name': 'qwen/qwen2.5-vl-7b', 'temperature': 0.1}}, 'output': {'report_format': 'markdown', 'reports_dir': 'data/reports', 'results_dir': 'data/outputs'}}
- class ocr_evaluation.config.ModelTypes
Bases: object
Supported model types.
- PADDLEOCR = 'paddleocr'
- QWEN_VL = 'qwen_vl'
- class ocr_evaluation.config.PaddleOCRConstants
Bases: object
PaddleOCR-related constants.
- DEFAULT_LANG = 'en'
- SUPPORTED_LANGS = ['ch', 'en', 'fr', 'german', 'korean', 'japan']
- OPTIMIZED_CONFIG = {'lang': 'en', 'use_doc_orientation_classify': False, 'use_doc_unwarping': False, 'use_textline_orientation': False}
- class ocr_evaluation.config.QwenConstants
Bases: object
Qwen2.5-VL-related constants.
- DEFAULT_MODEL_NAME = 'qwen/qwen2.5-vl-7b'
- DEFAULT_LMSTUDIO_URL = 'ws://localhost:1234'
- DEFAULT_TEMPERATURE = 0.1
- DEFAULT_MAX_TOKENS = 50
- DEFAULT_PROMPT = 'Please look at this image carefully and extract the exact text/code shown in the image. This appears to be an alphanumeric code or product number. Please provide ONLY the exact text you see, without any additional explanation or formatting. The text typically consists of letters, numbers, and may include symbols like # or .'
- class ocr_evaluation.config.EvaluationConstants
Bases: object
Evaluation-related constants.
- DEFAULT_ACCURACY_THRESHOLD = 0.95
- ACCURACY_RANGES = {'0.6-0.7': (0.6, 0.7), '0.7-0.8': (0.7, 0.8), '0.8-0.9': (0.8, 0.9), '0.9-1.0': (0.9, 1.0), '<0.6': (0.0, 0.6)}
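ACCURACY_RANGES looks intended for grouping scores into a distribution table. The sketch below shows one way to assign a score to a bucket. Treating each range as lower-inclusive and upper-exclusive, with 1.0 folded into the top bucket, is an assumption (the reference does not say how boundaries are handled), and bucket_for is a hypothetical helper.

```python
from typing import Dict, Tuple

# Copied from EvaluationConstants.ACCURACY_RANGES.
ACCURACY_RANGES: Dict[str, Tuple[float, float]] = {
    "0.6-0.7": (0.6, 0.7),
    "0.7-0.8": (0.7, 0.8),
    "0.8-0.9": (0.8, 0.9),
    "0.9-1.0": (0.9, 1.0),
    "<0.6": (0.0, 0.6),
}


def bucket_for(accuracy: float) -> str:
    """Return the label of the range that contains `accuracy`.

    Assumed boundary rule: low <= accuracy < high, except that a perfect
    score of 1.0 belongs to the bucket whose upper bound is 1.0.
    """
    for label, (low, high) in ACCURACY_RANGES.items():
        if low <= accuracy < high or (accuracy == 1.0 and high == 1.0):
            return label
    raise ValueError(f"accuracy {accuracy!r} is outside the range 0.0 to 1.0")


print(bucket_for(0.95), bucket_for(0.42))
```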
- class ocr_evaluation.config.TextProcessingConstants
Bases: object
Text-processing constants.
- ALPHANUMERIC_PATTERN = re.compile('[A-Z0-9]+[#.\\-A-Z0-9]*')
- EXPLANATION_PREFIXES = ['The text shown in the image is:', 'The code in the image is:', 'The text appears to be:', 'I can see:', 'The image shows:', 'The alphanumeric code is:', 'The product number is:', 'Looking at this image, I can see:']
- QUOTE_CHARS = '"\'`'
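These constants appear designed for cleaning up a vision-language model's reply before scoring it. The sketch below shows how they could be combined: drop a known explanation prefix, strip surrounding quotes, then extract the first alphanumeric code. The order of the steps and the clean_response helper are assumptions, not the framework's documented behaviour, and only a subset of the prefixes is copied.

```python
import re
from typing import List

# Copied from TextProcessingConstants (prefix list shortened).
ALPHANUMERIC_PATTERN = re.compile(r"[A-Z0-9]+[#.\-A-Z0-9]*")
EXPLANATION_PREFIXES: List[str] = [
    "The text shown in the image is:",
    "The code in the image is:",
    "I can see:",
]
QUOTE_CHARS = "\"'`"


def clean_response(raw: str) -> str:
    """Reduce a model reply to the bare code it contains (hypothetical helper)."""
    text = raw.strip()
    # 1. Drop a leading explanation such as "The code in the image is:".
    for prefix in EXPLANATION_PREFIXES:
        if text.lower().startswith(prefix.lower()):
            text = text[len(prefix):].strip()
            break
    # 2. Remove quotes or backticks wrapped around the answer.
    text = text.strip(QUOTE_CHARS).strip()
    # 3. Keep the first alphanumeric code; fall back to the cleaned text.
    match = ALPHANUMERIC_PATTERN.search(text)
    return match.group(0) if match else text


print(clean_response('The code in the image is: "AB12#34.5"'))
```

Because the pattern accepts only uppercase letters and digits, lowercase replies fall through to the cleaned text unchanged.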
- class ocr_evaluation.config.LoggingConstants
Bases: object
Logging-related constants.
- DEFAULT_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
- DEFAULT_LEVEL = 'INFO'
- SUPPORTED_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
- class ocr_evaluation.config.ReportConstants
Bases: object
Report-generation constants.
- MARKDOWN_EXTENSION = '.md'
- JSON_EXTENSION = '.json'
- HTML_EXTENSION = '.html'
- DEFAULT_REPORT_NAME_TEMPLATE = '{model}_准确率报告_{timestamp}'
- DEFAULT_RESULTS_NAME_TEMPLATE = '{model}_results_{timestamp}'
- REPORT_TIMESTAMP_FORMAT = '%Y-%m-%d_%H-%M-%S'
- DISPLAY_TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'
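The name templates and timestamp formats combine through str.format and datetime.strftime. The sketch below builds a report file name from them. The report_filename helper is hypothetical, and the framework may assemble its file names differently.

```python
from datetime import datetime

# Copied from ReportConstants.
MARKDOWN_EXTENSION = ".md"
DEFAULT_REPORT_NAME_TEMPLATE = "{model}_准确率报告_{timestamp}"
REPORT_TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"
DISPLAY_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def report_filename(model: str, when: datetime, extension: str = MARKDOWN_EXTENSION) -> str:
    """Fill the report name template with a file-system-safe timestamp."""
    timestamp = when.strftime(REPORT_TIMESTAMP_FORMAT)
    return DEFAULT_REPORT_NAME_TEMPLATE.format(model=model, timestamp=timestamp) + extension


when = datetime(2025, 1, 31, 9, 5, 0)
print(report_filename("paddleocr", when))     # file name, no spaces or colons
print(when.strftime(DISPLAY_TIMESTAMP_FORMAT))  # human-readable form
```

The file-name format avoids spaces and colons, which are awkward or invalid in paths on some platforms, while the display format keeps the conventional layout for report bodies.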
- class ocr_evaluation.config.ErrorCodes
Bases: object
Error code definitions.
- SUCCESS = 0
- CONFIG_ERROR = 1
- MODEL_INIT_ERROR = 2
- DATA_ERROR = 3
- EVALUATION_ERROR = 4
- REPORT_ERROR = 5
- UNKNOWN_ERROR = 99
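Codes like these are typically returned as the process exit status. The sketch below shows one way to translate exceptions into them. The mapping from exception types to codes is purely an assumption for illustration, and run_with_exit_code is a hypothetical helper.

```python
from typing import Callable

# Values copied from ErrorCodes.
SUCCESS = 0
CONFIG_ERROR = 1
DATA_ERROR = 3
UNKNOWN_ERROR = 99


def run_with_exit_code(task: Callable[[], None]) -> int:
    """Run `task` and translate any exception into an exit code.

    The exception-to-code mapping is an assumption made for this sketch.
    """
    try:
        task()
    except FileNotFoundError:
        return DATA_ERROR      # e.g. a missing image directory
    except (KeyError, ValueError):
        return CONFIG_ERROR    # e.g. an unknown model type
    except Exception:
        return UNKNOWN_ERROR   # anything not anticipated above
    return SUCCESS


def missing_images() -> None:
    raise FileNotFoundError("data/images")


def bad_model_type() -> None:
    raise ValueError("unsupported model type")


print(run_with_exit_code(lambda: None))
print(run_with_exit_code(missing_images))
```

In a command-line entry point the returned value would usually be passed to sys.exit.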
Settings module
Configuration management module of the OCR evaluation framework.
- class ocr_evaluation.config.settings.Config(config_file: Path | None = None)
Bases: object
Configuration management class.
- DEFAULT_CONFIG = {'evaluation': {'accuracy_threshold': 0.95, 'case_sensitive': True, 'use_levenshtein': True}, 'logging': {'file': None, 'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s', 'level': 'INFO'}, 'models': {'paddleocr': {'lang': 'en', 'use_doc_orientation_classify': False, 'use_doc_unwarping': False, 'use_gpu': False, 'use_textline_orientation': False}, 'qwen_vl': {'lmstudio_url': 'ws://localhost:1234', 'max_tokens': 50, 'model_name': 'qwen/qwen2.5-vl-7b', 'temperature': 0.1}}, 'output': {'report_format': 'markdown', 'reports_dir': 'data/reports', 'results_dir': 'data/outputs'}}
Constants module
Constant definitions for the OCR evaluation framework.
- class ocr_evaluation.config.constants.ModelTypes
Bases: object
Supported model types.
- PADDLEOCR = 'paddleocr'
- QWEN_VL = 'qwen_vl'
- class ocr_evaluation.config.constants.PaddleOCRConstants
Bases: object
PaddleOCR-related constants.
- DEFAULT_LANG = 'en'
- SUPPORTED_LANGS = ['ch', 'en', 'fr', 'german', 'korean', 'japan']
- OPTIMIZED_CONFIG = {'lang': 'en', 'use_doc_orientation_classify': False, 'use_doc_unwarping': False, 'use_textline_orientation': False}
- class ocr_evaluation.config.constants.QwenConstants
Bases: object
Qwen2.5-VL-related constants.
- DEFAULT_MODEL_NAME = 'qwen/qwen2.5-vl-7b'
- DEFAULT_LMSTUDIO_URL = 'ws://localhost:1234'
- DEFAULT_TEMPERATURE = 0.1
- DEFAULT_MAX_TOKENS = 50
- DEFAULT_PROMPT = 'Please look at this image carefully and extract the exact text/code shown in the image. This appears to be an alphanumeric code or product number. Please provide ONLY the exact text you see, without any additional explanation or formatting. The text typically consists of letters, numbers, and may include symbols like # or .'
- class ocr_evaluation.config.constants.EvaluationConstants
Bases: object
Evaluation-related constants.
- DEFAULT_ACCURACY_THRESHOLD = 0.95
- ACCURACY_RANGES = {'0.6-0.7': (0.6, 0.7), '0.7-0.8': (0.7, 0.8), '0.8-0.9': (0.8, 0.9), '0.9-1.0': (0.9, 1.0), '<0.6': (0.0, 0.6)}
- class ocr_evaluation.config.constants.TextProcessingConstants
Bases: object
Text-processing constants.
- ALPHANUMERIC_PATTERN = re.compile('[A-Z0-9]+[#.\\-A-Z0-9]*')
- EXPLANATION_PREFIXES = ['The text shown in the image is:', 'The code in the image is:', 'The text appears to be:', 'I can see:', 'The image shows:', 'The alphanumeric code is:', 'The product number is:', 'Looking at this image, I can see:']
- QUOTE_CHARS = '"\'`'
- class ocr_evaluation.config.constants.LoggingConstants
Bases: object
Logging-related constants.
- DEFAULT_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
- DEFAULT_LEVEL = 'INFO'
- SUPPORTED_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
- class ocr_evaluation.config.constants.ReportConstants
Bases: object
Report-generation constants.
- MARKDOWN_EXTENSION = '.md'
- JSON_EXTENSION = '.json'
- HTML_EXTENSION = '.html'
- DEFAULT_REPORT_NAME_TEMPLATE = '{model}_准确率报告_{timestamp}'
- DEFAULT_RESULTS_NAME_TEMPLATE = '{model}_results_{timestamp}'
- REPORT_TIMESTAMP_FORMAT = '%Y-%m-%d_%H-%M-%S'
- DISPLAY_TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'
- class ocr_evaluation.config.constants.ErrorCodes
Bases: object
Error code definitions.
- SUCCESS = 0
- CONFIG_ERROR = 1
- MODEL_INIT_ERROR = 2
- DATA_ERROR = 3
- EVALUATION_ERROR = 4
- REPORT_ERROR = 5
- UNKNOWN_ERROR = 99
Models module
Base evaluator
Abstract base class for OCR evaluators.
- class ocr_evaluation.models.base.EvaluationResult(image_path: Path, ground_truth: str, predicted: str, accuracy: float, exact_match: bool, metadata: Dict[str, Any] | None = None)
Bases: object
Evaluation result for a single sample.
- class ocr_evaluation.models.base.DirectoryResult(directory: Path, total_images: int, average_accuracy: float, exact_match_count: int, exact_match_rate: float, results: List[EvaluationResult], metadata: Dict[str, Any] | None = None)
Bases: object
Evaluation result for a directory.
- results: List[EvaluationResult]
- class ocr_evaluation.models.base.TestSummary(model_name: str, test_timestamp: str, total_images: int, overall_accuracy: float, overall_exact_match_rate: float, directory_results: List[DirectoryResult], technical_details: Dict[str, Any], metadata: Dict[str, Any] | None = None)
Bases: object
Summary of a complete test run.
- directory_results: List[DirectoryResult]
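- class ocr_evaluation.models.base.BaseEvaluator(config: Dict[str, Any] | None = None)
Bases: ABC
Base class for OCR evaluators.
Defines the interface that every OCR model evaluator must implement.
- calculate_accuracy(ground_truth: str, predicted: str) → float
Compute the accuracy of a prediction.
- Parameters:
ground_truth – the reference text
predicted – the predicted text
- Returns:
Accuracy, from 0.0 to 1.0
- Return type:
float
- evaluate_directory(directory: Path) → DirectoryResult | None
Evaluate a single directory.
- Parameters:
directory – directory containing the images and Label.txt
- Returns:
The evaluation result for the directory, or None on failure
- Return type:
DirectoryResult | None
- evaluate_dataset(images_dir: Path) → TestSummary | None
Evaluate a complete dataset.
- Parameters:
images_dir – root directory containing all test image directories
- Returns:
Summary of the complete test run
- Return type:
TestSummary | None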
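PaddleOCR evaluator
PaddleOCR evaluator implementation.
- class ocr_evaluation.models.paddleocr_evaluator.PaddleOCREvaluator(config: Dict[str, Any] | None = None)
PaddleOCR evaluator.
- recognize_image(image_path: Path) → str
Recognize a single image with PaddleOCR.
- Parameters:
image_path – path to the image
- Returns:
The recognized text
- Return type:
str
- classmethod create_with_optimal_config() → PaddleOCREvaluator
Create an evaluator instance with the optimal configuration.
- Returns:
An evaluator instance that uses the optimized configuration
- Return type:
PaddleOCREvaluator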
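Qwen evaluator
Qwen2.5-VL evaluator implementation.
- class ocr_evaluation.models.qwen_evaluator.QwenVLEvaluator(config: Dict[str, Any] | None = None)
Evaluator for the Qwen2.5-VL multimodal model.
- recognize_image(image_path: Path) → str
Recognize a single image with Qwen2.5-VL.
- Parameters:
image_path – path to the image
- Returns:
The recognized text
- Return type:
str
- classmethod create_with_default_config() → QwenVLEvaluator
Create an evaluator instance with the default configuration.
- Returns:
An evaluator instance that uses the default configuration
- Return type:
QwenVLEvaluator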
Utilities module
Logging utilities
Logging utility module.
- class ocr_evaluation.utils.logging_utils.ColoredFormatter(fmt=None, datefmt=None, style='%', validate=True)
Bases: Formatter
Colored log formatter (used for console output only).
- COLORS = {'CRITICAL': '\x1b[35m', 'DEBUG': '\x1b[36m', 'ERROR': '\x1b[31m', 'INFO': '\x1b[32m', 'WARNING': '\x1b[33m'}
- RESET = '\x1b[0m'
- format(record)
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
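The COLORS and RESET attributes are ANSI escape sequences. The sketch below shows one way a Formatter subclass could apply them by wrapping the whole formatted line. ColoredFormatterSketch is a hypothetical class, and the real ColoredFormatter may instead colour only the level name.

```python
import logging


class ColoredFormatterSketch(logging.Formatter):
    """Wrap each formatted line in the ANSI colour of its log level."""

    # Same escape codes as ColoredFormatter.COLORS and ColoredFormatter.RESET.
    COLORS = {
        "DEBUG": "\x1b[36m",     # cyan
        "INFO": "\x1b[32m",      # green
        "WARNING": "\x1b[33m",   # yellow
        "ERROR": "\x1b[31m",     # red
        "CRITICAL": "\x1b[35m",  # magenta
    }
    RESET = "\x1b[0m"

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        color = self.COLORS.get(record.levelname)
        # Custom levels have no colour entry and are left unstyled.
        return f"{color}{message}{self.RESET}" if color else message


# Build a record by hand so the example needs no handler configuration.
record = logging.LogRecord(
    name="ocr_evaluation",
    level=logging.WARNING,
    pathname="example.py",
    lineno=0,
    msg="low accuracy: %s",
    args=("0.42",),
    exc_info=None,
)
formatter = ColoredFormatterSketch("%(levelname)s - %(message)s")
formatted = formatter.format(record)
print(repr(formatted))
```

Escape codes written to a log file show up as raw characters, which is presumably why the formatter is restricted to console output.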
- class ocr_evaluation.utils.logging_utils.OCRLogger(name: str = 'ocr_evaluation')
Bases: object
Unified log manager for the OCR evaluation framework.
- class ocr_evaluation.utils.logging_utils.LogContextManager(logger: Logger, level: int)
Bases: object
Log context manager for temporarily changing the log level.
- class ocr_evaluation.utils.logging_utils.ProgressLogger(logger: Logger, total: int)
Bases: object
Progress logger for displaying evaluation progress.
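- ocr_evaluation.utils.logging_utils.get_logger(name: str | None = None) → Logger
Get the global logger.
- Parameters:
name – name of the child logger
- Returns:
A logger instance
- Return type:
Logger
- ocr_evaluation.utils.logging_utils.setup_logging(config: Dict[str, Any])
Set up the global logging configuration.
- Parameters:
config – logging configuration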
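- ocr_evaluation.utils.logging_utils.with_log_level(logger: Logger, level: int) → LogContextManager
Context manager that temporarily changes the log level.
- Parameters:
logger – the logger
level – the temporary level
- Returns:
A context manager
- Return type:
LogContextManager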
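The pattern behind a temporary log level is to save the current level, set the new one, and restore the original on exit. The sketch below implements it with contextlib. temporary_level is a hypothetical equivalent, not the framework's LogContextManager.

```python
import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_level(logger: logging.Logger, level: int) -> Iterator[logging.Logger]:
    """Set `level` on `logger` for the duration of the with-block."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the original level even if the block raised.
        logger.setLevel(previous)


logger = logging.getLogger("ocr_evaluation.example")
logger.setLevel(logging.INFO)

with temporary_level(logger, logging.DEBUG):
    inside = logger.level  # DEBUG while inside the block

after = logger.level       # back to INFO afterwards
print(logging.getLevelName(inside), logging.getLevelName(after))
```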
- ocr_evaluation.utils.logging_utils.create_progress_logger(logger: Logger, total: int) → ProgressLogger
Create a progress logger.
- Parameters:
logger – the underlying logger
total – total number of tasks
- Returns:
A progress logger
- Return type:
ProgressLogger
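The methods of ProgressLogger are not listed in this reference. The sketch below shows what a progress logger with the same constructor arguments might look like. The update and describe methods and the every parameter are assumptions introduced for illustration.

```python
import logging


class ProgressLoggerSketch:
    """Count finished tasks and log a progress line every `every` steps."""

    def __init__(self, logger: logging.Logger, total: int, every: int = 10) -> None:
        self.logger = logger
        self.total = total
        self.every = max(1, every)
        self.done = 0

    def describe(self) -> str:
        """Return the current progress as text, e.g. 'Progress: 1/4 (25.0%)'."""
        percent = 100.0 * self.done / self.total if self.total else 100.0
        return f"Progress: {self.done}/{self.total} ({percent:.1f}%)"

    def update(self, step: int = 1) -> None:
        """Record `step` finished tasks; log at each interval and at the end."""
        self.done = min(self.done + step, self.total)
        if self.done % self.every == 0 or self.done == self.total:
            self.logger.info(self.describe())


logging.basicConfig(level=logging.INFO, format="%(message)s")
progress = ProgressLoggerSketch(logging.getLogger("ocr_evaluation.example"), total=4, every=2)
for _ in range(4):
    progress.update()  # logs after the 2nd and 4th task
```

Logging at intervals rather than per image keeps the log readable on large datasets.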
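Report generator
Report generation utilities.
- class ocr_evaluation.utils.report_generator.ReportGenerator(output_dir: Path | None = None)
Bases: object
Report generator.
- generate_markdown_report(summary: TestSummary) → str
Generate a report in Markdown format.
- Parameters:
summary – summary of the test run
- Returns:
The report content in Markdown format
- Return type:
str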
Command-line interface
Main program
Main CLI entry point of the OCR evaluation framework.
- ocr_evaluation.cli.main.create_cli_parser() → ArgumentParser
Create the CLI argument parser.
- ocr_evaluation.cli.main.show_help_hint(parser: ArgumentParser)
Show a help hint.
Commands module
CLI command implementations.
- class ocr_evaluation.cli.commands.BaseCommand
Bases: object
Base class for commands.
- add_common_arguments(parser: ArgumentParser)
Add the common arguments.
- class ocr_evaluation.cli.commands.EvaluateCommand
Bases: BaseCommand
Evaluation command.
- static add_arguments(parser: ArgumentParser)
Add the arguments of the evaluate command.
- class ocr_evaluation.cli.commands.CompareCommand
Bases: BaseCommand
Model comparison command.
- static add_arguments(parser: ArgumentParser)
Add the arguments of the compare command.
- class ocr_evaluation.cli.commands.ConfigCommand
Bases: BaseCommand
Configuration management command.
- static add_arguments(parser: ArgumentParser)
Add the arguments of the config command.
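Command classes that each expose add_arguments(parser) map naturally onto argparse subparsers. The sketch below shows that wiring with the standard library only. The program name ocr-eval, the subcommand names, and every flag are hypothetical, since the actual CLI options are not listed in this reference.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per command class (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        prog="ocr-eval",  # assumed program name
        description="Evaluate and compare OCR models.",
    )
    subparsers = parser.add_subparsers(dest="command")

    # Would correspond to EvaluateCommand.add_arguments(parser).
    evaluate = subparsers.add_parser("evaluate", help="evaluate one model")
    evaluate.add_argument("--model", choices=["paddleocr", "qwen_vl"], required=True)
    evaluate.add_argument("--images-dir", default="data/images")

    # Would correspond to CompareCommand.add_arguments(parser).
    compare = subparsers.add_parser("compare", help="compare several models")
    compare.add_argument("--models", nargs="+", default=["paddleocr", "qwen_vl"])

    # Would correspond to ConfigCommand.add_arguments(parser).
    subparsers.add_parser("config", help="show or edit the configuration")
    return parser


args = build_parser().parse_args(["evaluate", "--model", "paddleocr"])
print(args.command, args.model, args.images_dir)
```

Giving each command its own subparser keeps the flags of one command from leaking into the others.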
Usage examples
Basic usage
from ocr_evaluation import evaluate_model, generate_report
# Run the evaluation on a directory of test images
summary = evaluate_model("paddleocr", "./images")
print(f"Accuracy: {summary.overall_accuracy:.2%}")
# Generate the report
report_path = generate_report(summary)
print(f"Report saved to: {report_path}")
Custom evaluator
from pathlib import Path
from ocr_evaluation.models.base import BaseEvaluator
class CustomEvaluator(BaseEvaluator):
    def __init__(self, config=None):
        super().__init__(config)
        # Initialize the custom model here
    def recognize_image(self, image_path: Path) -> str:
        # Implement the text recognition logic and return the recognized text
        raise NotImplementedError
Configuration management
from pathlib import Path
from ocr_evaluation import create_evaluator
from ocr_evaluation.config.settings import Config
# Load configuration from a file
config = Config(Path("config.yaml"))
# Inspect the built-in defaults
paddle_defaults = Config.DEFAULT_CONFIG["models"]["paddleocr"]
print(paddle_defaults["lang"])
# Pass model options as a plain dict
evaluator = create_evaluator("paddleocr", {"lang": "ch"})
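Logging configuration
from ocr_evaluation.utils.logging_utils import get_logger, setup_logging
# Configure logging; the keys mirror the 'logging' section of DEFAULT_CONFIG
setup_logging({
    "level": "INFO",
    "file": "evaluation.log",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
})
# Use the logger
logger = get_logger()
logger.info("Evaluation started")
logger.error("Evaluation failed")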
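Report generation
from pathlib import Path
from ocr_evaluation import evaluate_model, generate_report
from ocr_evaluation.utils.report_generator import ReportGenerator
# Obtain a TestSummary to report on
summary = evaluate_model("paddleocr", "./images")
# Create the report generator
generator = ReportGenerator(Path("data/reports"))
# Generate the Markdown report content
markdown = generator.generate_markdown_report(summary)
# Write a JSON report with the convenience function
json_path = generate_report(summary, output_dir="./reports", format="json")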