项目作者: innaky

项目描述 :
Parsing a docx file with python
高级语言: Python
项目地址: git://github.com/innaky/docx-parsing-python.git
创建时间: 2019-02-07T00:37:10Z
项目社区:https://github.com/innaky/docx-parsing-python

开源协议:MIT License

下载


Parsing docx using python 2.7

Parsing a docx file with python 2.7

Explanation

The docx file has hundreds of lines.
It has information (in variable format) that is separated by a line with
symbols “———“.
This script extract all intermediate sections to the boundary lines and save
this section in a text file, the name of this file is Header (variable format)
with the time (ex. 2018-01-31) the rest is data

Example

Docx file

  1. -----------------------
  2. Header date:2018-01-31T00:00:00
  3. Information
  4. -----------------------
  5. Header date:2018-08-23T00:00:00
  6. I
  7. n
  8. f
  9. or
  10. m
  11. a
  12. t
  13. i
  14. o
  15. n
  16. -----------------------

Output

  1. filename: Header-Time.txt
  2. Information