Author: gumblex

Description:
Export Telegram messages.
Language: Python
Repository: git://github.com/gumblex/tg-export.git
Created: 2015-10-13T05:01:17Z
Project page: https://github.com/gumblex/tg-export

License: GNU Lesser General Public License v3.0


tg-export

Deprecation notice: Since tg-cli is no longer maintained and has become unusable, this project is deprecated. Please use alternatives such as telegram-export instead.

Exports Telegram messages using telegram-cli. A patched version is recommended.

This version (v3) is compatible with both the master and test branches of
vysheng/tg.

Note: The database format of this version (v3) is not compatible with earlier ones.
To convert an old database (v1 or v2), run python3 dbconvert.py [old.db [new.db]].

export.py

    $ python3 export.py -h
    usage: export.py [-h] [-o OUTPUT] [-d DB] [-f] [-p PEER] [-B] [-t TIMEOUT]
                     [-l] [-L] [-e TGBIN] [-v]

    Export Telegram messages.

    optional arguments:
      -h, --help            show this help message and exit
      -o OUTPUT, --output OUTPUT
                            output path
      -d DB, --db DB        database path
      -f, --force           force download all messages
      -p PEER, --peer PEER  only download messages for this peer (format:
                            channel#id1001234567, or use partial name/title as
                            shown in tgcli)
      -B, --batch-only      fetch messages in batch only, don't try to get more
                            missing messages
      -t TIMEOUT, --timeout TIMEOUT
                            tg-cli command timeout
      -l, --logging         logging mode (keep running)
      -L, --keep-logging    first export, then keep logging
      -e TGBIN, --tgbin TGBIN
                            telegram-cli binary path
      -v, --verbose         print debug messages
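
When export.py is driven from another Python program, the command line can be assembled from the flags listed above. The following is a minimal sketch and not part of the project: the helper name build_export_argv and the sample paths are hypothetical, and only flags that appear in the help text are used. It builds the argument list without running anything.

```python
import sys

def build_export_argv(output=None, db=None, force=False, peer=None,
                      timeout=None, tgbin=None, verbose=False):
    """Build an argument list for export.py from its documented flags.

    Hypothetical helper for illustration: it only assembles the command
    line and never starts a process.
    """
    argv = [sys.executable, "export.py"]
    if output is not None:
        argv += ["-o", str(output)]
    if db is not None:
        argv += ["-d", str(db)]
    if force:
        argv.append("-f")
    if peer is not None:
        argv += ["-p", str(peer)]
    if timeout is not None:
        argv += ["-t", str(timeout)]
    if tgbin is not None:
        argv += ["-e", str(tgbin)]
    if verbose:
        argv.append("-v")
    return argv

# Sample values are placeholders, not defaults of the project.
argv = build_export_argv(db="tg.db", peer="channel#id1001234567",
                         timeout=60, tgbin="../tg/bin/telegram-cli")
print(argv[1:])
```

The resulting list can be passed to subprocess.run(argv, check=True) from the directory that contains export.py. Passing a list rather than a single string avoids shell quoting problems with peer names.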

Many workarounds for the unreliability of tg-cli are included (in this script and in tgcli.py), so the script itself may be unreliable as well.

Common problems with tg-cli are:

  • Dies arbitrarily.
  • No response on the socket interface.
  • Slow responses on the socket interface.
  • Partial responses on the socket interface, with the other half arriving after the timeout.
  • Returns an empty array when there are actually messages remaining.

Note: When the script tries to fetch the remaining messages, telegram-cli crashes frequently because some of the requested messages do not exist. As a quick fix, use this fork of tg-cli.

All of which is called NO WARRANTY™.
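
Failure modes like the ones listed above are commonly handled by retrying a bounded number of times. The following is a minimal sketch of that idea and not the code used by export.py or tgcli.py: fetch_with_retry and flaky_history are hypothetical names, and the flaky backend is simulated so that the example runs without tg-cli.

```python
import time

def fetch_with_retry(fetch, attempts=3, delay=0.0, retry_on=(TimeoutError,)):
    """Call fetch() until it returns a non-empty result or attempts run out.

    An empty result is retried too, because tg-cli may return an empty
    array while messages remain. Returns [] if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
        except retry_on:
            result = None  # no usable answer: handle like an empty one
        if result:
            return result
        if attempt < attempts:
            time.sleep(delay)
    return []

# Simulated backend (hypothetical): first a timeout, then an empty
# array, then finally two messages.
_responses = iter([TimeoutError("no response"), [], [{"id": 1}, {"id": 2}]])

def flaky_history():
    item = next(_responses)
    if isinstance(item, Exception):
        raise item
    return item

messages = fetch_with_retry(flaky_history, attempts=3)
print(len(messages))  # prints 2
```

Treating an empty result as a failure is a trade-off: a chat that really has no more messages costs the full number of attempts before the function gives up.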

logfmt.py

This script processes a database written by export.py or tg-chatdig and writes it out in a human-readable format (txt, html, etc.) according to a jinja2 template.

    usage: logfmt.py [-h] [-o OUTPUT] [-d DB] [-b BOTDB] [-D BOTDB_DEST] [-u]
                     [-t TEMPLATE] [-P PEER_PRINT] [-l LIMIT] [-L HARDLIMIT]
                     [-c CACHEDIR] [-r URLPREFIX]
                     peer

    Format exported database file into human-readable format.

    positional arguments:
      peer                  export certain peer id or tg-cli-style peer print name

    optional arguments:
      -h, --help            show this help message and exit
      -o OUTPUT, --output OUTPUT
                            output path
      -d DB, --db DB        tg-export database path
      -b BOTDB, --botdb BOTDB
                            tg-chatdig bot database path
      -D BOTDB_DEST, --botdb-dest BOTDB_DEST
                            tg-chatdig bot logged chat id or tg-cli-style peer
                            name
      -u, --botdb-user      use user information in tg-chatdig database first
      -t TEMPLATE, --template TEMPLATE
                            export template, can be 'txt'(default), 'html',
                            'json', or template file name
      -P PEER_PRINT, --peer-print PEER_PRINT
                            set print name for the peer
      -l LIMIT, --limit LIMIT
                            limit the number of fetched messages and set the
                            offset
      -L HARDLIMIT, --hardlimit HARDLIMIT
                            set a hard limit of the number of messages, must be
                            used with -l
      -c CACHEDIR, --cachedir CACHEDIR
                            the path of media files
      -r URLPREFIX, --urlprefix URLPREFIX
                            the url prefix of media files

tgcli.py

A simple wrapper for the telegram-cli interface.

Example:

    from tgcli import TelegramCliInterface

    tgcli = TelegramCliInterface('../tg/bin/telegram-cli')
    dialogs = tgcli.cmd_dialog_list()

TelegramCliInterface(cmd, extra_args=(), run=True)

  • run() starts the subprocess. It is needed when the object was created with run=False.
  • send_command(cmd, timeout=180, resync=True) sends a command to tg-cli. Use resync to consume any text left over since the last timeout.
  • cmd_*(*args, **kwargs) are convenience methods that send a command and return the response. args are passed to the command; kwargs are passed to TelegramCliInterface.send_command.
  • on_info(text) (callback) is called when a line of text is printed on stdout.
  • on_json(obj) (callback) is called with the parsed object when a line of JSON is printed on stdout.
  • on_text(text) (callback) is called when a line of anything is printed on stdout.
  • on_start() (callback) is called after telegram-cli starts.
  • on_exit() (callback) is called after telegram-cli dies.
  • close() properly ends the subprocess.

The do_nothing() function does nothing. It is intended as a placeholder for callbacks.

The TelegramCliExited exception is raised if telegram-cli dies while an answer is being read.
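
The callbacks, close() and TelegramCliExited can be combined into a defensive call pattern. The following is a minimal sketch under stated assumptions, not the project's own code: StubCli is a hypothetical stand-in that mimics only the documented method names so that the example runs without tg-cli, the local TelegramCliExited class replaces the real exception, and the sketch assumes that callbacks are installed by assigning attributes on the interface object.

```python
class TelegramCliExited(Exception):
    """Local stand-in for tgcli.TelegramCliExited (keeps the sketch runnable)."""

class StubCli:
    """Hypothetical stand-in mimicking the documented method names."""

    def __init__(self, die=False):
        self.on_json = lambda obj: None
        self.on_exit = lambda: None
        self.closed = False
        self._die = die

    def cmd_dialog_list(self):
        if self._die:
            self.on_exit()
            raise TelegramCliExited("telegram-cli died")
        dialogs = [{"id": 1}]  # made-up sample data
        self.on_json(dialogs)
        return dialogs

    def close(self):
        self.closed = True

def list_dialogs(cli):
    """Return (dialogs, seen_json); dialogs is None if telegram-cli died."""
    seen = []
    # Assumption: callbacks are plain attributes that can be reassigned.
    cli.on_json = seen.append
    try:
        return cli.cmd_dialog_list(), seen
    except TelegramCliExited:
        return None, seen
    finally:
        cli.close()  # always end the subprocess, even after a failure

ok_cli = StubCli()
dialogs, seen = list_dialogs(ok_cli)
print(dialogs, ok_cli.closed)

dead_cli = StubCli(die=True)
result, _ = list_dialogs(dead_cli)
print(result, dead_cli.closed)
```

With the real module, the two stand-in classes would be replaced by the TelegramCliInterface class and the TelegramCliExited exception described above. Calling close() in a finally clause keeps a stray telegram-cli process from outliving a failed command.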

License

Now it’s LGPLv3+.