Project author: attwad

Project description:
Worker and elasticsearch for automated College de France audio transcripts
Language: Go
Project URL: git://github.com/attwad/cdf.git
Created: 2017-07-16T07:05:26Z
Project community: https://github.com/attwad/cdf

License: MIT License



College de France automated audio transcripts

Worker and elasticsearch for automated College de France audio transcripts


Worker

The worker periodically polls the datastore for scheduled transcriptions. When it finds one, it downloads the MP3 files
from the College de France website, converts them to FLAC, stores them in a Google Storage bucket,
sends a Speech to Text request, stores the transcription in the same storage bucket, and indexes the transcript
in an Elasticsearch instance running in the same Kubernetes cluster.

A periodic job also computes overall statistics about the transcriptions, which works around the datastore's
limitations in this regard.

Elasticsearch

Elasticsearch runs as a single combined master and data node in a Kubernetes cluster. With only one node, replica
shards cannot be allocated, hence the “yellow” cluster status. It performs full-text indexing of
the transcripts using the French analyzer.