Project author: hiejulia

Project description:
ML AI DL
Primary language: Jupyter Notebook
Project URL: git://github.com/hiejulia/Machine-Learning---Deep-Learning---AI.git


machine_learning_project

Algorithms

  • Deep learning
  • Ensemble
  • Neural networks
  • Regression
  • Decision Tree
  • Bayesian
  • Regularization
  • Rule system
  • Dimension Reduction
  • Instance-based
  • Clustering

Deep Learning

Neural network architecture
  • DNN
  • CNN
  • RNN
    • LSTM, GRU, Bidirectional LSTM
  • EA

AI

  • AI search algorithms : Dijkstra search, A* search
  • AI game and Rule-based system
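
The two search algorithms named above are closely related, and a minimal sketch may help show how. The function below is an illustration, not code from this repository: the graph format, the function name `a_star`, and the sample heuristic table are all assumptions. With the default zero heuristic it behaves as Dijkstra search; with an admissible heuristic it behaves as A* search.

```python
import heapq

def a_star(graph, start, goal, heuristic=lambda node: 0):
    """Return (cost, path) for the cheapest path, or (inf, []) if unreachable.

    graph maps node -> list of (neighbor, edge_cost) with non-negative costs.
    With the default zero heuristic this is Dijkstra search.
    """
    frontier = [(heuristic(start), 0, start)]  # (priority, cost so far, node)
    best_cost = {start: 0}
    parent = {start: None}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return cost, path[::-1]
        if cost > best_cost[node]:
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                parent[neighbor] = node
                priority = new_cost + heuristic(neighbor)
                heapq.heappush(frontier, (priority, new_cost, neighbor))
    return float("inf"), []

# Hypothetical example graph and heuristic (estimated remaining cost to "D").
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
estimate = {"A": 3, "B": 3, "C": 1, "D": 0}

dijkstra_result = a_star(graph, "A", "D")
a_star_result = a_star(graph, "A", "D", heuristic=estimate.get)
```

Both calls find the same cheapest path here; the heuristic only changes the order in which nodes are explored.
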
Frameworks
  • TensorFlow
  • Keras
  • Theano
  • Neon
  • PyTorch
  • Caffe
  • MXNet
  • Microsoft Cognitive Toolkit
  • DeepLearning4J

Cloud-based platforms for DL

  • AWS, Azure, GCP, NVIDIA GPU Cloud
  • AMI : EC2 - These AMIs come pre-installed with deep learning frameworks, such as TensorFlow, Gluon, and Apache MXNet, that are optimized for the NVIDIA Volta V100 GPUs within Amazon EC2 P3 instances
  • AML : model building, batch prediction, real-time prediction
  • Google Cloud ML Engine

Big data ML

Big Data Machine Learning
  • General Big Data framework
    • Big data cluster deployment frameworks
      • Hortonworks Data Platform (HDP)
      • Cloudera CDH
      • Amazon Elastic MapReduce (EMR)
      • Microsoft HDInsight
    • Data acquisition
      • Publish-subscribe framework
      • Source-sink framework
      • SQL framework
      • Message queueing framework
      • Custom framework
    • Data storage
      • Hadoop Distributed File System (HDFS)
      • NoSQL
    • Data processing and preparation
      • Hive and Hive Query Language (HQL)
      • Spark SQL
      • Amazon Redshift
      • Real-time stream processing
    • Machine Learning
    • Visualization and analysis
  • Batch Big Data Machine Learning
    • H2O
      • H2O architecture
      • Machine Learning in H2O
      • Tools and usage
      • Case study
      • Business problems
      • Machine Learning mapping
      • Data collection
      • Data sampling and transformation
      • Experiments, results, and analysis
    • Spark MLlib
      • Spark architecture
      • Machine Learning in MLlib
      • Tools and usage
      • Experiments, results, and analysis
  • Real-time Big Data Machine Learning
    • Scalable Advanced Massive Online Analysis (SAMOA)
      • SAMOA architecture
      • Machine Learning algorithms
      • Tools and usage
      • Experiments, results, and analysis
  • The future of Machine Learning

Production pipeline - Big data

  • Cluster deployment framework : HDP, Cloudera, Amazon Elastic MapReduce, Microsoft Azure HDInsight

  • Data acquisition
    • Publish-subscribe frameworks, source-sink frameworks
  • Data storage
    • HDFS, NoSQL
  • Data processing & preparation
    • Data cleansing: Correcting errors, matching types, normalizing elements, and similar fixes applied to the raw data.
    • Data scraping and curating: Converting data elements and normalizing the data from one structure to another.
    • Data transformation: Many analytical algorithms need features that are aggregates built on raw or historical data. Those extra features are computed in this step.
    • Hive HQL, Spark SQL, Amazon Redshift MPP, real-time stream processing
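
The cleansing and transformation steps described above can be sketched on a toy scale with plain Python. Everything here is hypothetical: the field names, the sample rows, and the choice to drop rows whose amount cannot be parsed are assumptions made for illustration. A production pipeline would typically express the same logic in Hive, Spark SQL, or a similar engine.

```python
from collections import defaultdict

# Hypothetical raw records with inconsistent types, casing, and whitespace.
raw_rows = [
    {"user": " Alice ", "amount": "10.5", "country": "us"},
    {"user": "bob", "amount": "n/a", "country": "US"},
    {"user": "alice", "amount": "4.5", "country": "Us"},
]

def cleanse(row):
    """Type matching and normalization; returns None for rows that cannot be repaired."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # assumption: unparseable amounts are dropped, not imputed
    return {
        "user": row["user"].strip().lower(),
        "amount": amount,
        "country": row["country"].upper(),
    }

def total_per_user(rows):
    """Transformation step: build an aggregate feature from the cleaned rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)

clean_rows = [row for row in map(cleanse, raw_rows) if row is not None]
features = total_per_user(clean_rows)
```

After cleansing, the two spellings of the same user collapse into one key, so the aggregate feature is computed over both of that user's rows.
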

Big data ML platform

  • H2O architecture
    • fork-join
    • MapReduce
  • https://medium.com/@jamal.robinson/introduction-to-h2o-ai-1ba51a884f02
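
As a rough illustration of the fork-join and MapReduce ideas listed above, the sketch below splits a list into chunks, maps each chunk to a partial result in a worker thread, and reduces the partial results into one value. This is a generic pattern written with the Python standard library, not H2O's API; the function names and the chunking scheme are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_chunk(chunk):
    """Map step: summarize one chunk as (sum, count)."""
    return (sum(chunk), len(chunk))

def combine(left, right):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (left[0] + right[0], left[1] + right[1])

def parallel_mean(values, n_chunks=4):
    """Fork the work across chunks, then join the partial results into a mean."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))  # fork, then wait for all
    total, count = reduce(combine, partials, (0, 0))  # join
    return total / count if count else float("nan")

mean_value = parallel_mean(list(range(1, 101)))
```

Because the reduce step only needs the small (sum, count) pairs, the same structure carries over to a cluster where each chunk lives on a different node.
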

Funding by AI category
  • ML apps
  • NLP
  • Computer Vision
  • Smart robot
  • Virtual personal assistant
  • Gesture control
  • Speech recognition
  • Recommendation engine
  • Video content recognition
  • Context aware computing
  • Speech to speech translation
Tuning methods for DL networks
  • Backpropagation
  • Learning rate decay
  • Max pooling
  • Long short-term memory
  • Continuous bag of words
  • Transfer learning
  • Skip-gram
  • Batch normalization
  • Dropout
  • Stochastic gradient descent
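
Two of the items above, stochastic gradient descent and learning rate decay, can be shown together in a small sketch. It fits a one-variable linear model in plain Python. The inverse-time decay schedule, the hyperparameter values, and the toy data are all assumptions chosen for illustration; deep learning frameworks ship their own optimizers and schedulers.

```python
import random

def decayed_lr(initial_lr, decay_rate, epoch):
    """Inverse-time decay: the step size shrinks as training progresses."""
    return initial_lr / (1.0 + decay_rate * epoch)

def sgd_linear_fit(samples, initial_lr=0.05, decay_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x + b with per-sample (stochastic) gradient steps."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    samples = list(samples)
    w, b = 0.0, 0.0
    for epoch in range(epochs):
        lr = decayed_lr(initial_lr, decay_rate, epoch)
        rng.shuffle(samples)  # visit the samples in a new order each epoch
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(5)]
w, b = sgd_linear_fit(data)
```

On this toy data the fitted `w` and `b` should land close to 2 and 1. The decay schedule lets early epochs take larger steps while later epochs make finer adjustments.
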

AI engine with DL libs

  • Theano
  • TensorFlow
  • CNTK
  • Caffe
  • DL4J
  • Torch
  • Spark MLlib : fast, general engine for large-scale distributed data processing
  • Apache MXNet : scalable, with state-of-the-art models such as CNN and LSTM
  • Keras

Public datasets

  • Image : Open Images V4 Google, Microsoft, UC Berkeley
  • Video : YouTube
  • Text : SQuAD, Yelp
  • Satellite data : Landsat data
  • Audio : Google AudioSet, LibriSpeech