Project author: mchmarny

Project description:
Using Twitter sentiment and stock market price signal correlation to predict next-day closing price
Repository: git://github.com/mchmarny/stocker.git
Created: 2019-05-06T00:08:08Z
Project page: https://github.com/mchmarny/stocker

License: Apache License 2.0



stocker

Using Twitter sentiment and stock market price signal correlation to predict next-day closing price.

Note: this is for demonstration purposes only. I know literally nothing about the stock market. You should not use this demo to make or support your financial decisions. DON’T DO IT!

Dependent components

Once the data flow is configured using these components, you can follow the steps below to create a linear regression model, evaluate it against your data set, and run predictions with the trained model.

Create Model

  #standardSQL
  CREATE OR REPLACE MODEL stocker.price_model
  OPTIONS
    (model_type='linear_reg', input_label_cols=['price']) AS
  SELECT
    p.price,                                -- label column (input_label_cols)
    p.closingPrice AS prev_price,
    c.symbol,
    c.magnitude * c.score AS sentiment,     -- signed sentiment strength
    CAST(c.retweet AS INT64) AS retweet
  FROM stocker.content c
  JOIN stocker.price p ON c.symbol = p.symbol
    -- match each tweet with prices quoted on the same day (UTC)
    AND FORMAT_TIMESTAMP('%Y-%m-%d', c.created) = FORMAT_TIMESTAMP('%Y-%m-%d', p.quotedAt)
  WHERE c.score <> 0        -- skip neutral tweets
    AND RAND() < 0.01       -- train on a random ~1% sample of the rows

Running it produces the message:

  This statement created a new model named stocker.price_model
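The training query gives BigQuery ML one row per (tweet, same-day price) pair. To show roughly what a `linear_reg` model learns from those columns, here is a minimal pure-Python ordinary-least-squares sketch. It is not the author's code and not BigQuery ML's implementation. It assumes that the string feature `symbol` is one-hot encoded and solves the normal equations directly, whereas BigQuery ML picks its own optimizer, preprocessing and regularization. The helper names (`features`, `fit_linear_reg`, `predict`) and the synthetic rows are hypothetical.

```python
from typing import List, Sequence, Tuple

# One row in the column order of the feature query (hypothetical local stand-in):
# (price, prev_price, symbol, sentiment, retweet)
Row = Tuple[float, float, str, float, int]


def features(row: Row, symbols: Sequence[str]) -> List[float]:
    """prev_price, sentiment and retweet, followed by a one-hot encoding of symbol.

    Assumption: symbol is one-hot encoded. The one-hot columns give each symbol
    its own intercept, so no separate bias column is added (it would be collinear).
    """
    _, prev_price, symbol, sentiment, retweet = row
    return [prev_price, sentiment, float(retweet)] + [
        1.0 if symbol == s else 0.0 for s in symbols
    ]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_reg(rows: Sequence[Row]) -> Tuple[List[str], List[float]]:
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    symbols = sorted({row[2] for row in rows})
    xs = [features(row, symbols) for row in rows]
    ys = [row[0] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return symbols, solve(xtx, xty)


def predict(row: Row, symbols: Sequence[str], weights: Sequence[float]) -> float:
    """Weighted sum of the row's features (the label in row[0] is ignored)."""
    return sum(w * f for w, f in zip(weights, features(row, symbols)))


# Synthetic rows generated from known, hypothetical weights so the fit can be checked.
TRUE_WEIGHTS = [1.01, 0.5, 0.2, 2.0, -1.0]  # prev_price, sentiment, retweet, AAA, BBB
inputs = [
    (100.0, "AAA", 0.3, 0), (102.0, "AAA", -0.5, 1), (98.0, "AAA", 0.9, 1),
    (50.0, "BBB", 0.1, 0), (52.0, "BBB", -0.2, 1), (49.0, "BBB", 0.6, 0),
]
rows = [
    (predict((0.0, prev, sym, sent, rt), ["AAA", "BBB"], TRUE_WEIGHTS), prev, sym, sent, rt)
    for prev, sym, sent, rt in inputs
]
symbols, weights = fit_linear_reg(rows)
print(symbols, [round(w, 4) for w in weights])
```

The synthetic prices are exact linear functions of the features, so the fitted weights should recover `TRUE_WEIGHTS` up to floating-point error. Real tweet and price data will not fit this cleanly.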

Evaluate model

  #standardSQL
  INSERT stocker.price_mode_eval (
    eval_ts,
    mean_absolute_error,
    mean_squared_error,
    mean_squared_log_error,
    median_absolute_error,
    r2_score,
    explained_variance
  ) WITH T AS (
    SELECT
      *
    FROM
      ML.EVALUATE(MODEL stocker.price_model, (
        -- same features as the training query, but without the 1% sample
        SELECT
          p.price,
          p.closingPrice AS prev_price,
          c.symbol,
          c.magnitude * c.score AS sentiment,
          CAST(c.retweet AS INT64) AS retweet
        FROM stocker.content c
        JOIN stocker.price p ON c.symbol = p.symbol
          AND FORMAT_TIMESTAMP('%Y-%m-%d', c.created) = FORMAT_TIMESTAMP('%Y-%m-%d', p.quotedAt)
        WHERE c.score <> 0
      ))
  )
  SELECT
    CURRENT_TIMESTAMP(),      -- eval_ts
    mean_absolute_error,
    mean_squared_error,
    mean_squared_log_error,
    median_absolute_error,
    r2_score,
    explained_variance
  FROM T

The evaluation produces the following metrics:

  mean_absolute_error       3.2502161606238453
  mean_squared_error        227.0738450661901
  mean_squared_log_error    0.008387276788977339
  median_absolute_error     0.12880176496196327
  r2_score                  0.9990422574648288
  explained_variance        0.999079865551752
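To make these columns less opaque, here is a minimal Python sketch that computes the same six metrics from lists of actual and predicted prices. It assumes that ML.EVALUATE follows the common (scikit-learn style) definitions, for example `log(1 + x)` in the squared log error. The function name `regression_metrics` and the sample numbers are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Standard regression metrics, keyed by the ML.EVALUATE column names."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [y - p for y, p in zip(actual, predicted)]
    mean_y = statistics.fmean(actual)
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in actual)     # total sum of squares
    if ss_tot == 0:
        raise ValueError("r2_score is undefined when all actual values are equal")
    return {
        "mean_absolute_error": statistics.fmean(abs(e) for e in errors),
        "mean_squared_error": ss_res / len(errors),
        # log1p(x) = log(1 + x); requires values > -1 (prices are positive)
        "mean_squared_log_error": statistics.fmean(
            (math.log1p(y) - math.log1p(p)) ** 2 for y, p in zip(actual, predicted)
        ),
        "median_absolute_error": statistics.median(abs(e) for e in errors),
        "r2_score": 1.0 - ss_res / ss_tot,
        # like R2, but based on the variance of the errors, so a constant bias is not penalized
        "explained_variance": 1.0 - statistics.pvariance(errors) / statistics.pvariance(actual),
    }


metrics = regression_metrics([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 17.0])
for name, value in metrics.items():
    print(f"{name:24} {value:.4f}")
```

Reading the results above with these definitions, a mean absolute error (3.25) that is much larger than the median absolute error (0.13) suggests that most predictions are very close while a small number are far off.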

The R2 score (coefficient of determination) measures how closely the linear regression predictions approximate the actual data. A score of 0 indicates that the model explains none of the variability of the response data around its mean; a score of 1 indicates that the model explains all of it.
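The usual definition shows where these two endpoints come from. Here y_i is the actual price, ŷ_i is the predicted price, and ȳ is the mean actual price. A model that always predicts ȳ makes the numerator equal the denominator, so R2 = 0. Perfect predictions make the numerator zero, so R2 = 1. On data the model was not fit to, the score can also fall below 0.

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```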

Use your model to predict stock price

  #standardSQL
  INSERT stocker.price_prediction (
    symbol,
    prediction_date,
    after_closing_price,
    predicted_price
  ) WITH T AS (
    SELECT
      dt.symbol AS symbol,
      p.closingPrice AS after_closing_price,
      ROUND(AVG(dt.predicted_price), 2) AS predicted_price   -- average of the per-row predictions
    FROM
      ML.PREDICT(MODEL stocker.price_model, (
        SELECT
          p.price,
          p.closingPrice AS prev_price,
          c.symbol,
          c.magnitude * c.score AS sentiment,
          CAST(c.retweet AS INT64) AS retweet
        FROM stocker.content c
        JOIN stocker.price p ON c.symbol = p.symbol
          AND FORMAT_TIMESTAMP('%Y-%m-%d', c.created) = FORMAT_TIMESTAMP('%Y-%m-%d', p.quotedAt)
      )) dt
    -- attach each symbol's closing price for today (Los Angeles date)
    JOIN stocker.price p ON p.symbol = dt.symbol
    WHERE p.closingDate = FORMAT_TIMESTAMP('%Y-%m-%d', CURRENT_TIMESTAMP(), "America/Los_Angeles")
    GROUP BY
      dt.symbol,
      p.closingPrice
  )
  SELECT
    symbol,
    FORMAT_TIMESTAMP('%Y-%m-%d', CURRENT_TIMESTAMP(), "America/Los_Angeles"),   -- prediction_date
    after_closing_price,
    predicted_price
  FROM T
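To check the logic of this query outside BigQuery, here is a minimal Python sketch of its aggregation. It averages the per-row predictions for each symbol, rounds to two decimals, and pairs the result with that symbol's closing price for today's date in the America/Los_Angeles time zone. The input shapes and function names are hypothetical stand-ins for the ML.PREDICT output and the stocker.price table. The sketch assumes at most one closing price per symbol per day.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import fmean
from typing import Dict, List, Optional, Sequence, Tuple
from zoneinfo import ZoneInfo


def la_today(now: Optional[datetime] = None) -> str:
    """Like FORMAT_TIMESTAMP('%Y-%m-%d', CURRENT_TIMESTAMP(), "America/Los_Angeles")."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(ZoneInfo("America/Los_Angeles")).strftime("%Y-%m-%d")


def daily_predictions(
    predicted: Sequence[Tuple[str, float]],   # hypothetical (symbol, predicted_price) rows
    closing: Dict[Tuple[str, str], float],    # hypothetical (symbol, closingDate) -> closingPrice
    today: Optional[str] = None,
) -> List[Tuple[str, str, float, float]]:
    """Return (symbol, prediction_date, after_closing_price, predicted_price) rows."""
    today = today or la_today()
    by_symbol: Dict[str, List[float]] = defaultdict(list)
    for symbol, price in predicted:
        by_symbol[symbol].append(price)
    rows = []
    for symbol, prices in sorted(by_symbol.items()):
        close = closing.get((symbol, today))
        if close is None:  # inner join: symbols without a close for today are dropped
            continue
        # Note: Python's round() sends exact halves to even, while BigQuery's ROUND
        # rounds halves away from zero, so exact ties can differ in the last digit.
        rows.append((symbol, today, close, round(fmean(prices), 2)))
    return rows


predicted = [("AAA", 10.0), ("AAA", 10.5), ("BBB", 20.3), ("CCC", 5.0)]
closing = {("AAA", "2019-05-06"): 9.9, ("BBB", "2019-05-06"): 20.0}
result = daily_predictions(predicted, closing, today="2019-05-06")
print(result)
```

Passing `today` explicitly keeps the example deterministic. When `today` is omitted, the date comes from the current time in Los Angeles, as in the SQL.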

Predictions

  SELECT
    symbol,
    prediction_date,
    after_closing_price,
    predicted_price
  FROM stocker.price_prediction
  -- grouping by every column collapses duplicate rows (same as SELECT DISTINCT)
  GROUP BY
    symbol,
    prediction_date,
    after_closing_price,
    predicted_price
  ORDER BY 1, 2
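This query groups by every selected column without computing any aggregate, so it behaves like SELECT DISTINCT. Duplicate rows are collapsed, for example those written when the prediction INSERT runs twice on the same day. A minimal Python sketch of the same behavior on hypothetical rows:

```python
from typing import Iterable, List, Tuple

# (symbol, prediction_date, after_closing_price, predicted_price)
Prediction = Tuple[str, str, float, float]


def distinct_sorted(rows: Iterable[Prediction]) -> List[Prediction]:
    """Keep one copy of each distinct row, then sort by symbol and date (ORDER BY 1, 2).

    Rows sharing a symbol and date but differing in price come out in an
    arbitrary relative order, just as SQL leaves such ties unordered.
    """
    return sorted(set(rows), key=lambda r: (r[0], r[1]))


stored = [
    ("BBB", "2019-05-07", 20.0, 20.3),
    ("AAA", "2019-05-07", 9.9, 10.25),
    ("AAA", "2019-05-07", 9.9, 10.25),  # same day's prediction inserted twice
    ("AAA", "2019-05-06", 9.8, 9.95),
]
report = distinct_sorted(stored)
print(report)
```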