Project author: aikuyun

Project description: superset dev, optimization notes for Superset

Language: Python
Project URL: git://github.com/aikuyun/superset.git
Created: 2020-06-11T13:58:08Z
Project community: https://github.com/aikuyun/superset

License: Apache License 2.0

Superset

A modern, enterprise-ready business intelligence web application.

Why Superset | Database Support | Installation and Configuration | Get Help | Contributor Guide | Resources | Superset Users | License
What did I change?

1. Viewing table column comments in SQL Lab

Showing table comments

When browsing tables in the left-hand panel, users want to see each column's comment; currently only the column name and column type are shown. The changes are as follows:

  • Frontend change, in superset-frontend/src/SqlLab/components/ColumnElement.jsx, starting at around line 64 (the shape of the col object is illustrated in the sketch after the backend code below):

    return (
      <div className="clearfix table-column">
        <div className="pull-left m-l-10 col-name">
          {name}
          {icons}
          (<span className="text-muted">{col.type}</span>)
        </div>
        <div className="pull-right text-muted">
          <small>{col.comment}</small>
        </div>
      </div>
    );
  • Backend change, in the get_table_metadata method of superset/views/database/api.py.

    For MySQL the comment field can be read directly, but Hive and Presto do not return it that way, so the code was changed to run desc table and extract the comment field from its output.

    def get_table_metadata(
        database: Database, table_name: str, schema_name: Optional[str]
    ) -> Dict:
        """
        Get table metadata information, including type, pk, fks.
        This function raises SQLAlchemyError when a schema is not found.
        :param database: The database model
        :param table_name: Table name
        :param schema_name: schema name
        :return: Dict table metadata ready for API response
        """
        keys: List = []
        columns = database.get_columns(table_name, schema_name)
        # column name -> comment, filled in for Presto/Hive below
        comment_dict = {}
        primary_key = database.get_pk_constraint(table_name, schema_name)
        if primary_key and primary_key.get("constrained_columns"):
            primary_key["column_names"] = primary_key.pop("constrained_columns")
            primary_key["type"] = "pk"
            keys += [primary_key]
        # get the dialect name to special-case Presto and Hive
        dialect_name = database.get_dialect().name
        if isinstance(dialect_name, bytes):
            dialect_name = dialect_name.decode()
        # Presto and Hive do not return column comments via get_columns,
        # so run a DESC query and read the comments from its result
        if dialect_name == "presto" or dialect_name == "hive":
            db_engine_spec = database.db_engine_spec
            sql = ParsedQuery(
                "desc {a}.{b}".format(a=schema_name, b=table_name)
            ).stripped()
            engine = database.get_sqla_engine(schema_name)
            conn = engine.raw_connection()
            cursor = conn.cursor()
            query = Query()
            session = Session(bind=engine)
            query.executed_sql = sql
            query.__tablename__ = table_name
            session.commit()
            db_engine_spec.execute(cursor, sql, async_=False)
            data = db_engine_spec.fetch_data(cursor, query.limit)
            # parse the DESC rows into a dict; Presto and Hive place the
            # comment in different columns of the result
            if dialect_name == "presto":
                for d in data:
                    comment_dict[d[0]] = d[3]
            else:
                for d in data:
                    comment_dict[d[0]] = d[2]
            conn.commit()
        foreign_keys = get_foreign_keys_metadata(database, table_name, schema_name)
        indexes = get_indexes_metadata(database, table_name, schema_name)
        keys += foreign_keys + indexes
        payload_columns: List[Dict] = []
        for col in columns:
            dtype = get_col_type(col)
            if len(comment_dict) > 0:
                # Presto/Hive: use the comments collected from DESC
                payload_columns.append(
                    {
                        "name": col["name"],
                        "type": dtype.split("(")[0] if "(" in dtype else dtype,
                        "longType": dtype,
                        "keys": [k for k in keys if col["name"] in k.get("column_names")],
                        "comment": comment_dict[col["name"]],
                    }
                )
            elif dialect_name == "mysql":
                # MySQL: get_columns already returns the comment directly
                payload_columns.append(
                    {
                        "name": col["name"],
                        "type": dtype.split("(")[0] if "(" in dtype else dtype,
                        "longType": dtype,
                        "keys": [k for k in keys if col["name"] in k.get("column_names")],
                        "comment": col["comment"],
                    }
                )
            else:
                # other dialects: no comment available
                payload_columns.append(
                    {
                        "name": col["name"],
                        "type": dtype.split("(")[0] if "(" in dtype else dtype,
                        "longType": dtype,
                        "keys": [k for k in keys if col["name"] in k.get("column_names")],
                        # "comment": col["comment"],
                    }
                )
        return {
            "name": table_name,
            "columns": payload_columns,
            "selectStar": database.select_star(
                table_name,
                schema=schema_name,
                show_cols=True,
                indent=True,
                cols=columns,
                latest_partition=True,
            ),
            "primaryKey": primary_key,
            "foreignKeys": foreign_keys,
            "indexes": keys,
        }
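
To make the index arithmetic concrete: Presto's DESCRIBE returns four columns (Column, Type, Extra, Comment) while Hive's returns three (col_name, data_type, comment), which is why the comment is read from index 3 for Presto rows and index 2 for Hive rows. Below is a minimal standalone sketch, with made-up rows and a hypothetical build_comment_dict helper, that also shows the payload entry the ColumnElement.jsx change above ends up rendering:

    # Hypothetical DESC output rows (not real query results):
    # Presto's DESCRIBE yields (Column, Type, Extra, Comment);
    # Hive's yields (col_name, data_type, comment).
    presto_rows = [("user_id", "bigint", "", "user id")]
    hive_rows = [("user_id", "bigint", "user id")]

    def build_comment_dict(rows, dialect_name):
        """Map column name -> comment, mirroring the indexing above."""
        idx = 3 if dialect_name == "presto" else 2
        return {row[0]: row[idx] for row in rows}

    comment_dict = build_comment_dict(presto_rows, "presto")

    # The resulting payload entry, as consumed by ColumnElement.jsx
    # (made-up sample values).
    payload_column = {
        "name": "user_id",
        "type": "bigint",
        "longType": "bigint",
        "keys": [],
        "comment": comment_dict["user_id"],  # rendered as {col.comment}
    }
    assert payload_column["comment"] == "user id"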

Fixing a deprecated method

The backend logged the error AttributeError: 'DataFrame' object has no attribute 'ix', because pandas removed the ix indexer in version 1.0.0.

The fix is in the _latest_partition_from_df method of superset/db_engine_specs/hive.py:

    @classmethod
    def _latest_partition_from_df(cls, df: pd.DataFrame) -> Optional[List[str]]:
        """Hive partitions look like ds={partition name}"""
        if not df.empty:
            # df.ix was removed in pandas 1.0.0; use positional iloc instead
            return [df.iloc[:, 0].max().split("=")[1]]
        return None
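
A quick sanity check of the fixed expression, assuming partition values in the usual ds=YYYY-MM-DD form:

    import pandas as pd

    # Simulated "SHOW PARTITIONS" result: one partition string per row.
    df = pd.DataFrame({"partition": ["ds=2020-06-10", "ds=2020-06-11"]})

    # Same expression as in the fixed method: lexicographic max of the
    # first column, then keep the part after "=".
    latest = [df.iloc[:, 0].max().split("=")[1]]
    assert latest == ["2020-06-11"]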