Product Metrics for Data Teams
Author: 糖果
Published: 2024-03-13 02:20:51
Source: supercooldata.com/
Usually data teams are the ones helping other teams build their dashboards, but what metrics should data teams use to measure their own performance? In this post I'm going to go over a common product analytics framework and how it can be applied to data teams who are trying to democratise access to their data.

The analogy that data teams should do things the way product teams do them isn't new. In a blog post by Shopify, Lin Taylor walks through how to build dashboards using a product thinking approach. During Coalesce 2020, Emilie Schario and Taylor A. Murphy proposed that you should run your data team as a product team.

If we're going to treat the data and insights from the data warehouse as a product, then we need to monitor usage and measure success with the same rigour as product teams. We can then easily measure the impact of initiatives that drive data engagement across the company, including data training such as supercooldata, communicating data team updates and improving metadata discovery.

Getting started

We will focus mostly on SQL analytics users, but the same techniques can be applied to users of your BI tools. To get started, you should at the very least measure how many Weekly Active Users (WAU) are querying the data warehouse, after removing scheduled queries.
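The counting step itself is simple. As a minimal sketch, assuming a query log exported as records with hypothetical fields `user_name`, `start_time` and an `is_scheduled` flag (how you actually identify scheduled queries, for example by service account or role, depends on your warehouse setup), weekly active users can be computed in Python like this:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class QueryLogEntry:
    # Hypothetical shape of one row exported from a warehouse query log.
    user_name: str
    start_time: datetime
    is_scheduled: bool  # assumption: the log lets you tell automated runs apart


def week_start(d: date) -> date:
    """Truncate a date to the Monday that starts its week."""
    return d - timedelta(days=d.weekday())


def weekly_active_users(log: list[QueryLogEntry]) -> dict[date, int]:
    """Count distinct users per week, ignoring scheduled queries."""
    users_by_week: dict[date, set[str]] = defaultdict(set)
    for entry in log:
        if entry.is_scheduled:
            continue  # automated queries say nothing about human engagement
        users_by_week[week_start(entry.start_time.date())].add(entry.user_name)
    # Return weeks in chronological order with the number of distinct users.
    return {week: len(users) for week, users in sorted(users_by_week.items())}


if __name__ == "__main__":
    log = [
        QueryLogEntry("alice", datetime(2024, 3, 4, 9, 0), False),
        QueryLogEntry("alice", datetime(2024, 3, 5, 9, 0), False),
        QueryLogEntry("bob", datetime(2024, 3, 6, 14, 0), False),
        QueryLogEntry("etl_bot", datetime(2024, 3, 6, 2, 0), True),
        QueryLogEntry("bob", datetime(2024, 3, 12, 10, 0), False),
    ]
    for week, count in weekly_active_users(log).items():
        print(week.isoformat(), count)  # prints "2024-03-04 2", then "2024-03-11 1"
```

Weeks in this sketch start on Monday; if your warehouse truncates weeks differently, align the two before comparing numbers.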
In Snowflake, weekly active users can be calculated from the query history view, which has one row for every query executed:

select
    date_trunc('week', date(start_time)) as reporting_week,
    count(distinct user_name) as active_users
from snowflake.account_usage.query_history
group by reporting_week;

Most other data warehouses have a similar table or view that logs executed queries. Note that this query counts every user in the log, so scheduled queries still need to be excluded, for example by filtering out the service accounts that run them.

An alternative is to take a 14-day rolling average of your daily active users, similar to how Airbnb monitors the outcome of its data education efforts:

[Figure: Airbnb Data University metrics. Source: "How Airbnb is Boosting Data Literacy with Data U Intensive Training"]

Growth accounting

User growth accounting is a product analytics framework that fits the type of usage you are likely to see on your company's data warehouse. The two main equations in growth accounting are:

Weekly Active Users (WAU) = New Users + Engaged Users + Reactivated Users
WAU Change = New Users + Reactivated Users - Churned Users

The process each week is illustrated by this diagram:

[Figure: growth accounting diagram]

Each user segment is defined by whether the user was active this week and the previous week:

[Figure: growth accounting segments]

To implement the growth accounting framework in SQL, we need to produce a table similar to the one shown below, with one row per user for every week from their acquisition week onwards.

[Figure: growth accounting source table]

We can then run the query below to determine how many users are in each segment each week.
select
    reporting_week,
    sum(case when acquisition_week = reporting_week then 1 else 0 end) as new_users,
    sum(case when active_status = 'Active' then 1 else 0 end) as active_users,
    sum(case when active_status = 'Inactive' then 1 else 0 end) as inactive_users,
    sum(case when active_status = 'Active' and previous_active_status = 'Active' then 1 else 0 end) as engaged_users,
    sum(case when active_status = 'Inactive' and previous_active_status = 'Inactive' then 1 else 0 end) as unengaged_users,
    sum(case when active_status = 'Inactive' and previous_active_status = 'Active' then 1 else 0 end) as churned_users,
    sum(case when active_status = 'Active' and previous_active_status = 'Inactive' then 1 else 0 end) as reactivated_users
from weekly_user
group by reporting_week
order by reporting_week desc;

We can also keep a list of all users and tag them by their segment to understand the current distribution.
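To check the logic end to end without a warehouse, here is a minimal Python sketch that builds rows shaped like the `weekly_user` table (one row per user per week from their acquisition week onwards), classifies each row with the same conditions as the CASE expressions above, and verifies both growth accounting equations. Representing weeks as integers and activity as a hypothetical `active_weeks` mapping of user to active weeks are simplifying assumptions, not part of the original setup; the last function tags each user with their current segment.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyUserRow:
    # One row per user per week, from the acquisition week onwards,
    # mirroring the columns the weekly_user query reads.
    user_name: str
    reporting_week: int
    acquisition_week: int
    active_status: str                     # 'Active' or 'Inactive'
    previous_active_status: Optional[str]  # None in the acquisition week


def build_weekly_user(active_weeks: dict[str, set[int]], last_week: int) -> list[WeeklyUserRow]:
    """Expand per-user activity into one row per week up to last_week."""
    rows = []
    for user, weeks in active_weeks.items():
        acquisition = min(weeks)  # first week the user was active
        previous = None
        for week in range(acquisition, last_week + 1):
            status = "Active" if week in weeks else "Inactive"
            rows.append(WeeklyUserRow(user, week, acquisition, status, previous))
            previous = status
    return rows


def segment(row: WeeklyUserRow) -> str:
    """Classify a row using the same conditions as the SQL CASE expressions."""
    if row.reporting_week == row.acquisition_week:
        return "new"
    if row.active_status == "Active":
        return "engaged" if row.previous_active_status == "Active" else "reactivated"
    return "churned" if row.previous_active_status == "Active" else "unengaged"


def growth_accounting(rows: list[WeeklyUserRow]) -> dict[int, Counter]:
    """Count users in each segment per reporting week."""
    counts: dict[int, Counter] = {}
    for row in rows:
        counts.setdefault(row.reporting_week, Counter())[segment(row)] += 1
    return dict(sorted(counts.items()))


def wau(c: Counter) -> int:
    # WAU = New Users + Engaged Users + Reactivated Users
    return c["new"] + c["engaged"] + c["reactivated"]


def current_segments(rows: list[WeeklyUserRow]) -> dict[str, str]:
    """Tag each user with the segment of their most recent week."""
    latest: dict[str, WeeklyUserRow] = {}
    for row in rows:
        if row.user_name not in latest or row.reporting_week > latest[row.user_name].reporting_week:
            latest[row.user_name] = row
    return {user: segment(row) for user, row in latest.items()}


if __name__ == "__main__":
    active_weeks = {"alice": {1, 2, 3}, "bob": {1, 3}, "carol": {2}, "dan": {3}}
    rows = build_weekly_user(active_weeks, last_week=3)
    previous_wau = 0
    for week, c in growth_accounting(rows).items():
        # WAU Change = New Users + Reactivated Users - Churned Users
        assert wau(c) - previous_wau == c["new"] + c["reactivated"] - c["churned"]
        print(week, dict(c), "WAU =", wau(c))
        previous_wau = wau(c)
    print(current_segments(rows))
```

Assuming `previous_active_status` is null in a user's acquisition week, that row never matches the engaged, reactivated, churned or unengaged conditions, so both the SQL and this sketch keep new users as a separate, non-overlapping segment. That is what makes the two equations add up exactly.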
Employee engagement drivers

By splitting our data warehouse users into these separate segments, we can not only monitor growth in usage of our data warehouse and dashboards but also develop an understanding of what's driving that growth. Did usage get a temporary bump from the one-off SQL analytics workshop we ran last week? Are new employees being properly onboarded to the company's data and actually given the access they need? Is the churn rate increasing, or do we just have fewer new employees?

Taking action

Monitoring the usage of these segments separately also allows us to take actions tailored to each user group:

- Our new users aren't engaged. Maybe we can share useful queries with new employees to get them started quickly?
- For churned users, we might be able to prioritise something they requested that hasn't been actioned yet.
- How do we get employees who have been unengaged for more than two weeks interested in using data again?

Team alignment

Including growth accounting metrics in your own team's KPIs is a great way to keep team members aligned on what's important. The number of dashboards your team outputs each week is meaningless if no one is using them. Building data models and ingestion pipelines aren't the only primary activities of a data team. By focusing the team's efforts on growing employee engagement with your data, you quickly realise that your time can be better spent communicating updates to other teams, running data education workshops or building an internal knowledge base that helps employees find and use their data.

Keep it simple

A few different metrics have been discussed as part of growth accounting, but you aren't expected to implement all of them from day one.
Only use what's most relevant to you, based on the initiatives you're running to drive data democratisation or the level of data maturity of your organisation.

If you're running weekly data sessions to introduce new employees to the data warehouse and existing employees to SQL analytics, then you probably want some form of cohort retention analysis.

If you're just starting out and aren't monitoring anything, start by learning how many weekly active users you have.

If you're more mature in your data democratisation efforts, then cohort retention analysis doesn't make much sense, as there won't be many new users. Using a growth accounting framework to monitor employee engagement with your data might make more sense.

Vanity metrics

With any product, there are always vanity metrics that sound good but don't tell you much about the health of the product. This applies equally to data usage by employees. It's better to keep your metrics focused on users rather than on aggregates of events, which can often be very large in magnitude but fairly meaningless.

Bad:
- Total report views
- Total queries executed
- Total report edits

Good:
- Percent/number of users who viewed a report
- Percent/number of users who executed a unique/new query
- Percent/number of users who edited a report

Keep it focused

It's best not to mix the dashboards monitoring the health of your data pipelines with those monitoring employee engagement with your data. Even though both might use the same data sources, they have completely different purposes and intended audiences. The audience for data pipeline dashboards is your data engineers, analysts and data scientists. The audience for your data democratisation dashboards could be executive leaders, stakeholders and consumers of your team's output. Keeping the two separate means that non-technical users won't be confused by technical jargon related to your pipelines and data warehouse metadata.
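To make the Bad and Good lists under Vanity metrics concrete, here is a minimal Python sketch comparing an event total with user-focused metrics for one week. It assumes a hypothetical log of `Query` records, a crude definition of a "new query" (normalised SQL text the user has not run in an earlier week) and a `potential_users` headcount you would supply yourself; none of these come from a specific warehouse or BI tool API.

```python
from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical, simplified query-log record.
    user_name: str
    week: int
    query_text: str


def normalise(sql: str) -> str:
    """Crude normalisation (an assumption): lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())


def weekly_metrics(log: list[Query], week: int, potential_users: int) -> dict[str, float]:
    """Compare an event total with user-focused metrics for one week."""
    # (user, query) pairs seen in earlier weeks do not count as new queries.
    seen_before = {(q.user_name, normalise(q.query_text)) for q in log if q.week < week}
    this_week = [q for q in log if q.week == week]
    active_users = {q.user_name for q in this_week}
    new_query_users = {
        q.user_name
        for q in this_week
        if (q.user_name, normalise(q.query_text)) not in seen_before
    }
    return {
        "total_queries": len(this_week),  # vanity: inflated by one heavy user
        "active_users": len(active_users),
        "users_with_new_query": len(new_query_users),
        "pct_users_with_new_query": 100 * len(new_query_users) / potential_users,
    }


if __name__ == "__main__":
    log = (
        [Query("alice", 1, "select * from orders")]
        # alice re-runs the same query 50 times in week 2
        + [Query("alice", 2, "SELECT *  FROM orders")] * 50
        + [Query("bob", 2, "select count(*) from users")]
    )
    print(weekly_metrics(log, week=2, potential_users=10))
```

In this example total queries jumps to 51 because one user re-ran the same query, while only one of ten potential users (10%) actually wrote something new. The user-focused numbers are the ones that reflect engagement.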
Keep it open

Try to keep your data democratisation dashboards as open as possible for other teams and senior stakeholders to view. This means you probably shouldn't include private information such as the user names of churned users. You might want two separate dashboards: one that anyone in the organisation can view and one with more detail for your team.

Implementation notes

Metric frequency

Daily metrics aren't very useful, so ideally you would base your measurement on a weekly frequency. Depending on the size of your company and volume of usage, a monthly frequency might be more appropriate.

Medium or large companies

This approach generally applies to larger companies with at least 100 potential SQL analytics users across the organisation.

Not all data initiatives

These metrics mostly relate to the success of data democratisation within your company. They don't really capture the value of machine learning initiatives or the value of the data warehouse in building new product features.

Scheduled queries

It goes without saying that you should remove scheduled SQL queries from the logs before starting the analysis. If you can't separate automated queries from manually submitted ones, then your team has bigger issues and it's best to solve that problem first!

BI tool metadata

Different BI tools provide access to usage metadata in different ways, including an accessible transactional database, a poorly documented API, SQL query tagging, or no access at all. It's usually easiest to start with direct SQL analytics users from the data warehouse logs. When you need more information to cover more aspects of data democratisation, investigate how you can extract metadata from your BI tool. Once you have the BI metadata, you can implement a similar framework to measure employee engagement with your visualisation platform.

Final thoughts

Building dashboards, reports and data assets is very similar to building a product.
As data teams, we should be at least as data-driven as other teams in monitoring our measures of success and setting goals. By taking a product-minded approach to monitoring the success of your team's work, you change how you think about which actions will drive success. Maybe your team doesn't need any new dashboards this quarter and should instead focus on acquiring new users for the BI platform or data warehouse? If your users are churning, you need to find out why. Perhaps their reports aren't relevant anymore and need an update? Perhaps engaged users only monitor a single report because they don't know about the other five that already exist and are relevant to them.

This blog post started from an internal discussion about how we should measure the success of data literacy initiatives like supercooldata. We quickly realised that this was a solved problem: product managers have built great frameworks to help monitor, understand and drive engagement with their products. We should use the same frameworks to measure the success and value-add of our own product.