Introduction
Welcome! We invite you to participate in a survey regarding a new framework for evaluating Synthetic Data. This research aims to bridge the gap between technical AI metrics and business decision-making. Your responses are anonymous and strictly for academic research.
问卷说明
欢迎参加本次调查! 本研究旨在评估一种新的合成数据(Synthetic Data)评估框架。该框架旨在弥合复杂的技术指标与业务决策之间的差距。
您的回答将仅用于学术研究,所有数据均严格保密且匿名。完成问卷约需 3-5 分钟。
Section A – Demographic information A 部分:基本信息
1. A1. Gender / 性别 What is your gender? 您的性别是?
Male 男
Female 女
Prefer not to say 不愿透露
2. A2. Age group / 年龄段 Which age group do you belong to? 您的年龄段是?
18–24
25–34
35–44
45–54
55 or above 55 岁及以上
3. A3. Highest education level / 最高学历 What is your highest level of education? 您的最高学历是?
Diploma 文凭
Bachelor’s degree 学士
Master’s degree 硕士
Doctoral degree (PhD/DBA) 博士
Other, please specify 其他,请注明:________
4. A4. Current role / position / 当前角色或职位 What best describes your current role or position? 以下哪一项最能描述您当前的身份/职位?
Student 学生
Data scientist / machine learning engineer 数据科学家 / 机器学习工程师
IT / analytics professional IT / 数据分析从业者
Manager / executive (e.g., risk, compliance, operations) 经理 / 高管(如风控、合规、运营等)
Academic / researcher 学术人员 / 研究人员
Other, please specify 其他,请注明:________
5. A5. Primary domain of work or study / 主要工作或学习领域 What is your primary domain of work or study? 您的主要工作或学习领域是?
Finance / banking / insurance 金融 / 银行 / 保险
Healthcare / life sciences 医疗 / 生命科学
Public sector / government 公共部门 / 政府
Technology / software 科技 / 软件
Education / academia 教育 / 学术
Other, please specify 其他,请注明:________
Section B – Perceptions of the GRC scorecard B 部分:对 GRC 记分卡的看法
Glossary of Terms / 关键术语说明
To help you evaluate the scorecard, please refer to the definitions below: (为了帮助您更好地评估记分卡,请参考以下定义:)
1. Core Concepts (核心概念)
Synthetic Data: Artificial data generated by AI that mimics real-world patterns without containing actual personal information. It is often used for privacy protection.
(合成数据: 由 AI 算法生成的人工数据。它在统计特征上模仿真实数据,但不包含任何真实个人的敏感信息,因此常用于保护隐私。)
Models (Columns): The AI algorithms used to create the data. The three columns represent three different algorithms (e.g., Statistical vs. Deep Learning).
(模型: 指生成数据的不同 AI 算法。记分卡中的三列代表三种不同的算法表现。)
2. Evaluation Dimensions (评估维度)
Quality: How "real" is the data? Measures statistical similarity to the original data. A higher score is better.
(质量:数据有多“真”? 衡量合成数据在统计分布和相关性上与真实数据的相似程度。分数越高越好。)
Risk - Fairness: Is there bias? Measures if the data treats different groups (e.g., gender, race) equally. A lower disparity score is better.
(风险 - 公平性:是否存在歧视? 衡量数据是否对不同群体(如性别、种族)存在算法偏见。分数越低/差异越小越好。)
Risk - Privacy: Can identities be leaked? Measures the risk of re-identifying real people from the data. A lower score (closer to 0.5) is safer.
(风险 - 隐私: 是否会泄露身份? 衡量攻击者从数据中还原真实个人身份的可能性。分数越低/越接近 0.5 越安全。)
Sustainability: How green is it? Measures the carbon footprint (CO2) and energy cost of generating the data. A lower number is better.
(可持续性: 有多环保?衡量生成数据所消耗的计算资源和产生的碳排放。数值越低越好。)
Utility: How useful is it? Measures if the synthetic data can be used to train accurate AI models for real-world tasks. A higher score is better.
(效用: 有多好用? 衡量使用该合成数据训练出来的 AI 模型,在真实任务中的预测准确率。分数越高越好。)
3. Color Coding (颜色编码)
🟩 Green: Good / Low Risk (表现良好 / 低风险)
🟨 Amber: Review Required / Medium Risk (需关注 / 中等风险)
🟥 Red: Poor / High Risk (表现差 / 高风险)
Imagine you are a decision-maker (e.g., Manager or Compliance Officer) evaluating whether to approve a new AI project. The technical team has provided the following 'Quality & Risk Scorecard' to explain the synthetic data they want to use. Please review the image below and answer the questions based on how helpful this scorecard is for your decision.
想象您是一位决策者,正在评估是否批准一个新的 AI 项目。技术团队提供了下方的‘质量与风险记分卡’。请查看该图,并根据它对您决策的帮助程度回答问题。
Please indicate how much you agree with the following statements about the synthetic data GRC scorecard. 1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree.
请根据您对该合成数据 GRC 记分卡的看法,选择您对下列陈述的同意程度: 1 = 非常不同意,2 = 不同意,3 = 一般 / 中立,4 = 同意,5 = 非常同意。
6. TV1 The scorecard clearly separates competing goals, e.g., Quality vs. Risk vs. Sustainability. 这张记分卡清晰地区分了相互竞争的目标(例如将“数据质量”与“隐私风险”、“可持续性”分开列示)。
7. TV2 Grouping metrics into "Dimensions" helps me understand the big picture. 将技术指标归类为左侧的四大“维度 (Dimensions)”,有助于我理解 AI 模型的整体表现。
8. TV3 This framework covers critical aspects of "Responsible AI" including Fairness and Sustainability. 该框架涵盖了我所关心的“负责任 AI (Responsible AI)”的关键方面(包括公平性和环保)。
9. QO1 – Open-ended question(开放题)
Is there any critical dimension missing? 您认为记分卡中是否还缺少任何关键维度的信息?
10. PV1 The RAG color coding allows me to identify high-risk areas instantly without understanding the math. 红/黄/绿 (RAG) 颜色编码让我无需深究数学原理,就能立即识别出高风险区域。
11. PV2 The layout makes it easy to compare the three models and select the best one. 这种布局让我能够轻松地横向比较三个模型,并根据我的需求选出最佳的一个。
12. PV3 The scorecard effectively summarizes complex technical results into a simple format. 记分卡有效地将原本晦涩难懂的技术结果总结成了简单、易读的格式。
13. QO2 – Open-ended question(开放题)
Do you find the Red/Amber/Green thresholds easy to interpret? Why or why not? 您觉得红/黄/绿 (RAG) 的颜色阈值设置是否容易直观解读?为什么?
14. MV1 This scorecard provides sufficient evidence for me to approve or reject a dataset. 这张记分卡提供了充分的证据,足以支持我批准或拒绝 (Approve/Reject) 在项目中使用某个合成数据集。
15. MV2 Using this standardized format would reduce time for compliance reviews or audits. 使用这种标准化的格式,可以减少合规审查或内部审计所需的时间和精力。
16. MV3 This tool acts as a helpful bridge for communication between technical teams and management. 这个工具是技术团队(数据科学家)与管理层之间沟通的良好桥梁。
17. QO3 – Open-ended question(开放题)
What is the biggest barrier to using this scorecard in your organization's decision-making process? 在您的组织中推广使用这种记分卡,最大的障碍可能是什么(例如:习惯、流程、成本)?
18. SV1 Explicitly showing a "Fairness" score makes me feel more confident that the AI is ethical. 明确展示“公平性 (Fairness)”得分(即检查是否有偏见),让我更有信心相信该 AI 符合伦理。
19. SV2 The "Privacy Risk" indicator helps me trust that sensitive data is protected. “隐私风险 (Privacy Risk)”指标有助于让我相信敏感用户数据得到了有效保护。
20. SV3 Adopting this framework demonstrates a commitment to transparency and Trustworthy AI. 采用这种评估框架,体现了组织对透明度和可信 AI(例如公平性和环保)的承诺。
21. QO4 – Open-ended question(开放题)
Do you believe this scorecard effectively addresses concerns about AI bias and privacy? 您认为这张记分卡是否有效地回应了人们对 AI 偏见 (Bias) 和 隐私 (Privacy) 的担忧?
22. ENV1 It is important to see "CO2 Emissions" displayed alongside performance metrics. 在记分卡中将“碳排放 (CO2 Emissions)”和能源成本与性能指标并列展示,是很有必要的。
23. ENV2 I would choose a slightly lower-performing model if it is significantly "Greener". 如果一个模型显著更环保/更绿色 (即在可持续性上显示绿色),我愿意接受其准确率略微降低。
24. ENV3 Reporting computational costs increases the transparency of the AI lifecycle. 报告计算成本(训练时间与碳排放)增加了 AI 开发过程的透明度。
25. QO5 – Open-ended question(开放题)
Do you think including CO2 emissions and training time in the scorecard is meaningful? Why or why not? Would "CO2 Emissions" actually influence your choice in a real project? Why?
在真实项目中,碳排放数据真的会影响您对模型的选择吗?为什么?
26. AD1 I would intend to use this "SynTab-GRC" framework to evaluate synthetic data. 假设我有使用合成数据的需求,我会考虑使用这个“SynTab-GRC”框架(或类似工具)来进行评估。