Evaluation of a Synthetic Data GRC Scorecard

Introduction

Welcome! We invite you to take part in a survey on a new framework for evaluating synthetic data. The framework aims to bridge the gap between complex technical AI metrics and business decision-making.

Your responses will be used for academic research only. All data are anonymous and kept strictly confidential. The questionnaire takes about 3-5 minutes to complete.

Section A – Demographic information

1. A1. Gender
What is your gender?

Male
Female
Prefer not to say

2. A2. Age group
Which age group do you belong to?

18–24
25–34
35–44
45–54
55 or above

3. A3. Highest education level
What is your highest level of education?

Diploma
Bachelor’s degree
Master’s degree
Doctoral degree (PhD/DBA)
Other, please specify: ________

4. A4. Current role / position
What best describes your current role or position?

Student
Data scientist / machine learning engineer
IT / analytics professional
Manager / executive (e.g., risk, compliance, operations)
Academic / researcher
Other, please specify: ________

5. A5. Primary domain of work or study
What is your primary domain of work or study?

Finance / banking / insurance
Healthcare / life sciences
Public sector / government
Technology / software
Education / academia
Other, please specify: ________

Section B – Perceptions of the GRC scorecard

Glossary of Terms

To help you evaluate the scorecard, please refer to the definitions below:

1. Core Concepts

Synthetic Data: Artificial data generated by AI algorithms that mimics the statistical patterns of real-world data without containing any real individual's sensitive information. It is often used for privacy protection.
Models (Columns): The AI algorithms used to generate the data. The three columns of the scorecard show the results of three different algorithms (e.g., Statistical vs. Deep Learning).

2. Evaluation Dimensions

Quality: How "real" is the data? Measures how closely the synthetic data matches the original data in statistical distributions and correlations. A higher score is better.
Risk - Fairness: Is there bias? Measures whether the data treats different groups (e.g., gender, race) equally. A lower disparity score is better.
Risk - Privacy: Can identities be leaked? Measures the risk that an attacker could re-identify real people from the data. A lower score (closer to 0.5) is safer.
Sustainability: How green is it? Measures the computing resources consumed and the carbon footprint (CO2) of generating the data. A lower number is better.
Utility: How useful is it? Measures the prediction accuracy, on real-world tasks, of AI models trained on the synthetic data. A higher score is better.

3. Color Coding

🟩 Green: Good / Low Risk
🟨 Amber: Review Required / Medium Risk
🟥 Red: Poor / High Risk

Imagine you are a decision-maker (e.g., a Manager or Compliance Officer) evaluating whether to approve a new AI project. The technical team has provided the following 'Quality & Risk Scorecard' to explain the synthetic data they want to use. Please review the scorecard below and answer the questions based on how helpful it is for your decision.

Figure: GRC scorecard

Please indicate how much you agree with each of the following statements about the synthetic data GRC scorecard.
1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree.

6. TV1
The scorecard clearly separates competing goals (e.g., it lists Quality, Privacy Risk, and Sustainability separately).

Strongly disagree (1) to Strongly agree (5)

7. TV2
Grouping the technical metrics into the four "Dimensions" on the left helps me understand the big picture of the AI models' overall performance.

Strongly disagree (1) to Strongly agree (5)

8. TV3
This framework covers the critical aspects of "Responsible AI" that I care about, including Fairness and Sustainability.

Strongly disagree (1) to Strongly agree (5)

9. QO1 – Open-ended question

Is there any critical dimension missing from the scorecard?

10. PV1
The Red/Amber/Green (RAG) color coding allows me to identify high-risk areas instantly without understanding the underlying math.

Strongly disagree (1) to Strongly agree (5)

11. PV2
The layout makes it easy to compare the three models side by side and select the best one for my needs.

Strongly disagree (1) to Strongly agree (5)

12. PV3
The scorecard effectively summarizes complex, hard-to-read technical results into a simple, readable format.

Strongly disagree (1) to Strongly agree (5)

13. QO2 – Open-ended question

Do you find the Red/Amber/Green (RAG) thresholds easy to interpret? Why or why not?

14. MV1
This scorecard provides sufficient evidence for me to approve or reject the use of a synthetic dataset in a project.

Strongly disagree (1) to Strongly agree (5)

15. MV2
Using this standardized format would reduce the time and effort required for compliance reviews or internal audits.

Strongly disagree (1) to Strongly agree (5)

16. MV3
This tool acts as a helpful bridge for communication between technical teams (data scientists) and management.

Strongly disagree (1) to Strongly agree (5)

17. QO3 – Open-ended question

What is the biggest barrier to using this scorecard in your organization's decision-making process (e.g., habits, processes, cost)?

18. SV1
Explicitly showing a "Fairness" score (i.e., a check for bias) makes me feel more confident that the AI is ethical.

Strongly disagree (1) to Strongly agree (5)

19. SV2
The "Privacy Risk" indicator helps me trust that sensitive user data is protected.

Strongly disagree (1) to Strongly agree (5)

20. SV3
Adopting this framework demonstrates an organization's commitment to transparency and Trustworthy AI (e.g., fairness and sustainability).

Strongly disagree (1) to Strongly agree (5)

21. QO4 – Open-ended question

Do you believe this scorecard effectively addresses concerns about AI bias and privacy?

22. ENV1
It is important to see "CO2 Emissions" and energy cost displayed alongside performance metrics in the scorecard.

Strongly disagree (1) to Strongly agree (5)

23. ENV2
I would choose a slightly less accurate model if it is significantly "Greener" (i.e., shown as green on Sustainability).

Strongly disagree (1) to Strongly agree (5)

24. ENV3
Reporting computational costs (training time and CO2 emissions) increases the transparency of the AI development lifecycle.

Strongly disagree (1) to Strongly agree (5)

25. QO5 – Open-ended question

Do you think including CO2 emissions and training time in the scorecard is meaningful? Would "CO2 Emissions" actually influence your choice of model in a real project? Why or why not?

26. AD1
If I needed to use synthetic data, I would intend to use this "SynTab-GRC" framework (or a similar tool) to evaluate it.

Strongly disagree (1) to Strongly agree (5)
