Thank you for participating in this study to evaluate the quality of AI-generated code. Your expertise is crucial in helping us understand the performance of different code generation models.
Your Task: You will be presented with 10 code completion tasks. For each task, you will see:
1. A Code Context, which shows a snippet of code and indicates the position where a completion is needed.
2. The Ground Truth and three anonymous Code Suggestions (A, B, C) generated by different AI models to fill in that position.
Please use the following definitions to score each code suggestion on a scale of 0 to 2.
Criterion 1: Correctness (in context) (Does the completion follow the preceding code in a syntactically and logically valid manner?)
- 0 points: The completion is invalid. It results in a syntax error, a type mismatch, or is complete nonsense in the given context.
- 1 point: The completion is plausible but flawed. It might be syntactically correct but contains a minor logical error or is semantically inconsistent with the context.
- 2 points: The completion is perfectly valid. It is syntactically correct, respects type constraints, and logically follows the program's flow.
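For example (a hypothetical Python illustration, not drawn from the actual study tasks), the three correctness levels might look like this for a simple context:

```python
# Hypothetical Code Context; the completion site is where the marked line was.
def mean(values: list[float]) -> float:
    total = sum(values)
    # <--- COMPLETION_HERE --->
    # A 2-point completion (syntactically valid, type-correct, logically follows):
    return total / len(values)

# A 1-point completion might be: `return total / (len(values) - 1)`
#   (plausible and syntactically correct, but a minor logical error for a mean).
# A 0-point completion might be: `return total +`
#   (invalid: it produces a syntax error in this context).

print(mean([1.0, 2.0, 3.0]))  # -> 2.0
```

The ratings apply to how the suggested fragment behaves in its surrounding context, not to the fragment in isolation.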
Criterion 2: Maintainability (of the suggested code) (Is the suggested code fragment itself clear, idiomatic, and easy to understand?)
(Note: the position to be completed in each Code Context is marked as <--- COMPLETION_HERE --->.)
- 0 points: The suggested code is obfuscated or hard to read. It is unnecessarily complex or uses a confusing style.
- 1 point: The suggested code is functional but could be clearer. It might be slightly convoluted or use non-standard formatting where a better alternative exists.
- 2 points: The suggested code is highly readable and idiomatic. It uses the clearest and most standard way to express the intended logic.
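To make the maintainability scale concrete, here is a hypothetical Python sketch (not taken from the study tasks) of the same logic written at each of the three levels:

```python
# 2 points: clear and idiomatic -- the most standard way to express the logic.
def evens_idiomatic(numbers):
    return [n for n in numbers if n % 2 == 0]

# 1 point: functional but slightly convoluted; index-based iteration and
# list concatenation where a comprehension would be clearer.
def evens_convoluted(numbers):
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result = result + [numbers[i]]
    return result

# 0 points: a confusing style -- a bitwise trick obscures the simple intent.
def evens_obfuscated(n):
    return [x for x in n if (x & 1) ^ 1]

data = [1, 2, 3, 4, 5, 6]
print(evens_idiomatic(data))  # -> [2, 4, 6]
```

All three functions return the same result; the maintainability score reflects only how clearly each one expresses that logic.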