Survey: Evaluating AI-Generated Code (Lora CC)

Thank you for participating in this study to evaluate the quality of AI-generated code. Your expertise is crucial in helping us understand the performance of different code generation models.


Your Task: You will be presented with 10 code completion tasks. For each task, you will see:


1. A Code Context, which shows a snippet of code and indicates the position where a completion is needed.

2. The Ground Truth and three anonymous Code Suggestions (A, B, C) generated by different AI models to fill in that position.


Please use the following definitions to score each code suggestion on a scale of 0 to 2.


Criterion 1: Correctness (in context). Does the completion follow the preceding code in a syntactically and logically valid manner?

- 0 points: The completion is invalid. It results in a syntax error, a type mismatch, or is complete nonsense in the given context.

- 1 point: The completion is plausible but flawed. It might be syntactically correct but contains a minor logical error or is semantically inconsistent with the context.

- 2 points: The completion is perfectly valid. It is syntactically correct, respects type constraints, and logically follows the program's flow.
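
Illustration (hypothetical, not one of the 10 tasks): the sketch below shows how the three Correctness levels might apply to a made-up context in which a loop is evidently meant to sum a list. The class and method names are invented for this example, and the scores in the comments are one reasonable reading of the definitions above, not an official answer key.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical context shown to a rater:
//     int total = 0;
//     for (int v : values) <--- COMPLETION_HERE --->
class CorrectnessScaleExample {

    // Likely 2 points: the completion "total += v ;" compiles and sums the list.
    static int scoreTwo(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Likely 1 point: the completion "total = v ;" compiles, but it keeps only
    // the last element, which is a logical error in this context.
    static int scoreOne(List<Integer> values) {
        int total = 0;
        for (int v : values) total = v;
        return total;
    }

    // Likely 0 points: the completion "total += ;" is a syntax error, so it
    // cannot appear as compiled code and is shown here only as a comment.

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(scoreTwo(values)); // prints 6
        System.out.println(scoreOne(values)); // prints 3
    }
}
```

The point of the contrast is that a completion can compile and still lose a point when it does not follow the logic the context sets up.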

Criterion 2: Maintainability (of the suggested code). Is the suggested code fragment itself clear, idiomatic, and easy to understand?

- 0 points: The suggested code is obfuscated or hard to read. It is unnecessarily complex or uses a confusing style.

- 1 point: The suggested code is functional but could be clearer. It might be slightly convoluted or use non-standard formatting where a better alternative exists.

- 2 points: The suggested code is highly readable and idiomatic. It uses the clearest and most standard way to express the intended logic.
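
Illustration (hypothetical, not one of the 10 tasks): the sketch below shows three completions that behave identically but would plausibly receive different Maintainability scores. The context, class name, and method names are invented for this example, and the scores in the comments are only one reasonable reading of the definitions above.

```java
// Hypothetical context shown to a rater:
//     static boolean isBlankInput(String s) {
//         return <--- COMPLETION_HERE --->
class MaintainabilityScaleExample {

    // Likely 2 points: "s . isEmpty ( ) ;" is the standard, idiomatic form.
    static boolean scoreTwo(String s) {
        return s.isEmpty();
    }

    // Likely 1 point: "s . length ( ) == 0 ;" works but is less direct than
    // the library method that exists for this purpose.
    static boolean scoreOne(String s) {
        return s.length() == 0;
    }

    // Likely 0 points: a negated, redundant ternary hides a simple check.
    static boolean scoreZero(String s) {
        return !(s.length() > 0 ? true : false);
    }

    public static void main(String[] args) {
        // All three variants agree on every input; only readability differs.
        for (String s : new String[] {"", "abc"}) {
            System.out.println(scoreTwo(s) + " " + scoreOne(s) + " " + scoreZero(s));
        }
    }
}
```

Because all three variants return the same result, they would score the same on Correctness; Maintainability is judged separately.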


(Note that the positions to be completed have been marked as <--- COMPLETION_HERE --->)

Task 1: 

Code Context: 

package net.bioclipse.opentox.ds;

import java.util.List;

import org.eclipse.ui.plugin.AbstractUIPlugin;

import org.osgi.framework.BundleContext;

public class Activator extends AbstractUIPlugin{

public static final String PLUGIN_ID = " < STR_LIT > ";

private static Activator <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    plugin ;

② Code Solution A:

    plugin ;

③ Code Solution B:

    Registry registry = ActivatorRegistry . getRegistry ( ) ;

④ Code Solution C:

    ( )

1. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
2. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 2: 

Code Context: 

package org.oddjob.framework;

import java.util.concurrent.Callable;

public class CallableProxyGenerator extends ProxyGenerator < Callable < <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    ? > >

② Code Solution A:

    ? > >

③ Code Solution B:

    Object >

④ Code Solution C:

    String >

3. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
4. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 3: 

Code Context: 

package com.asakusafw.utils.java.parser.javadoc;

import java.util.Arrays;

import java.util.Collections;

import java.util.HashSet;

import java.util.Set;

public abstract class AcceptableJavadocBlockParser extends JavadocBlockParser{

        private Set < String > acceptable;

               <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    public AcceptableJavadocBlockParser ( )

② Code Solution A:

    public AcceptableJavadocBlockParser ( )

③ Code Solution B:

    private Set < String > accepted ;

④ Code Solution C:

    private Set < String > accepted ;

5. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
6. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 4: 

Code Context: 

package de.fuberlin.wiwiss.d2rq.mapgen;

public class FilterIncludeExclude extends Filter{

private final Filter include;

private final Filter exclude;

public FilterIncludeExclude(Filter include,Filter exclude){

this.include = include;

this.exclude = exclude;

}

public boolean matchesSchema(String schema){

return include.matchesSchema(schema)&& ! exclude.matchesSchema(schema);

}

public boolean matchesTable(String schema,String table){

return include.matchesTable(schema,table)&& ! exclude.matchesTable(schema,table);

}

public boolean matchesColumn(String schema,String table,String column){

return include.matchesColumn(schema,table <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    , column ) && ! exclude . matchesColumn ( schema , table , column ) ;

② Code Solution A:

    , column ) && ! exclude . matchesColumn ( schema , table , column ) ;

③ Code Solution B:

   , column ) ;

④ Code Solution C:

    , column ) ;

7. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
8. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 5: 

Code Context: 

package uk.me.sample.android.confcaller;

import android.<--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    os . Bundle ;

② Code Solution A:

    os . Bundle ;

③ Code Solution B:

    conf . android . confcaller . confcaller . confcaller ;

④ Code Solution C:

    util . Log ;

9. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
10. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 6: 

Code Context: 

package test.modelgen.table.io;

import java.io.IOException;

import javax.annotation.Generated;

import test.modelgen.table.model.ExportTempImportTarget21;

import com.asakusafw.runtime.io.ModelOutput;

import com.asakusafw.runtime.io.RecordEmitter;

@ Generated(" < STR_LIT > ")@ SuppressWarnings(" < STR_LIT:deprecation > ")public final class ExportTempImportTarget21ModelOutput implements ModelOutput < ExportTempImportTarget21 > {

private final RecordEmitter emitter;

public ExportTempImportTarget21ModelOutput( <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    RecordEmitter emitter )

② Code Solution A:

    RecordEmitter emitter )

③ Code Solution B:

    ModelOutput output )

④ Code Solution C:

    ExportTempImportTarget21ModelOutput output )

11. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
12. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 7: 

Code Context: 

package org.oddjob.jmx.client;

import java.io.Serializable;

public class HandlerVersion implements Serializable{

private static final long serialVersionUID = < NUM_LIT > ;

private final int major;

private final int minor;

public HandlerVersion(int major,int minor){

      this.major = major;

      this.minor = minor;

}

public <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    int getMajor ( )

② Code Solution A:

    int getMajor ( )

③ Code Solution B:

    void setHandler ( HandlerVersion version )

④ Code Solution C:

    HandlerVersion ( int major , int minor )

13. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
14. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 8: 

Code Context: 

package de.fuberlin.wiwiss.d2rq.find;

import junit.framework.Test;

import junit.framework.TestSuite;

public class AllTests{ <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    public static Test suite ( )

② Code Solution A:

    public static Test suite ( )

③ Code Solution B:

    public static final long serialVersionUID = <NUM_LIT>;

④ Code Solution C:

    private static final long serialVersionUID = <NUM_LIT:1L>;

15. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
16. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 9: 

Code Context: 

package org.rubypeople.rdt.refactoring.tests.core.nodewrapper;

import junit.framework.TestSuite;

public class TS_NodeWrapper{

      public static TestSuite suite(){

            TestSuite suite = new TestSuite(" < STR_LIT > ");

            suite.addTestSuite(<--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    TC_FieldNodeWrapper . class ) ;

② Code Solution A:

    TS_NodeWrapper . class ) ;

③ Code Solution B:

    "<STR_LIT>" ) ;

④ Code Solution C:

    new TestSuite ( "<STR_LIT>" ) ) ;

17. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
18. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2

Task 10: 

Code Context: 

package org.oddjob.beanbus;

abstract public class BusException extends Exception{

      private <--- COMPLETION_HERE --->


The following are the Ground Truth and the outputs produced by different models:

Ground Truth:

    static final long serialVersionUID = <NUM_LIT> ;

② Code Solution A:

    static final long serialVersionUID = <NUM_LIT:1L> ;

③ Code Solution B:

    BusException exception ;

④ Code Solution C:

    static final String ERROR_MESSAGE = "<STR_LIT>" ;

19. Correctness (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2
20. Maintainability (score each solution 0, 1, or 2):
  • Code Solution A: 0 / 1 / 2
  • Code Solution B: 0 / 1 / 2
  • Code Solution C: 0 / 1 / 2