Go's testing support is built into the standard library and the go tool: no Jest, no pytest, no external framework. When you ask AI to write tests, it produces syntactically correct test files almost every time. The problem is different: AI writes tests that pass but don't catch bugs. Tests that check the happy path with hardcoded expected values. Tests that verify the code does what it does rather than what it should do.
Your skill: evaluating whether AI-generated tests actually test anything useful.
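To make the difference between "does what it does" and "does what it should do" concrete, here is a minimal sketch. `ApplyDiscount` and both check functions are hypothetical names invented for this illustration. The checks return an error instead of taking `*testing.T` only so the snippet runs with `go run`; in a real `_test.go` file each would be a `TestXxx` function calling `t.Errorf`.

```go
package main

import "fmt"

// ApplyDiscount is a hypothetical function with a deliberate bug:
// it subtracts the percentage as if it were an absolute amount.
func ApplyDiscount(price, percent int) int {
    return price - percent // bug: should be price - price*percent/100
}

// checkEchoesCode mirrors a typical generated test: the "expected" value
// was copied from what the code currently returns, so it passes despite the bug.
func checkEchoesCode() error {
    if got := ApplyDiscount(200, 10); got != 190 {
        return fmt.Errorf("ApplyDiscount(200, 10) = %d; want 190", got)
    }
    return nil
}

// checkAgainstSpec derives the expected value from the requirement
// (10% off 200 is 180), so it exposes the bug.
func checkAgainstSpec() error {
    if got := ApplyDiscount(200, 10); got != 180 {
        return fmt.Errorf("ApplyDiscount(200, 10) = %d; want 180", got)
    }
    return nil
}

func main() {
    fmt.Println("echoes code: ", checkEchoesCode())
    fmt.Println("against spec:", checkAgainstSpec())
}
```

Run as written, the first check reports no error while the second reports the mismatch. Both checks call the same function with the same inputs; only the source of the expected value differs.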
The basics
// calculator_test.go
package calculator

import "testing"

func TestAdd(t *testing.T) {
    result := Add(2, 3)
    if result != 5 {
        t.Errorf("Add(2, 3) = %d; want 5", result)
    }
}

go test               # Run tests in current package
go test ./...         # Run all tests in module
go test -v            # Verbose output
go test -run TestAdd  # Run specific test
go test -cover        # Show coverage percentage

testing.T methods
| Method | What it does | When to use |
|---|---|---|
| t.Error(msg) | Mark failed, continue | Multiple checks in one test |
| t.Errorf(fmt, args) | Formatted failure, continue | Include values in failure message |
| t.Fatal(msg) | Mark failed, stop test | Can't continue without this passing |
| t.Fatalf(fmt, args) | Formatted failure, stop | Setup failures |
| t.Skip(msg) | Skip this test | Missing external dependency |
| t.Parallel() | Run concurrently | Independent, slow tests |
| t.Helper() | Mark as helper function | Custom assertion functions |
Table-driven tests
This is the Go convention. AI knows it and generates it reliably. The pattern: define test cases as a slice of structs, loop through them with t.Run().
func TestDivide(t *testing.T) {
    tests := []struct {
        name    string
        a, b    float64
        want    float64
        wantErr bool
    }{
        {"normal", 10, 2, 5, false},
        {"fractional", 5, 2, 2.5, false},
        {"negative", -10, 2, -5, false},
        {"divide by zero", 10, 0, 0, true},
        {"zero numerator", 0, 5, 0, false},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := Divide(tt.a, tt.b)
            if tt.wantErr {
                if err == nil {
                    t.Error("expected error, got nil")
                }
                return
            }
            if err != nil {
                t.Fatalf("unexpected error: %v", err)
            }
            if got != tt.want {
                t.Errorf("Divide(%v, %v) = %v; want %v",
                    tt.a, tt.b, got, tt.want)
            }
        })
    }
}

Run a specific subtest:
go test -run "TestDivide/divide_by_zero"

Testing HTTP handlers
The httptest package lets you test handlers without starting a real server. This is where AI-generated tests are most deceptive: they verify that the handler returns 200 but don't check the response body, the headers, or the behavior with bad input.
What AI generates
func TestGetUser(t *testing.T) {
    req := httptest.NewRequest("GET", "/users/1", nil)
    rr := httptest.NewRecorder()
    getUser(rr, req)
    if rr.Code != http.StatusOK {
        t.Errorf("got %d, want 200", rr.Code)
    }
}

This test passes, but it doesn't verify the response body, doesn't test invalid IDs, doesn't test missing users, and doesn't check Content-Type headers.
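As a rough sketch of the checks a status-only test skips, the snippet below inspects the status, the Content-Type header, and the decoded JSON body of one recorded response. The `User` type, both handlers, and `checkGetUser` are hypothetical stand-ins, not the handler from this lesson. Problems are collected in a slice only so the example runs outside `go test`; in a test file each one would be a `t.Errorf` call.

```go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/http/httptest"
)

// User and getUser are hypothetical stand-ins for the code under test.
type User struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

func getUser(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(User{ID: 1, Name: "Alice"})
}

// emptyOK is a broken handler that a status-only test would still accept.
func emptyOK(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
}

// checkGetUser records one request and returns every problem it finds,
// covering the status code, the Content-Type header, and the decoded body.
func checkGetUser(h http.HandlerFunc) []string {
    req := httptest.NewRequest(http.MethodGet, "/users/1", nil)
    rr := httptest.NewRecorder()
    h(rr, req)

    var problems []string
    if rr.Code != http.StatusOK {
        problems = append(problems, fmt.Sprintf("status = %d; want 200", rr.Code))
    }
    if ct := rr.Header().Get("Content-Type"); ct != "application/json" {
        problems = append(problems, fmt.Sprintf("Content-Type = %q; want application/json", ct))
    }
    var u User
    if err := json.NewDecoder(rr.Body).Decode(&u); err != nil {
        problems = append(problems, fmt.Sprintf("body is not valid JSON: %v", err))
    } else if u.ID != 1 || u.Name != "Alice" {
        problems = append(problems, fmt.Sprintf("user = %+v; want ID 1, Name Alice", u))
    }
    return problems
}

func main() {
    fmt.Println("getUser:", checkGetUser(getUser))
    fmt.Println("emptyOK:", checkGetUser(emptyOK))
}
```

Both handlers return 200, so a status-only test cannot tell them apart. Decoding the body into a struct is also stricter than a substring match, because malformed JSON fails the check.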
What useful tests look like
func TestGetUser(t *testing.T) {
    // Setup: seed test data
    store := NewMemoryStore()
    store.Add(User{ID: 1, Name: "Alice", Email: "alice@test.com"})
    handler := NewUserHandler(store)
    tests := []struct {
        name       string
        path       string
        wantStatus int
        wantBody   string
    }{
        {"valid user", "/users/1", 200, `"name":"Alice"`},
        {"not found", "/users/999", 404, `"error"`},
        {"invalid id", "/users/abc", 400, `"Invalid"`},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            req := httptest.NewRequest("GET", tt.path, nil)
            rr := httptest.NewRecorder()
            handler.ServeHTTP(rr, req)
            if rr.Code != tt.wantStatus {
                t.Errorf("status = %d; want %d", rr.Code, tt.wantStatus)
            }
            if tt.wantBody != "" && !strings.Contains(rr.Body.String(), tt.wantBody) {
                t.Errorf("body = %s; want to contain %s", rr.Body.String(), tt.wantBody)
            }
        })
    }
}

Testing POST handlers with request bodies
func TestCreateUser(t *testing.T) {
    tests := []struct {
        name       string
        body       string
        wantStatus int
    }{
        {"valid", `{"name":"Bob","email":"bob@test.com"}`, 201},
        {"missing name", `{"email":"bob@test.com"}`, 400},
        {"missing email", `{"name":"Bob"}`, 400},
        {"invalid json", `{not json}`, 400},
        {"empty body", ``, 400},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            req := httptest.NewRequest("POST", "/users",
                strings.NewReader(tt.body))
            req.Header.Set("Content-Type", "application/json")
            rr := httptest.NewRecorder()
            createUser(rr, req)
            if rr.Code != tt.wantStatus {
                t.Errorf("status = %d; want %d\nbody: %s",
                    rr.Code, tt.wantStatus, rr.Body.String())
            }
        })
    }
}

Benchmarks
Benchmarks use *testing.B and run the code b.N times. Go automatically adjusts b.N to get stable measurements.
func BenchmarkJSONMarshal(b *testing.B) {
    user := User{ID: 1, Name: "Alice", Email: "alice@test.com"}
    for i := 0; i < b.N; i++ {
        json.Marshal(user)
    }
}

go test -bench=.                # Run all benchmarks
go test -bench=. -benchmem      # Include memory allocation stats
go test -bench=. -benchtime=5s  # Run for at least 5 seconds

Output:
BenchmarkJSONMarshal-8    5000000    312 ns/op    128 B/op    2 allocs/op

Comparing approaches
func BenchmarkSliceAppend(b *testing.B) {
    b.Run("no preallocation", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            var s []int
            for j := 0; j < 1000; j++ {
                s = append(s, j)
            }
        }
    })
    b.Run("preallocated", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            s := make([]int, 0, 1000)
            for j := 0; j < 1000; j++ {
                s = append(s, j)
            }
        }
    })
}

AI-generated benchmarks often put setup code inside the b.N loop (measuring setup + operation) or put b.ResetTimer() in the wrong place. The loop for i := 0; i < b.N; i++ should contain ONLY the operation you're measuring.

Test helpers
Mark helper functions with t.Helper() so test failures report the caller's line number, not the helper's:
func assertStatusCode(t *testing.T, got, want int) {
    t.Helper()
    if got != want {
        t.Errorf("status code = %d; want %d", got, want)
    }
}

func assertContains(t *testing.T, body, substr string) {
    t.Helper()
    if !strings.Contains(body, substr) {
        t.Errorf("body %q does not contain %q", body, substr)
    }
}

Test quality checklist
When reviewing AI-generated tests, ask:
| Question | Red flag if no |
|---|---|
| Does it test error cases? | Only happy path = useless tests |
| Does it verify response content? | Status-code-only tests miss bugs |
| Does it use test data, not production data? | Flaky, environment-dependent |
| Are test cases named descriptively? | Hard to debug failures |
| Does it test edge cases (empty, nil, max)? | Bugs hide at boundaries |
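The edge-case row of the checklist can be sketched as a table of boundary inputs. `Average`, `ErrEmpty`, and the case table are hypothetical, chosen only to show nil, empty, and maximum-value inputs side by side. The runner returns the names of failing cases so the snippet works with `go run`; in a `_test.go` file the loop body would sit inside `t.Run(tc.name, ...)` and call `t.Errorf`.

```go
package main

import (
    "errors"
    "fmt"
    "math"
)

// ErrEmpty and Average are hypothetical code under test.
var ErrEmpty = errors.New("average of empty slice")

func Average(nums []int) (float64, error) {
    if len(nums) == 0 {
        return 0, ErrEmpty // covers both nil and empty slices
    }
    var sum float64 // accumulate as float64 so large ints cannot overflow
    for _, n := range nums {
        sum += float64(n)
    }
    return sum / float64(len(nums)), nil
}

type edgeCase struct {
    name    string
    in      []int
    want    float64
    wantErr bool
}

// edgeCases lists the boundary inputs a happy-path test usually skips.
func edgeCases() []edgeCase {
    return []edgeCase{
        {"nil slice", nil, 0, true},
        {"empty slice", []int{}, 0, true},
        {"single element", []int{7}, 7, false},
        {"negative values", []int{-4, -6}, -5, false},
        {"max int values", []int{math.MaxInt, math.MaxInt}, float64(math.MaxInt), false},
    }
}

// runEdgeCases returns the names of the cases that fail.
func runEdgeCases() []string {
    var failed []string
    for _, tc := range edgeCases() {
        got, err := Average(tc.in)
        if (err != nil) != tc.wantErr || (err == nil && got != tc.want) {
            failed = append(failed, tc.name)
        }
    }
    return failed
}

func main() {
    fmt.Println("failing cases:", runEdgeCases())
}
```

The "max int values" case is the kind that tends to be missing from generated tests: an implementation that summed into an `int` would overflow there while still passing every other row.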