Unified input validation

Decision

We use typebox for all input validation.

Problems

We currently use many disparate libraries for validation, depending on the service type: monolithic servers, frontend apps, or service-oriented CDK apps. This blocks us from reusing the same business logic code in multiple places (e.g. when validating an event that is sent between services). It also creates friction when adding validation for a new feature, because devs need to adapt to the different approaches. Sometimes validation is also performed manually, which requires more effort and is harder to do completely.

The goal of this task is to look through different validation methods and pick the default one for us to use. The solution should be easy enough to use that adding validation early on is a no-brainer, and it should work regardless of whether we validate an endpoint input, customer/form input, a config, or anything else.

Context

We currently have many different ways of validating input. We use yup mostly in admin-dashboard and client-dashboard-2, ajv mostly in hermione and filch, zod in supplier-manager and suretype in our CDK-based services. In many places we handle the validation logic manually (most of the original hermione endpoints and filch jobs), and in some places barely at all (e.g. put-automations-id endpoint).

For choosing the validation library, we followed this list of requirements:

  1. (checklist) the lib gets through the Choosing dependency ADR checklist (basically, is a non-trivial, popular, and well-maintained open source library)
  2. (ts-types) infers TS types (or is straight TS types based)
  3. (json-schema) supports JSON Schema: either directly based on it, or convertible to it (to be able to use it in ApiGateway or other services that only support JSON Schema)
  4. (form-validation) is integrable with form validation, preferably there's an adapter (e.g. for react-hook-form in cd2)
  5. (inference-perf) high dev time performance (doesn't slow down the IDE when inferring TS types)
  6. (no-framework) uses only straight JS, no additional framework-specific libraries (because validation logic is business logic and therefore should be agnostic of which tech stack it is used in)
  7. (approachable) is easy enough to write so that adding validation early on is a no-brainer (this is kinda subjective, in practice we should avoid more esoteric approaches)
  8. (no-generation) no code generation step required: so we can use the same IDE tooling as for other code we write (though for high-volume validations it may actually be worth looking into)
  9. (runtime-perf) (optional) high runtime performance (we may need to verify each incoming recipient, which might be millions at a time - if necessary, we may go for a different solution in this use case, but it would still be nice if one lib could handle both situations)
  10. (usable-in-lambda) (optional) can be used in lambda, meaning size is not too big (though for CDK's RestApi we may actually directly use RequestValidator and Model)

Options

Checked on 2024-06-28

| Name                | Weekly Downloads | Unpacked Size | Last publish | Ops / sec ^ |
| ------------------- | ---------------- | ------------- | ------------ | ----------- |
| already in use      |                  |               |              |             |
| suretype            | 34 957           | 178 kB        | ! a year ago | 45888428    |
| yup                 | 6 010 821        | 256 kB        | 4 months ago | 49089       |
| ajv                 | 104 628 910      | 1.03 MB       | 24 days ago  | 36087310    |
| zod                 | 10 113 395       | 667 kB        | a month ago  | 732610      |
| also considered     |                  |               |              |             |
| @sinclair/typebox   | 30 775 766       | 1.37 MB       | 6 days ago   | 72960917    |
| valibot             | 205 846          | 1.41 MB       | 3 days ago   | 2400861     |
| superstruct         | 1 374 153        | 179 kB        | 8 days ago   | 213117      |
| rejected            |                  |               |              |             |
| joi                 | 9 993 000        | 531 kB        | 9 days ago   | -           |
| vest                | 17 584           | 1.46 MB       | 10 days ago  | -           |
| typia               | 41 940           | 3.84 MB       | a day ago    | 63246549    |
| io-ts               | 1 195 982        | 460 kB        | 7 months ago | 2760554     |
| arktype             | 33 463           | 146 kB        | 3 days ago   | 69527419    |

^ runtime benchmark results for assertLoose from typescript-runtime-type-benchmarks (higher is better)

Reasoning

Starting with checklist - all libraries:

  • are non-trivial
  • have >10k monthly downloads on npm
  • are open source: all licensed under MIT (except joi: BSD-3-Clause)
  • were updated during the last 6 months (except suretype)

Already in use:

suretype

  • we already use it in multiple applications (CDK-based)
  • checklist no (last updated over a year ago, GitHub issues also seem kinda abandoned - though arguably it may be treated as a stable library)
  • ts-types yes
  • json-schema yes (conversion to JSON Schema through typeconv package)
  • form-validation no (no plans for react-hook-form resolver (https://dev.to/samchon/typescript-json-is-10-1000x-times-faster-than-zod-and-io-ts-8n6#comment-228pg))
  • inference-perf yes (no problem reports found)
  • no-framework yes
  • approachable yes
  • no-generation yes
  • runtime-perf (optional) yes
  • usable-in-lambda (optional) yes

Verdict: rejected due to poor support

zod

  • we already use it in supplier-manager
  • checklist yes
  • ts-types yes
  • json-schema no
  • form-validation yes (has react-hook-form resolvers)
  • inference-perf no (reportedly has severe type inference performance issues)
    • Benchmark: "seems to have some structural issues with its types, ballooning in cost by 10+ times after just 2 .extend and 1 .omit call"
  • no-framework yes
  • approachable yes
  • no-generation yes
  • runtime-perf (optional) no (also no compilation mode)
  • usable-in-lambda (optional) yes

Verdict: rejected due to devtime and runtime performance

ajv

  • we already use it in hermione
  • checklist yes
  • ts-types yes (through json-schema-to-ts)
  • json-schema yes (directly uses JSON Schema)
  • form-validation yes (has react-hook-form resolvers)
  • inference-perf yes (no problems with json-schema-to-ts IDE performance reported)
  • no-framework yes
  • approachable yes
  • no-generation yes (it uses a startup compiler, which may take a while but then runs really fast; it also supports build-time compilation)
  • runtime-perf (optional) yes
  • usable-in-lambda (optional) yes (is big and requires companion package ajv-formats, but the JSON schema may be directly used by RequestValidator)

Verdict: contender, lost in product team voting

Also considered:

typebox

  • checklist yes
  • ts-types yes
  • json-schema yes (maps directly to JSON Schema)
  • form-validation yes (has react-hook-form resolvers)
  • inference-perf yes
  • no-framework yes
  • approachable yes
  • no-generation yes (compilation not required, but available)
  • runtime-perf (optional) yes (can work in compiled mode)
  • usable-in-lambda (optional) yes (size kinda large, but potentially can generate JSON Schema for the RequestValidator during cdk deploy)

Verdict: contender, won in product team voting
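
As a minimal sketch of how several of the criteria above look in practice, a single typebox schema can provide the inferred TS type, a plain JSON Schema object, and both dynamic and compiled checks. The RecipientAdded event shape and its fields are hypothetical, invented for illustration only:

```typescript
import { Type, type Static } from "@sinclair/typebox";
import { Value } from "@sinclair/typebox/value";
import { TypeCompiler } from "@sinclair/typebox/compiler";

// Hypothetical event shape, used only for illustration.
const RecipientAdded = Type.Object({
  recipientId: Type.String({ minLength: 1 }),
  email: Type.String({ minLength: 3 }),
  tags: Type.Optional(Type.Array(Type.String())),
});

// ts-types: the static type is inferred from the schema.
type RecipientAdded = Static<typeof RecipientAdded>;

// json-schema: serializing the schema yields a plain JSON Schema object.
const jsonSchema = JSON.parse(JSON.stringify(RecipientAdded));

// no-generation: dynamic checking works without any compile step.
const valid: unknown = { recipientId: "r1", email: "a@b.co" };
const ok = Value.Check(RecipientAdded, valid);

// runtime-perf: an optional compiled validator for high-volume checks.
const compiled = TypeCompiler.Compile(RecipientAdded);
const invalid: unknown = { recipientId: "", email: "a@b.co" };
const bad = compiled.Check(invalid);
const firstError = compiled.Errors(invalid).First();
```

The same schema module could be imported by both the service that produces the event and the one that consumes it, which is the kind of reuse described in the Problems section.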

valibot

  • checklist yes
  • ts-types yes
  • json-schema no (package valibot-json-schema exists, but: "Some of the features of Valibot can't be converted to JSON schema")
  • form-validation yes (has react-hook-form resolvers)
  • inference-perf yes (no problem reports found)
  • no-framework yes
  • approachable yes
  • no-generation yes
  • runtime-perf (optional) no (no compilation mode)
  • usable-in-lambda (optional) yes (modular design to support tree-shaking)

Verdict: rejected due to no JSON Schema interoperability

superstruct

  • checklist yes
  • ts-types yes
  • json-schema no
  • form-validation yes (has react-hook-form resolvers)
  • inference-perf yes
  • no-framework yes
  • approachable yes
  • no-generation yes
  • runtime-perf (optional) no (no compilation mode)
  • usable-in-lambda (optional) yes

Verdict: rejected due to no JSON Schema interoperability

Outright rejected:

yup

  • ts-types no

joi

  • ts-types no

vest

  • ts-types no (manual typing)

typia

  • form-validation no (has no react-hook-form resolvers)
  • no-generation no (requires Ahead of Time compilation)

    When you write a TypeScript code calling typia.createIs() function and compile it through tsc command, typia will replace the typia.createIs() statement to optimal validation code in the compiled JavaScript file, for the IMember type.

io-ts

  • approachable no (relies on monads from fp-ts, may be a bit too weird?)

    The Either type returned by decode is defined in fp-ts, a library containing implementations of common algebraic types in TypeScript.

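import * as t from "io-ts";
import { isLeft } from "fp-ts/lib/Either";
import { PathReporter } from "io-ts/lib/PathReporter";
// Hypothetical codec and input, added so the quoted snippet is self-contained.
const User = t.type({ name: t.string });
const data: unknown = { name: 42 };
const decoded = User.decode(data); // Either<Errors, User>
if (isLeft(decoded)) {
  throw Error(
    `Could not validate data: ${PathReporter.report(decoded).join("\n")}`
  );
}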

arktype

  • json-schema no (uses its own object schema format, not compatible with JSON Schema, not supported by IDEs)

import { type } from "arktype";

type({
	keyword: "null",
	stringLiteral: "'TS'",
	numberLiteral: "5",
	bigintLiteral: "5n",
	union: "string|number",
	intersection: "boolean&true",
	array: "Date[]",
	grouping: "(0|1)[]",
	objectLiteral: {
		nested: "string",
		"optional?": "number"
	},
	tuple: ["number", "number"]
})

Consequences

How do we implement this change?

Whenever a team writes a new endpoint, configuration, or other code that benefits from validation, they use typebox. If the code requires frontend validation, the team will also use the opportunity to create an ADR about forms and form validation to decide on the other libraries needed. Another ADR will be needed for the solution to generate client libraries (see "What is still unclear?"). Also, where time permits, every team will move existing code to the unified validation in the course of normal refactoring. We will probably not refactor code purely to update the validation method.
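
For illustration, the endpoint and configuration cases could share one small helper. This is only a sketch under assumptions: parseInput, PutAutomationBody, and ServiceConfig are hypothetical names that do not exist in our codebase, and reporting only the first failing path is one possible choice.

```typescript
import { Type, type Static, type TSchema } from "@sinclair/typebox";
import { Value } from "@sinclair/typebox/value";

// Hypothetical helper: returns the typed value, or throws naming the first failing path.
function parseInput<T extends TSchema>(schema: T, input: unknown): Static<T> {
  if (Value.Check(schema, input)) return input;
  const first = Value.Errors(schema, input).First();
  throw new Error(
    `Invalid input at ${first?.path || "/"}: ${first?.message ?? "unknown error"}`
  );
}

// Hypothetical request body for an endpoint that updates an automation.
const PutAutomationBody = Type.Object({
  name: Type.String({ minLength: 1 }),
  enabled: Type.Boolean(),
});

// Hypothetical service configuration.
const ServiceConfig = Type.Object({
  tableName: Type.String({ minLength: 1 }),
  batchSize: Type.Integer({ minimum: 1 }),
});

// Both results are fully typed from their schemas.
const body = parseInput(PutAutomationBody, JSON.parse('{"name":"welcome","enabled":true}'));
const config = parseInput(ServiceConfig, { tableName: "recipients", batchSize: 25 });
```

Keeping the helper free of any HTTP or framework types matches the no-framework requirement: an endpoint handler would translate the thrown error into a 400 response, while a config loader would let it fail at startup.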

Who will implement the change?

All the developers.

How do we teach this change?

A teaching branch was presented in which typebox replaces the existing validation in several typical use cases (endpoint, config, CDK RestApi). Adrian will soon run a Learning Friday workshop to introduce typebox.

What could go wrong?

typebox may lack some functionality that we need and that was not discovered during the ADR investigation phase.

What do we do if something goes wrong?

We do the missing validation by hand, or switch to ajv, which should be mostly compatible because both work with JSON Schema.
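
The ajv fallback can be sketched as follows: because a typebox schema is a JSON Schema object, it can be handed to ajv for compilation without rewriting it. The Config shape is hypothetical, and schemas that use string formats would likely also need the ajv-formats companion package, which this sketch leaves out.

```typescript
import Ajv from "ajv";
import { Type, type Static } from "@sinclair/typebox";

// Hypothetical config schema, defined with typebox.
const Config = Type.Object({
  region: Type.String(),
  retries: Type.Integer({ minimum: 0, maximum: 10 }),
});
type Config = Static<typeof Config>;

// ajv compiles the typebox schema directly as JSON Schema.
const ajv = new Ajv();
const validateConfig = ajv.compile<Config>(Config);

const accepted = validateConfig({ region: "eu-west-1", retries: 3 });
const rejected = validateConfig({ region: "eu-west-1", retries: 99 });
// After a failed call, validateConfig.errors holds ajv's error objects.
const rejectionErrors = validateConfig.errors ?? [];
```

Under this approach the schemas and inferred types would stay as they are, and only the call sites that run the check would change.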

What is still unclear?

What about generating the client library from the REST endpoints spec (like the OpenAPI schema -> TS SDK generation that we have in cd2)? The RTK Query ADR says we use @rtk-query/codegen-openapi, which requires an OpenAPI spec. Specifically for AWS CDK's RestApi, it's possible to use the OpenAPI spec as the routing/validation/handler config. We could even try to build it on the fly.

Related ADRs