| .. | ||
| .gitignore | ||
| .golangci.yml | ||
| builder.go | ||
| dns.go | ||
| errors.go | ||
| LICENSE.md | ||
| post_go20.go | ||
| pre_go20.go | ||
| README.md | ||
| uri.go | ||
uri
Package uri is meant to be an RFC 3986 compliant URI builder, parser and validator for golang.
It supports strict RFC validation for URI and URI relative references.
This allows for stricter conformance than the net/url package in the Go standard libary,
which provides a workable but loose implementation of the RFC for URLs.
What's new?
v1.1.0
Build
- requires go1.19
Features
- Typed errors: parsing and validation now returns errors of type
uri.Error, with a more accurate pinpointing of the error provided by the value. Errors support the go1.20 addition to standard errors withJoin()andCause(). For go1.19, backward compatibility is ensured (errors.Join() is emulated). - DNS schemes can be overridden at the package level
Performances
- Significantly improved parsing speed by dropping usage of regular expressions and reducing allocations (~ x20 faster).
Fixes
- stricter compliance regarding paths beginning with a double '/'
- stricter compliance regarding the length of DNS names and their segments
- stricter compliance regarding IPv6 addresses with an empty zone
- stricter compliance regarding IPv6 vs IPv4 litterals
- an empty IPv6 litteral
[]is invalid
Known open issues
- IRI validation lacks strictness
- IPv6 validation relies on the standard library and lacks strictness
Other
Major refactoring to enhance code readability, esp. for testing code.
- Refactored validations
- Refactored test suite
- Added support for fuzzing, dependabots & codeQL scans
Usage
Parsing
u, err := Parse("https://example.com:8080/path")
if err != nil {
fmt.Printf("Invalid URI")
} else {
fmt.Printf("%s", u.Scheme())
}
// Output: https
u, err := ParseReference("//example.com/path")
if err != nil {
fmt.Printf("Invalid URI reference")
} else {
fmt.Printf("%s", u.Authority().Path())
}
// Output: /path
Validating
isValid := IsURI("urn://example.com?query=x#fragment/path") // true
isValid= IsURI("//example.com?query=x#fragment/path") // false
isValid= IsURIReference("//example.com?query=x#fragment/path") // true
Caveats
- Registered name vs DNS name: RFC3986 defines a super-permissive "registered name" for hosts, for URIs
not specifically related to an Internet name. Our validation performs a stricter host validation according
to DNS rules whenever the scheme is a well-known IANA-registered scheme
(the function
UsesDNSHostValidation(string) boolis customizable).
Examples:
ftp://host,http://hostdefault to validating a proper DNS hostname.
-
IPv6 validation relies on IP parsing from the standard library. It is not super strict regarding the full-fledged IPv6 specification.
-
URI vs URL: every URL should be a URI, but the converse does not always hold. This module intends to perform stricter validation than the pragmatic standard library
net/url, which currently remains about 30% faster. -
URI vs IRI: at this moment, this module checks for URI, while supporting unicode letters as
ALPHAtokens. This is not strictly compliant with the IRI specification (see known issues).
Building
The exposed type URI can be transformed into a fluent Builder to set the parts of an URI.
aURI, _ := Parse("mailto://user@domain.com")
newURI := auri.Builder().SetUserInfo(test.name).SetHost("newdomain.com").SetScheme("http").SetPort("443")
Canonicalization
Not supported for now (contemplated as a topic for V2).
For URL normalization, see PuerkitoBio/purell.
Reference specifications
The librarian's corner (still WIP).
| Title | Reference | Notes |
|---|---|---|
| Uniform Resource Identifier (URI) | RFC3986 | Deviations (1) |
| Uniform Resource Locator (URL) | RFC1738 | |
| Relative URL | RFC1808 | |
| Internationalized Resource Identifier (IRI) | RFC3987 | (1) |
| IPv6 addressing scheme reference and erratum | (2) | |
| Representing IPv6 Zone Identifiers | RFC6874 | |
| https://tools.ietf.org/html/rfc6874 | ||
| https://www.rfc-editor.org/rfc/rfc3513 |
(1) Deviations from the RFC:
- Tokens: ALPHAs are tolerated to be Unicode Letter codepoints, DIGITs are tolerated to be Unicode Digit codepoints. Some improvements are needed to abide more strictly to IRIi's provisions for internationalization.
(2) IP addresses:
- Now validation is stricter regarding
[...]litterals (which must be IPv6) and ddd.ddd.ddd.ddd litterals (which must be IPv4). - RFC3886 requires the 6 parts of the IPv6 to be present. This module tolerates common syntax, such as
[::]. Notice that[]is illegal, although the golang IP parser equates this to[::](zero value IP). - IPv6 zones are supported, with the '%' escaped as '%25'
FAQ
Benchmarks
Credits
-
Tests have been aggregated from the test suites of URI validators from other languages: Perl, Python, Scala, .Net. and the Go url standard library.
-
This package was initially based on the work from ttacon/uri (credits: Trey Tacon).
Extra features like MySQL URIs present in the original repo have been removed.
- A lot of improvements and suggestions have been brought by the incredible guys at
fyne-io. Thanks all.
TODOs
- [] Support IRI
ucscharas unreserved characters - [] Support IRI iprivate in query
- [] Prepare v2. See the proposal
- [] Revisit URI vs IRI support & strictness, possibly with options (V2?)
- [] Other investigations
Notes
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD
iprivate = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD
// TODO: RFC6874
// A <zone_id> SHOULD contain only ASCII characters classified as
// "unreserved" for use in URIs [RFC3986]. This excludes characters
// such as "]" or even "%" that would complicate parsing. However, the
// syntax described below does allow such characters to be percent-
// encoded, for compatibility with existing devices that use them.