Go XML Improvements
Having used the encoding/xml package quite a bit now, there are a number of improvements I'd like to see.
- Speed. Unmarshaling achieves roughly 10-60 MB/s of throughput on a 2018 desktop CPU. By comparison, high-performance C++ XML libraries such as pugixml reach roughly 1-2 GB/s of parsing throughput.
- An easier way to map struct fields back to input bytes. When using XML as a file format, a common use case is unmarshaling followed by validation. When validation errors occur, it's useful to report where in the file they occur (line and column). The Decoder's InputOffset method makes this roughly possible for elements, because it reports the byte offset at token boundaries, but not for individual attributes, and turning offsets into line and column numbers is left to the caller. It's not easy.
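- Treat unknown elements/attributes as errors. encoding/xml silently ignores anything it can't map to a field. The json package has Decoder.DisallowUnknownFields, which turns unknown fields into errors, but decoding stops at the first unknown field. For file format validation, the user might want all unknown elements/attributes reported.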
- Treat duplicate elements as errors. When the receiver is a single value (not a slice) and the input contains duplicate elements, the last value silently overwrites all previous ones. There are cases where we want to catch this and treat it as an error without having to use slice receivers everywhere.
- Remove the nested element xml tag (the "a>b>c" form). I don't feel as strongly about this one, but it can produce inconsistent behavior between marshaling and unmarshaling (example). The inconsistency is clearly documented.
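To make the position and duplicate points concrete, here is a rough sketch of what can be done today with the token API. It records InputOffset before each Token call, so the offset points at the start of the element, and it flags repeated direct children of the root. The names findDuplicates, lineCol, and Duplicate are made up for illustration and are not part of encoding/xml. The sketch ignores namespaces and counts columns in bytes rather than runes.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// Duplicate describes a repeated child element and where it starts.
// (Hypothetical type, not part of encoding/xml.)
type Duplicate struct {
	Name      string
	Line, Col int
}

// lineCol converts a byte offset into a 1-based line and column.
// Columns are counted in bytes, not runes.
func lineCol(data []byte, offset int64) (line, col int) {
	prefix := data[:offset]
	line = bytes.Count(prefix, []byte("\n")) + 1
	col = len(prefix) - bytes.LastIndexByte(prefix, '\n')
	return line, col
}

// findDuplicates reports direct children of the root element whose
// local name has already been seen. Namespaces are ignored.
func findDuplicates(data []byte) ([]Duplicate, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	seen := map[string]bool{}
	depth := 0
	var dups []Duplicate
	for {
		// The offset before Token() is the start of the next token.
		start := dec.InputOffset()
		tok, err := dec.Token()
		if err == io.EOF {
			return dups, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth == 1 { // a direct child of the root
				if seen[t.Name.Local] {
					line, col := lineCol(data, start)
					dups = append(dups, Duplicate{t.Name.Local, line, col})
				}
				seen[t.Name.Local] = true
			}
			depth++
		case xml.EndElement:
			depth--
		}
	}
}

func main() {
	input := []byte("<config>\n  <name>first</name>\n  <name>second</name>\n</config>")
	dups, err := findDuplicates(input)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, d := range dups {
		fmt.Printf("duplicate element <%s> at line %d, column %d\n", d.Name, d.Line, d.Col)
	}
}
```

This works as a pre-pass before Unmarshal, but it parses the input twice and still gives no positions for attributes, which is a large part of why I'd like better support.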
I think most of these changes belong outside of the standard library. The standard library strikes a difficult balance between supporting many different use cases and maintaining a simple API, safety, adherence to the XML spec, and a sensible mapping from XML to Go. rsc's BUG comment notes the challenge:
"Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values. See package json for a textual representation more suitable to data structures."
Finally, one of the neat tricks you can use for unmarshaling arbitrary XML trees is a recursive struct definition:
    type Element struct {
        XMLName  xml.Name
        CharData string     `xml:",chardata"`
        Attrs    []xml.Attr `xml:",any,attr"`
        Elems    []Element  `xml:",any"`
    }
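A small sketch of the recursive definition in use: unmarshal a document with no schema-specific structs, then walk the resulting tree. The parse and dump helpers and the sample input are my own illustration, not anything provided by the package.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Element is the recursive definition from above.
type Element struct {
	XMLName  xml.Name
	CharData string     `xml:",chardata"`
	Attrs    []xml.Attr `xml:",any,attr"`
	Elems    []Element  `xml:",any"`
}

// parse unmarshals any well-formed document into an Element tree.
// (Hypothetical helper for this example.)
func parse(input string) (Element, error) {
	var root Element
	err := xml.Unmarshal([]byte(input), &root)
	return root, err
}

// dump writes one line per element, indented by depth, with its
// attributes and any non-whitespace character data.
func dump(sb *strings.Builder, e Element, depth int) {
	fmt.Fprintf(sb, "%s%s", strings.Repeat("  ", depth), e.XMLName.Local)
	for _, a := range e.Attrs {
		fmt.Fprintf(sb, " %s=%q", a.Name.Local, a.Value)
	}
	if text := strings.TrimSpace(e.CharData); text != "" {
		fmt.Fprintf(sb, " %q", text)
	}
	sb.WriteByte('\n')
	for _, child := range e.Elems {
		dump(sb, child, depth+1)
	}
}

func main() {
	root, err := parse(`<root id="1"><item name="a">hello</item><item name="b"/></root>`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	var sb strings.Builder
	dump(&sb, root, 0)
	fmt.Print(sb.String())
}
```

One caveat: CharData collects all of an element's character data into a single string, so in mixed content the position of text relative to child elements is lost.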