AST and CST
Pegium keeps both the abstract syntax tree (AST) and the concrete syntax tree (CST) available.
AST
Define AST nodes by deriving from pegium::AstNode:
struct Entity : pegium::AstNode {
string name;
optional<reference<Entity>> superType;
vector<pointer<Feature>> features;
};
The AST should model language semantics:
- names and scalar values
- containment between nodes
- cross-references
- optional and repeated properties
AST construction contract
Pegium parses directly into your C++ AST types, but parser-managed AST nodes are created as empty mutable shells and then filled through grammar assignments.
In practice, that means:
- concrete AST node types produced by Rule<T>, create<T>(), nest<T, ...>(), or Infix<T, ...> must be default-constructible
- constructor-enforced semantic invariants are not the intended modeling style for parser-managed AST nodes
- if your application needs a stricter domain model, build it after parsing as a separate layer
This is a current limitation of the generic runtime design. The trade-off keeps the grammar API lightweight and avoids requiring explicit hierarchy metadata for every AST type.
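To make the split concrete, here is a minimal sketch in plain standard C++ rather than Pegium code. ParsedEntity stands in for a parser-managed node (default-constructible, filled field by field), while DomainEntity and toDomain are hypothetical names for a stricter post-parse layer whose constructor enforces a non-empty name.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Stand-in for a parser-managed AST node: default-constructible and
// filled field by field after construction.
struct ParsedEntity {
  std::string name;
};
static_assert(std::is_default_constructible_v<ParsedEntity>,
              "parser-managed node types must be default-constructible");

// Hypothetical stricter domain type built after parsing. Its constructor
// enforces an invariant the parser-facing type cannot express.
class DomainEntity {
public:
  explicit DomainEntity(std::string name) : name_(std::move(name)) {
    if (name_.empty()) {
      throw std::invalid_argument("entity name must not be empty");
    }
  }
  const std::string &name() const { return name_; }

private:
  std::string name_;
};

// Conversion layer: returns nullopt instead of throwing when the parsed
// node violates the domain invariant (for example after error recovery).
std::optional<DomainEntity> toDomain(const ParsedEntity &parsed) {
  if (parsed.name.empty()) {
    return std::nullopt;
  }
  return DomainEntity(parsed.name);
}
```

Returning std::optional from the conversion keeps incomplete or recovered parse results from throwing; a real application might collect diagnostics instead.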
AST field types
Pegium supports a small set of field shapes that cover common language-modeling needs.
Scalar values
Use regular value fields for textual or numeric data:
struct NumberLiteral : pegium::AstNode {
double value = 0.0;
};
struct Feature : pegium::AstNode {
bool many = false;
string name;
};
Typical scalar field types:
- string
- bool
- integer aliases inherited from AstNode, such as int32_t, uint64_t, and so on
- floating-point values such as double
- enums
- custom value types, as long as your terminal or data-type rule can construct them
- variant<T...> when a property is intentionally one-of-several value shapes
Use scalar fields for values that are owned directly by the node and do not represent containment or cross-document linking.
Example:
struct Example : pegium::AstNode {
variant<bool, string> value;
};
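Code that consumes a variant field usually dispatches on the active alternative. A minimal sketch with std::visit, assuming the inherited variant alias maps to std::variant; describeValue is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

struct Example {
  std::variant<bool, std::string> value;
};

// Renders the active alternative as source-like text.
std::string describeValue(const Example &node) {
  return std::visit(
      [](const auto &v) -> std::string {
        using V = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<V, bool>) {
          return v ? "true" : "false";
        } else {
          return "\"" + v + "\"";
        }
      },
      node.value);
}
```

Assign std::string explicitly when setting the string alternative: on older standard libraries, a string literal can convert to the bool alternative instead. A default-constructed Example holds bool false.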
Optional scalar values
Use optional<T> when the property may be absent:
struct Test : pegium::AstNode {
optional<string> testFile;
};
This is useful for optional names, strings, numbers, enums, and similar value properties.
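The difference between an absent value and an empty one is what optional<T> adds. A small sketch with std::optional (assumed to be what the alias names); effectiveTestFile is a hypothetical helper that falls back only when the property is absent.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct Test {
  std::optional<std::string> testFile;
};

// Uses the fallback only when no value was parsed at all; an explicitly
// empty string is kept as-is.
std::string effectiveTestFile(const Test &node, const std::string &fallback) {
  return node.testFile.value_or(fallback);
}
```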
Single contained child
Use pointer<T> for a single contained AST child:
struct Evaluation : pegium::AstNode {
pointer<Expression> expression;
};
pointer<T> is an alias for std::unique_ptr<T>. This is the standard way to
model containment in Pegium.
Typical use cases:
- one expression inside another node
- one optional-like child that is either present or absent
- one owned nested declaration
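A sketch of how containment through pointer<T> supports recursive traversal. It defines the alias as std::unique_ptr, as stated above; the Expression, NumberLiteral, and Negation types and the evaluate function are hypothetical, and real Pegium nodes would derive from pegium::AstNode instead of this local base.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template <typename T>
using pointer = std::unique_ptr<T>;

// Polymorphic base so unique_ptr<Expression> deletes derived nodes correctly.
struct Expression {
  virtual ~Expression() = default;
};

struct NumberLiteral : Expression {
  double value = 0.0;
};

struct Negation : Expression {
  pointer<Expression> operand;  // single contained child
};

struct Evaluation {
  pointer<Expression> expression;  // owned: destroyed with the Evaluation
};

// Recursive evaluation over the containment tree. Absent children
// (nullptr) evaluate to 0 here, an arbitrary choice for the sketch.
double evaluate(const Expression *expr) {
  if (expr == nullptr) return 0.0;
  if (auto *lit = dynamic_cast<const NumberLiteral *>(expr)) return lit->value;
  if (auto *neg = dynamic_cast<const Negation *>(expr)) {
    return -evaluate(neg->operand.get());
  }
  return 0.0;
}
```

Destroying the Evaluation frees the whole subtree, since each node owns its children.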
Repeated contained children
Use vector<pointer<T>> for a list of contained AST children:
struct Entity : pegium::AstNode {
vector<pointer<Feature>> features;
};
This is the standard shape for repeated containment.
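Building and searching a repeated containment list, sketched with standard types. addFeature imitates the effect of a repeated grammar assignment (an assumption about the result, not Pegium's mechanism), and findFeature is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Feature {
  bool many = false;
  std::string name;
};

struct Entity {
  std::vector<std::unique_ptr<Feature>> features;  // repeated containment
};

// Creates a child and transfers ownership into the parent's list.
Feature &addFeature(Entity &owner, std::string name, bool many) {
  auto feature = std::make_unique<Feature>();
  feature->name = std::move(name);
  feature->many = many;
  owner.features.push_back(std::move(feature));
  return *owner.features.back();
}

// Returns a non-owning pointer; the Entity keeps ownership.
const Feature *findFeature(const Entity &owner, const std::string &name) {
  for (const auto &feature : owner.features) {
    if (feature->name == name) return feature.get();
  }
  return nullptr;
}
```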
Single reference
Use reference<T> for a link to another AST node:
struct FunctionCall : pegium::AstNode {
reference<AbstractDefinition> func;
};
A reference<T> stores reference text and resolves later through the linker. It
is not containment.
Typical use cases:
- super types
- called functions
- referenced states, commands, or environments
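The two-phase behavior (store the text now, resolve later) can be sketched with a hypothetical Ref<T> type. This is not Pegium's reference<T> API: Ref, Scope, and link are illustrative names, and the flat name lookup ignores real scoping rules.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for reference<T>, NOT Pegium's real type:
// the text is recorded at parse time, the target is filled by linking.
template <typename T>
struct Ref {
  std::string refText;
  const T *target = nullptr;
};

struct AbstractDefinition {
  std::string name;
};

struct FunctionCall {
  Ref<AbstractDefinition> func;  // a link, not containment
};

// Toy flat scope; real scoping handles nesting, imports, qualified names.
using Scope = std::map<std::string, const AbstractDefinition *>;

// Resolves by exact name; returns false and clears the target on failure.
bool link(Ref<AbstractDefinition> &ref, const Scope &scope) {
  auto it = scope.find(ref.refText);
  ref.target = (it != scope.end()) ? it->second : nullptr;
  return ref.target != nullptr;
}
```

The FunctionCall never owns the definition it points to, which is why deleting a call leaves the target intact.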
Optional reference
Use optional<reference<T>> when a reference may be absent:
struct Entity : pegium::AstNode {
optional<reference<Entity>> superType;
};
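Because the reference may be absent or unresolved, code that follows it has to check both cases. This sketch walks the super type chain with a hypothetical Ref<T> stand-in (not Pegium's API) and guards against cyclic declarations such as A extends B, B extends A.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for reference<T>, not Pegium's real type.
template <typename T>
struct Ref {
  std::string refText;
  const T *target = nullptr;
};

struct Entity {
  std::string name;
  std::optional<Ref<Entity>> superType;  // absent when no super type is written
};

// Collects super type names, nearest first. Stops at an absent or
// unresolved reference, and when a cycle revisits an entity.
std::vector<std::string> superTypeChain(const Entity &entity) {
  std::vector<std::string> chain;
  std::set<const Entity *> visited{&entity};
  const Entity *current = &entity;
  while (current->superType && current->superType->target) {
    current = current->superType->target;
    if (!visited.insert(current).second) break;  // cycle detected
    chain.push_back(current->name);
  }
  return chain;
}
```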
Repeated references
Use vector<reference<T>> when the source syntax contains several independent
references:
struct Requirement : pegium::AstNode {
vector<reference<Environment>> environments;
};
This is useful when the text contains several names that should each resolve to one target.
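Each element of a vector<reference<T>> links independently, so one unresolved name does not affect the others. A sketch with a hypothetical Ref<T> stand-in (not Pegium's API) and a linkAll helper that returns the names a validator could report.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for reference<T>, not Pegium's real type.
template <typename T>
struct Ref {
  std::string refText;
  const T *target = nullptr;
};

struct Environment {
  std::string name;
};

struct Requirement {
  std::vector<Ref<Environment>> environments;  // one entry per name in the source
};

// Links every reference on its own and returns the texts that failed,
// in source order, so each can get its own diagnostic.
std::vector<std::string> linkAll(
    Requirement &req, const std::map<std::string, const Environment *> &scope) {
  std::vector<std::string> unresolved;
  for (auto &ref : req.environments) {
    auto it = scope.find(ref.refText);
    ref.target = (it != scope.end()) ? it->second : nullptr;
    if (ref.target == nullptr) unresolved.push_back(ref.refText);
  }
  return unresolved;
}
```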
Multi-reference
Pegium also provides multi_reference<T> for one reference slot that may
resolve to several targets.
struct Example : pegium::AstNode {
multi_reference<MyNode> targets;
};
There is also optional<multi_reference<T>> when that slot itself is optional.
Use multi_reference<T> only when one reference occurrence in the source is
meant to resolve to multiple targets. If the source contains several explicit
names, vector<reference<T>> is usually the better shape.
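To contrast the two shapes, here is a hypothetical MultiRef<T> stand-in (not Pegium's multi_reference<T>) in which a single reference text resolves to every declaration with that name, for example overloads. The std::multimap scope and the linkMulti helper are assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for multi_reference<T>, not Pegium's real type:
// one reference text, possibly several resolved targets.
template <typename T>
struct MultiRef {
  std::string refText;
  std::vector<const T *> targets;
};

struct MyNode {
  std::string name;
};

// A scope where one name may be declared more than once.
using MultiScope = std::multimap<std::string, const MyNode *>;

// Resolves one occurrence to every matching declaration, in scope order.
void linkMulti(MultiRef<MyNode> &ref, const MultiScope &scope) {
  ref.targets.clear();
  auto [first, last] = scope.equal_range(ref.refText);
  for (auto it = first; it != last; ++it) {
    ref.targets.push_back(it->second);
  }
}
```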
Repeated scalar values
Use vector<T> for repeated scalar data:
struct Example : pegium::AstNode {
vector<string> tags;
};
This is for repeated values, not repeated contained nodes. For repeated child
nodes, keep using vector<pointer<T>>.
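Repeated scalars are plain values, so checks over them need no linking or child traversal. A sketch of a hypothetical validation helper that reports duplicate tags, using std::vector<std::string> for the field.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Example {
  std::vector<std::string> tags;  // repeated scalar values, no child nodes
};

// Reports each tag that occurs more than once, once per tag, in the order
// of its second occurrence.
std::vector<std::string> duplicateTags(const Example &node) {
  std::set<std::string> seen;
  std::set<std::string> reported;
  std::vector<std::string> duplicates;
  for (const auto &tag : node.tags) {
    if (!seen.insert(tag).second && reported.insert(tag).second) {
      duplicates.push_back(tag);
    }
  }
  return duplicates;
}
```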
AST aliases inherited from AstNode
When you derive from pegium::AstNode, you can use these aliases directly in
the struct body:
- string
- int8_t, int16_t, int32_t, int64_t
- uint8_t, uint16_t, uint32_t, uint64_t
- optional<T>
- variant<T...>
- vector<T>
- pointer<T>
- reference<T>
- multi_reference<T>
That is why the examples can write string, pointer<Expression>, or
vector<reference<Environment>> without qualifying them.
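The unqualified spelling relies on ordinary C++ name lookup: member type aliases and alias templates declared in a base class are visible inside a derived class body. A minimal sketch of that mechanism with a hypothetical MiniAstNode base; Pegium's real AstNode may define its aliases differently, and reference<T> and multi_reference<T> are omitted.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical base that re-exports standard types as member aliases.
struct MiniAstNode {
  virtual ~MiniAstNode() = default;

  using string = std::string;
  template <typename T> using optional = std::optional<T>;
  template <typename T> using vector = std::vector<T>;
  template <typename T> using pointer = std::unique_ptr<T>;
};

struct Feature : MiniAstNode {
  string name;  // found as MiniAstNode::string
};

struct Entity : MiniAstNode {
  string name;
  optional<string> description;
  vector<pointer<Feature>> features;  // no std:: qualification needed
};

static_assert(std::is_same_v<decltype(Entity::name), std::string>);
static_assert(std::is_same_v<decltype(Entity::features),
                             std::vector<std::unique_ptr<Feature>>>);
```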
Recommended AST style
Prefer these shapes:
- scalar values for plain data
- pointer<T> for one contained child
- vector<pointer<T>> for repeated contained children
- reference<T> for one linked target
- vector<reference<T>> for repeated explicit links
- optional<T> only when absence is semantically meaningful
This matches the shipped examples and keeps the AST easy to traverse and link.
CST
- pegium::CstNodeView represents a stable view into the parsed concrete tree
- offsets, ranges, children, siblings, hidden nodes, and recovered nodes are available through the CST API
- pegium::CstUtils contains lookup helpers for properties, keywords, interior nodes, and node-at-offset operations
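Node-at-offset lookup is the kind of source-aware operation the CST exists for. The sketch below uses a hypothetical ToyCstNode (not pegium::CstNodeView, and not the actual CstUtils signature) and returns the deepest node whose half-open range contains an offset. Hidden nodes such as whitespace are not modeled, so an offset between tokens falls back to the enclosing node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy CST node. Each node covers the half-open text range
// [begin, end); children are sorted and do not overlap.
struct ToyCstNode {
  std::size_t begin = 0;
  std::size_t end = 0;
  std::string kind;
  std::vector<ToyCstNode> children;
};

// Returns the deepest node containing offset, or nullptr when the offset
// lies outside the root's range.
const ToyCstNode *findNodeAt(const ToyCstNode &root, std::size_t offset) {
  if (offset < root.begin || offset >= root.end) return nullptr;
  const ToyCstNode *current = &root;
  bool descended = true;
  while (descended) {
    descended = false;
    for (const auto &child : current->children) {
      if (offset >= child.begin && offset < child.end) {
        current = &child;  // go one level deeper and rescan
        descended = true;
        break;
      }
    }
  }
  return current;
}
```

For the text "entity Foo {}", an offset inside "Foo" lands on the name token, while the space at offset 6 resolves to the enclosing entity node.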
Why both matter
- AST drives semantics, validation, scoping, and most editor features
- CST is required for precise formatting, offset-based lookup, and source-level operations that must preserve text layout
Recommended pattern
Model semantics in the AST first. Drop to CST only when the feature is inherently source-aware, such as formatting, comment handling, or cursor-position logic.