Semantic Model¶
When AST nodes are created during parsing, they become the semantic model of your language. In Pegium, that semantic model is shaped directly by two things:
- the C++ AST node types you define
- the grammar assignments and actions that populate them
Unlike generator-centric workflows, Pegium does not infer a separate semantic type system from another DSL. The semantic model is already present in your C++ types.
This page explains how AST types, grammar assignments, references, and CST structure fit together.
AST fields shape the model¶
Consider this example:
struct Entity : pegium::AstNode {
string name;
optional<reference<Entity>> superType;
vector<pointer<Feature>> features;
};
This one type already tells the framework a lot:
nameis plain scalar semantic datasuperTypeis a link to another nodefeaturesare owned nested children
The parser then decides how those fields are populated:
Rule<ast::Entity> EntityRule{
"Entity",
"entity"_kw.i() + assign<&ast::Entity::name>(ID) +
option("extends"_kw.i() +
assign<&ast::Entity::superType>(QualifiedName)) +
"{"_kw + many(append<&ast::Entity::features>(FeatureRule)) +
"}"_kw};
So the semantic model in Pegium is not a post-processing artifact. It is the combined result of AST node definitions and grammar wiring.
Why this matters¶
The shape of the AST is not just an implementation detail. It directly affects:
- how references are represented and linked
- how validation reasons about the model
- how formatter and editor features find the right source regions
- how stable your language-specific code remains as the grammar evolves
References are part of the model¶
References are not just strings stored in fields. They carry:
- the text written by the user
- the owning container, feature name, target type, and reference cardinality
- the source CST node, when available
- the resolved target, once linking has happened
That is why the same model can support linking, completion, rename, and hover without inventing separate data structures for each feature.
Runtime type information¶
Runtime AST reflection also treats pegium::AstNode as the implicit root type
of every registered AST class. In practice, AstReflection::isSubtype(...) and
AstReflection::getAllSubTypes(...) stay consistent with
AstReflection::isInstance(..., typeid(pegium::AstNode)).
AstReflection::getAllTypes() and AstReflection::getAllSubTypes(...) expose
stable std::unordered_set views over the bootstrapped registry state, so
their iteration order is intentionally unspecified.
Pegium bootstraps that reflection once during single-threaded language
registration by walking the parser entry rule. Because this bootstrap probes
the participating AST types directly, AST-producing parser aliases such as
Rule<T> and Infix<T, ...> require DefaultConstructibleAstNode<T>.
That constraint only applies to AST types that are produced directly by parser rules. Abstract or non-default-constructible AST supertypes can still be used as reference targets or as containment slot supertypes in assignments; the bootstrap only needs their runtime type information for subtype filtering.
Pegium therefore treats the parsed AST as a mutable technical model of the source program. The recommended pattern is:
- keep parser-managed AST nodes simple and default-constructible
- express semantic constraints with validation, linking, or later transforms
- build a stricter domain model after parsing if your application needs one
Why the CST still matters¶
The AST is the semantic model, but Pegium keeps the CST alongside it because many editor-facing features still need source structure:
- formatting
- comment rewriting
- keyword lookup
- precise property ranges
- cursor-sensitive features
So the real working model of a Pegium language is AST plus CST plus references.
Stable modeling advice¶
For mature languages, it is worth treating the AST as a deliberate API rather than just whatever happened to fall out of the first grammar draft.
Good signs of a stable model:
- containment is explicit
- references are modeled as references, not as raw strings
- optionality reflects real semantics
- later services can reason about the tree without special-case hacks