4. Shape the AST and CST
After defining the grammar, you already have a parser. But a language project needs more than “accepted text”. It needs a semantic model that later services can rely on.
In Pegium, this step is not a separate code generation phase. AST and CST shape are defined directly by the parser and the C++ node types you choose.
The semantic model
The AST is the semantic tree of your language. A typical AST node looks like this:
struct Entity : pegium::AstNode {
  string name;
  optional<reference<Entity>> superType;
  vector<pointer<Feature>> features;
};
This one node already captures three different semantics:
- `name` is plain scalar data
- `superType` is a reference that will resolve later
- `features` are contained child nodes owned by the entity
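To make the difference between the three kinds of fields concrete, here is a minimal sketch in plain C++. It does not use Pegium: `EntityRef`, `linkSuperTypes`, and `makeSampleModel` are hypothetical names, `std::unique_ptr` stands in for an owning child pointer, and the name-lookup pass is only an assumption about what "resolve later" could look like.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in types for illustration only; these are NOT Pegium's definitions.
struct Feature {
  std::string name;
};

struct Entity;

// A cross-reference keeps the text that was parsed plus a non-owning
// target pointer that stays null until a later linking pass fills it in.
struct EntityRef {
  std::string refText;
  const Entity* target = nullptr;
};

struct Entity {
  std::string name;                                // plain scalar data
  std::optional<EntityRef> superType;              // link, resolved later
  std::vector<std::unique_ptr<Feature>> features;  // owned children
};

// Hypothetical linking pass: resolve each reference by name lookup.
// Returns the number of references that were resolved.
inline int linkSuperTypes(std::vector<std::unique_ptr<Entity>>& entities) {
  std::unordered_map<std::string, const Entity*> byName;
  for (const auto& e : entities) {
    assert(e && "model must not contain null entries");
    byName[e->name] = e.get();
  }
  int resolved = 0;
  for (auto& e : entities) {
    if (!e->superType) continue;  // no super type was written
    auto it = byName.find(e->superType->refText);
    if (it != byName.end()) {
      e->superType->target = it->second;
      ++resolved;
    }
  }
  return resolved;
}

// Builds a two-entity model by hand, roughly as a parser might after
// reading a "Base" entity and a "Derived" entity that extends it.
inline std::vector<std::unique_ptr<Entity>> makeSampleModel() {
  std::vector<std::unique_ptr<Entity>> model;
  auto base = std::make_unique<Entity>();
  base->name = "Base";
  auto derived = std::make_unique<Entity>();
  derived->name = "Derived";
  derived->superType = EntityRef{"Base"};
  derived->features.push_back(std::make_unique<Feature>(Feature{"id"}));
  model.push_back(std::move(base));
  model.push_back(std::move(derived));
  return model;
}
```

In this sketch the reference holds only text right after construction, and its `target` is filled in by the separate pass. Destroying an `Entity` destroys its features but never the entity it refers to, which is the practical difference between containment and reference.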
How the parser builds the model
The grammar determines how those fields are populated. For example:
Rule<ast::Entity> EntityRule{
    "Entity",
    "entity"_kw.i() + assign<&ast::Entity::name>(ID) +
        option("extends"_kw.i() +
               assign<&ast::Entity::superType>(QualifiedName)) +
        "{"_kw + many(append<&ast::Entity::features>(FeatureRule)) +
        "}"_kw};
This is where parsing and model construction meet:
- `assign` writes one value into a field
- `append` adds repeated children or values
- `create`, `action`, and `nest` help shape more advanced trees
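The difference between writing one value and adding to a repeated field can be sketched with member pointers passed as template arguments, which is also the shape of `assign<&ast::Entity::name>` in the rule above. This is plain C++ and not Pegium's implementation: `assignField`, `appendField`, and `buildEntity` are hypothetical helpers, and the node types are stand-ins.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in node types for illustration; not Pegium's definitions.
struct Feature {
  std::string name;
};

struct Entity {
  std::string name;
  std::vector<std::unique_ptr<Feature>> features;
};

// Hypothetical helper: store one parsed value in the field named by the
// member pointer. A later call overwrites the previous value.
template <auto Member, typename Node, typename Value>
void assignField(Node& node, Value&& value) {
  node.*Member = std::forward<Value>(value);
}

// Hypothetical helper: add one parsed item to a container field.
// Each call keeps the earlier items and appends at the end.
template <auto Member, typename Node, typename Value>
void appendField(Node& node, Value&& value) {
  (node.*Member).push_back(std::forward<Value>(value));
}

// Simulates what a parser could do after matching an entity called
// `name` whose body contains the given feature names, in order.
inline Entity buildEntity(const std::string& name,
                          const std::vector<std::string>& featureNames) {
  Entity entity;
  assignField<&Entity::name>(entity, name);
  for (const auto& featureName : featureNames) {
    auto feature = std::make_unique<Feature>();
    assignField<&Feature::name>(*feature, featureName);
    appendField<&Entity::features>(entity, std::move(feature));
  }
  // Every matched feature ends up as exactly one owned child.
  assert(entity.features.size() == featureNames.size());
  return entity;
}
```

Because the target field is part of the template argument, a mistyped or mismatched field fails at compile time rather than during parsing. That is one plausible reason for this style of API, though the source does not state Pegium's rationale.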
Why the CST matters too
Pegium keeps the CST alongside the AST. This is important because not every feature is purely semantic.
The CST is later used for:
- formatting
- comment handling
- precise keyword and property lookup
- cursor-position-sensitive editor features
So even when the AST is your main semantic model, the CST remains a first-class part of the language infrastructure.
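As a rough illustration of why cursor-sensitive features need the CST, the sketch below looks up the deepest syntax node covering a text offset. It is plain C++ under stated assumptions: `CstNode`, `findAtOffset`, and `makeSampleCst` are hypothetical, and Pegium's actual CST types and lookup functions may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in CST node for illustration; not Pegium's CST type.
// Each node covers the half-open source range [begin, end).
struct CstNode {
  std::string kind;  // e.g. "keyword", "name"
  std::size_t begin = 0;
  std::size_t end = 0;
  std::vector<CstNode> children;
};

// Returns the deepest node whose range contains `offset`,
// or nullptr when the offset lies outside `node`.
inline const CstNode* findAtOffset(const CstNode& node, std::size_t offset) {
  assert(node.begin <= node.end);
  if (offset < node.begin || offset >= node.end) return nullptr;
  for (const auto& child : node.children) {
    if (const CstNode* hit = findAtOffset(child, offset)) return hit;
  }
  return &node;  // inside this node, but between its children
}

// Hand-built CST for the 16-character text "entity Person {}".
inline CstNode makeSampleCst() {
  CstNode root{"entity", 0, 16, {}};
  root.children.push_back({"keyword", 0, 6, {}});    // entity
  root.children.push_back({"name", 7, 13, {}});      // Person
  root.children.push_back({"keyword", 14, 15, {}});  // {
  root.children.push_back({"keyword", 15, 16, {}});  // }
  return root;
}
```

An AST alone would say that the entity is named `Person`, but not which characters hold the keyword, the name, or the braces. A lookup like this needs the source ranges that only the concrete syntax tree retains.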
Recommended modeling style
When shaping the AST, keep it close to the semantics of the language:
- use scalar fields for plain values
- use `pointer<T>` for owned children
- use `vector<pointer<T>>` for repeated owned children
- use `reference<T>` for links to other nodes
- use `optional<T>` only when absence is meaningful
This tends to make validation, linking, and formatting easier later.
What to expect at the end of this step
At the end of this step, your parser should produce the node structure that the rest of the framework will work with, and that structure should already feel like the language model you want to reason about.