---
title: "Know your nodes"
format: html
vignette: >
  %\VignetteIndexEntry{Know your nodes}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")
library(parsermd)
```

# Introduction

The parsermd package parses R Markdown and Quarto documents into an Abstract Syntax Tree (AST) representation. This vignette introduces the different types of AST nodes and their properties, helping you understand how parsermd represents document structure.

## AST Container - `rmd_ast`

The `rmd_ast` object serves as the container for all parsed document nodes. It holds a linear sequence of nodes representing different document elements, where each node type corresponds to a specific R Markdown or Quarto construct (headings, code chunks, text, etc.).

**Important**: The AST represents documents as a linear sequence of nodes, not a nested tree structure. This means that structural elements like fenced divs are represented as separate opening and closing nodes in the sequence, rather than as nodes with children.

The default print method for `rmd_ast` objects (`flat = FALSE`) presents an implicit tree structure based on heading levels. This provides a hierarchical view that reflects the document's logical organization, where content is grouped under headings based on their level.

**Properties:**

- `nodes`: A list containing all the parsed nodes in document order

**Example:**

Raw text that would be parsed:

````markdown
---
title: "Example Document"
---

# Introduction

This is some text.

```{{r}}
x <- 1:5
mean(x)
```
````

This would create an `rmd_ast` object containing:

1. `rmd_yaml` node with the title
2. `rmd_heading` node with "Introduction"
3. `rmd_markdown` node with "This is some text."
4. `rmd_chunk` node with the R code

Programmatic creation:

```{r}
ast = rmd_ast(list(
  rmd_yaml(list(title = "Example Document")),
  rmd_heading(name = "Introduction", level = 1L),
  rmd_markdown(lines = "This is some text."),
  rmd_chunk(
    engine = "r",
    code = c("x <- 1:5", "mean(x)")
  )
))
```

:::::: {.columns}

::: {.column width="50%"}
**Hierarchical view (`flat = FALSE`):**

```{r}
print(ast, flat = FALSE)
```
:::

::: {.column width="50%"}
**Linear view (`flat = TRUE`):**

```{r}
print(ast, flat = TRUE)
```
:::

::::::

---

# S7 Class System

parsermd uses the S7 object system for all AST node types. S7 provides a modern, robust class system with:

- **Type safety**: Properties are validated when objects are created or modified
- **Performance**: Efficient method dispatch and memory usage
- **Consistency**: Uniform interface across all node types

**Key S7 Features in parsermd:**

- All node types inherit from the base `rmd_node` class
- Properties are accessed using `@` syntax (e.g., `node@content`)
- Validation ensures data integrity (proper types, lengths, etc.)
- Method dispatch works seamlessly with generic functions

**Property Access:**

```{r}
# Create a heading node
heading = rmd_heading(name = "Section Title", level = 2L)

# Access properties with @
heading@name
heading@level
```

---

# Core Node Types

## Document Structure Nodes

### YAML Header - `rmd_yaml`

The `rmd_yaml` node represents YAML front matter at the beginning of documents.

**Properties:**

- `yaml`: List containing the parsed YAML content

**Example:**

Raw text that would be parsed:

```yaml
---
title: "My Document"
author: "John Doe"
date: "2023-01-01"
---
```

Programmatic creation:

```{r}
yaml_node = rmd_yaml(list(
  title = "My Document",
  author = "John Doe",
  date = "2023-01-01"
))
yaml_node
```

---

### Markdown Headings - `rmd_heading`

The `rmd_heading` node represents section headings in markdown.

**Properties:**

- `name`: Character string containing the heading text
- `level`: Integer from 1 to 6 indicating the heading level (# = 1, ## = 2, etc.)

**Example:**

Raw text that would be parsed:

```markdown
# Introduction
```

Programmatic creation:

```{r}
heading_node = rmd_heading(
  name = "Introduction",
  level = 1L
)
heading_node
```

---

### Markdown Text - `rmd_markdown`

The `rmd_markdown` node represents plain markdown text content.

**Properties:**

- `lines`: Character vector containing the markdown text lines

**Example:**

Raw text that would be parsed:

```markdown
This is a paragraph.
With multiple lines.
```

Programmatic creation:

```{r}
markdown_node = rmd_markdown(
  lines = c("This is a paragraph.", "With multiple lines.")
)
markdown_node
```

---

## Code and Execution Nodes

### Executable Code Chunks - `rmd_chunk`

The `rmd_chunk` node represents executable code chunks with options and metadata.

**Properties:**

- `engine`: The code engine (default: "r")
- `label`: Optional chunk name/label
- `options`: List of chunk options containing both traditional and YAML options
- `code`: Character vector containing the code lines
- `indent`: Indentation string
- `n_ticks`: Number of backticks used (default: 3)

**Chunk Option Formats:**

Chunks support two option formats that can be used independently or together:

1. **Traditional format**: Options specified in the chunk header after the engine and label

   ```{{r chunk-label, eval=TRUE, echo=FALSE}}
   ```

2. **YAML format**: Options specified as YAML comments within the chunk

   ```{{r chunk-label}}
   #| eval: true
   #| echo: false
   ```

**Option Conflict Resolution:**

When the same option is specified in both formats, YAML options take precedence over traditional options. A warning is emitted when conflicts occur:

```{{r eval=TRUE}}
#| eval: false
```

In this case, `eval: false` (YAML) wins over `eval=TRUE` (traditional), and the parser emits: "YAML options override traditional options for: eval"

**Type Handling:**

- **Traditional options**: Always stored as strings (e.g., `"TRUE"`, `"5"`)
- **YAML options**: Preserve proper R types (e.g., `TRUE`, `5L`, `3.14`)

**Examples:**

**Traditional format chunk:**

````markdown
```{{r example, eval=TRUE, echo=FALSE}}
x <- 1:10
mean(x)
```
````

**YAML format chunk:**

````markdown
```{{r example}}
#| eval: true
#| echo: false
x <- 1:10
mean(x)
```
````

**Mixed format chunk (with conflict):**

````markdown
```{{r example, eval=TRUE}}
#| eval: false
#| message: false
x <- 1:10
mean(x)
```
````

In this case, `eval: false` (YAML) overrides `eval=TRUE` (traditional).

**Programmatic creation:**

```{r}
# Traditional-style options
chunk_node_traditional = rmd_chunk(
  engine = "r",
  label = "example",
  options = list(eval = "TRUE", echo = "FALSE"),
  code = c("x <- 1:10", "mean(x)")
)

# YAML-style options with proper types
chunk_node_yaml = rmd_chunk(
  engine = "r",
  label = "example",
  options = list(eval = TRUE, echo = FALSE),
  code = c("x <- 1:10", "mean(x)")
)
chunk_node_yaml
```

---

### Raw Output Chunks - `rmd_raw_chunk`

The `rmd_raw_chunk` node represents raw output chunks for specific formats.

**Properties:**

- `format`: The output format (e.g., "html", "latex")
- `code`: Character vector containing the raw content
- `indent`: Indentation string
- `n_ticks`: Number of backticks used

**Example:**

Raw text that would be parsed:

````markdown
```{=html}
Custom HTML content
```
````