Parsing upon deserialising

This commit is contained in:
Simon Martens
2025-11-14 15:29:51 +01:00
parent a46c171de7
commit 2e251f446f
9 changed files with 633 additions and 331 deletions

Agents.md (new file, 112 lines added)

@@ -0,0 +1,112 @@
# Agents
This codebase powers the born-digital edition of Johann Jacob Lenz's correspondence. Every package acts like an agent in the editorial pipeline: ingesting the XML corpus, enriching it with scholarly context, and presenting the material on the public website. This document inventories those agents, how they cooperate, and where to extend them.
## System lineage
- **Entry point:** `lenz.go` boots with `config`, clones or reuses the corpus Git repo via `git/`, parses TEI-like XML into memory (`xmlmodels`), wires the templating engine and cache, then starts the Fiber server.
- **Data origin:** XML, references, and tradition files live inside the checked-out repository (`config.BaseDIR` + `config.GITPath`). Changes arrive either through CLI restarts, filesystem watchers (dev), or GitHub webhooks (`controllers/webhook.go`).
- **Request flow:** Fiber receives HTTP traffic (`server/`), routes it through the controllers (`controllers/*.go`), pulls structured data from `xmlmodels`, renders Go templates (`templating/`, `views/`), and serves static assets through the custom filesystem middleware.
## Agent catalog
### 1. Configuration & boot agent
- **Scope:** `config/`, `lenz.go`.
- **Responsibilities**
- Merge `config.dev.json`, `config.json`, and `KGPZ_*` env vars, then enforce defaults for cache directories, bind address, webhook endpoint, etc.
- Shape the runtime: debug mode flips Fiber logging on, turns on hot reloaders, and exposes the WebSocket live-reload port.
- Stage the working directories (`_cache/git`, `_cache/gnd`, `_cache/geo`, `_cache/search`, `data_bilder`).
- **Extension hooks**
- Add new configuration knobs to `config.Config`, plumb them through `ConfigProvider`, then surface them inside `controllers` via `ctx.Locals("cfg", cfg)`.
- Update `Dockerfile` or `docker-compose.*` when the runtime contract changes (ports, binaries, base path).
### 2. Corpus sync agent
- **Scope:** `git/git.go`, `_cache/git`, `controllers/webhook.go`.
- **Responsibilities**
- Clone, validate, and update the corpus repository (default `https://github.com/Theodor-Springmann-Stiftung/lenz-briefe`) while keeping branch and hash bookkeeping (`gitprovider.Commit`).
- Expose two lifecycle calls: `OpenOrClone` (during boot) and `Pull` (during webhook), each guarded by a global mutex to avoid concurrent git state.
- On webhook delivery, verify `X-Hub-Signature-256`, pull the branch, trigger `xmlmodels.Parse`, and reset the cache store so visitors immediately see the update.
- **Extension hooks**
- Add new repo validation rules by extending `ValidateBranch` or the webhook handler (e.g., topic filtering).
- Emit observability metrics or notifications after `Pull` and parse success.
### 3. XML modeling agent
- **Scope:** `xmlmodels/`, `xmlparsing/`.
- **Responsibilities**
- Parse `data/xml/meta.xml`, `briefe.xml`, `references.xml`, and `traditions.xml` concurrently into strongly typed structs (`Meta`, `Letter`, `Tradition`, `PersonDef`, `LocationDef`, etc.).
- Provide a singleton `Library` (`xmlmodels.Set/Get`) that caches derived artefacts (`sync.Map` for year groupings, person/place lookups) and exposes helpers like `NextPrev`, `Years`, and `Tradition`.
- Offload streaming tokenization, entity resolution, optional booleans, and TEI-ish constructs to `xmlparsing` (see `parser.go`, `resolver.go`, `optionalbool.go`, `xmlsort.go`).
- **Extension hooks**
- Add new TEI registers by amending `xmlmodels/library.go` and `xmlmodels/roots.go`, then plugging in an `xmlparsing.XMLParser` for the new structure.
- When introducing derived caches, use the built-in `sync.Map` to avoid reprocessing across requests.
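The `sync.Map` caching idiom recommended above might look like this; the key and value types are hypothetical stand-ins for the real year groupings:

```go
package main

import (
	"fmt"
	"sync"
)

// yearCache computes a derived grouping once, then serves it from the
// sync.Map on every later request.
type yearCache struct {
	m sync.Map // year -> []int (letter IDs); types are illustrative
}

func (c *yearCache) Letters(year int, compute func(int) []int) []int {
	if v, ok := c.m.Load(year); ok {
		return v.([]int)
	}
	// First caller pays the cost; LoadOrStore keeps concurrent callers
	// from clobbering each other's result.
	v, _ := c.m.LoadOrStore(year, compute(year))
	return v.([]int)
}

func main() {
	calls := 0
	c := &yearCache{}
	compute := func(year int) []int { calls++; return []int{year*100 + 1, year*100 + 2} }
	fmt.Println(c.Letters(1776, compute))
	fmt.Println(c.Letters(1776, compute)) // served from cache
	fmt.Println(calls)                    // compute ran only once
}
```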
### 4. Text rendering agent
- **Scope:** `helpers/functions/textparse.go`, `helpers/functions/*.go`.
- **Responsibilities**
- Convert `Letter.Content` (inner XML) into semantic HTML fragments consumable by the templates: inline sidenotes, block notes, insertion markers, manuscript hands, printable line/pagination counters, etc.
- Offer template-safe utilities (date math, slicing, string helpers, HTML escaping, embedder functions) injected via `templating.Engine.AddFunc` and `Library.FuncMap()`.
- **Extension hooks**
- Support new TEI tags or editorial conventions by updating the `switch` in `Parse` and by introducing helper `Tokens` operations when needed.
- Register the helper with templates through `templating.Engine.AddFunc("NewFunc", fn)` to keep the template surface area explicit.
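The registration pattern can be illustrated with stdlib `html/template`, whose `FuncMap` mechanism the project's engine wraps; `Shout` is a hypothetical helper, not one of the real `Library.FuncMap()` functions:

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// render registers helpers before parsing, so templates can only call
// what was explicitly exposed -- the point of the AddFunc pattern.
func render(name string) (string, error) {
	funcs := template.FuncMap{
		// Stand-in for the real helpers (date math, slicing,
		// Person/Place lookups) injected via Library.FuncMap().
		"Shout": strings.ToUpper,
	}
	tmpl, err := template.New("brief").Funcs(funcs).Parse(`Hello, {{Shout .}}!`)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, name); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render("lenz")
	fmt.Println(out, err)
}
```

Calling an unregistered function fails at parse time, which is exactly why keeping the surface explicit is useful.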
### 5. Presentation & asset agent
- **Scope:** `templating/`, `views/`, `helpers/middleware`.
- **Responsibilities**
- Maintain the custom Go template engine (`templating.Engine`): layout registry, route registry, live-reload WebSocket (port 9000), and debug refresh/reset watchers (see `watcher.go`).
- Deliver static artefacts (`views/assets`, fonts, paged.js bundle) through the fiber-aware filesystem middleware with cache-control + optional weak ETags.
- Organize the view layer: `views/layouts/default/root.gohtml` for the shell, `views/routes/**/head.gohtml` & `body.gohtml` pairs for routes, `views/routes/components` for partials, and `views/public` as embeddable snippets/XSLT.
- Ship the CSS/JS toolchain (Tailwind 4, DaisyUI, Vite) via `views/package.json`, `vite.config.js`, and `transform/site.css`, producing the compiled `assets/` bundle that gets embedded at build time (`views/embed*.go`).
- **Extension hooks**
- When adding routes, create a new directory under `views/routes/<slug>/` with `head` and `body` templates and call `c.Render("/<slug>/", data)` from the matching controller.
- For new global data, use `Engine.Globals` so layouts gain the values without per-controller plumbing.
- In dev, use `cfg.Debug=true` so `SetupRefreshWatcher` reloads templates when files under `views/routes` or `views/layouts` change.
### 6. Delivery agent (HTTP server & controllers)
- **Scope:** `server/`, `controllers/`.
- **Responsibilities**
- Configure Fiber with shared middlewares (compression, cache, logging/recover, custom cache key generator) and expose the `Server` struct that owns the `templating.Engine` and memory-backed cache store (`github.com/gofiber/storage/memory`).
- Register web routes:
- `/` redirects to `/briefe`.
- `/briefe` lists letters grouped into historic ranges (see `controllers/uebersicht.go`).
- `/brief/:letter` renders an individual letter with prev/next navigation, parsed body, and tradition apparatus (`controllers/brief.go`).
- `/datenschutz`, `/ausgabe/zitation`, `/ausgabe/edition`, `/kontakt` render mostly static material via the template partials.
- `/assets/**` serves CSS/JS/images with compression + ETag (middleware).
- Expose Go structs (e.g., `DateRange`) to the templates so grouping logic stays in Go rather than HTML.
- **Extension hooks**
- Add future APIs (e.g., JSON endpoints) inside `controllers` while reusing `xmlmodels` for data access.
- Extend caching strategy by tweaking `server/cache.go` or `CacheFunc` (e.g., to drop caching for preview routes).
### 7. Update, cache & observability agent
- **Scope:** `watcher.go`, `server/cache.go`, `controllers/webhook.go`.
- **Responsibilities**
- Implement hot reload in debug builds: watchers monitor `./views/assets` for refreshes, and `./views/layouts` and `./views/routes` for full reloads, then call `Engine.Refresh()` or `Engine.Reload()` and reset the in-memory cache.
- Provide manual cache reset helpers (`ResetFunction`, `RefreshFunction`) that controllers or tooling can reuse.
- Support GitHub webhook-triggered cache invalidation (see agent 2).
- **Extension hooks**
- Watch additional directories (e.g., content markdown) by appending to `REFRESH_CHANGES` / `RESET_CHANGES`.
- Integrate metrics by instrumenting `server.Server.Start/Stop` and the webhook handler.
## Data contracts
| Type | File | Purpose |
| --- | --- | --- |
| `xmlmodels.Meta` | `data/xml/meta.xml` | Chronological metadata per letter (sent/received actions, person/place refs, proofing flags). Supplies the timeline, filters, and prev/next logic. |
| `xmlmodels.Letter` | `data/xml/briefe.xml` | Carries the full textual content, pagination hints, and handwriting markers. Parsed into HTML via the text rendering agent. |
| `xmlmodels.Tradition` | `data/xml/traditions.xml` | Encodes the apparatus (`Apps`) for each letter so the UI can show textual variants and transmission notes. |
| `xmlmodels.PersonDef`, `LocationDef`, `AppDef` | `data/xml/references.xml` | Lookup tables for persons, places, and apparatus definitions, surfaced through the template func map (`Person`, `Place`, `App`). |
All XML parsing goes through `xmlparsing.XMLParser[T]`, so new record types only require defining a struct with `xml` tags plus a corresponding root type in `xmlmodels/roots.go`.
## Request lifecycle (TL;DR)
1. **Startup:** `config.Get()` → `gitprovider.OpenOrClone()` → `xmlmodels.Parse()` populates the singleton library and caches.
2. **Serve:** `server.New()` builds Fiber + templating + cache; `controllers.Register()` binds routes.
3. **Handle request:** Controller retrieves data from `xmlmodels.Get()`, optionally transforms it (sorting, grouping), and renders a template. The templating engine merges controller data with globals and layout fragments, optionally injecting the live-reload script in debug mode.
4. **Deliver:** Responses are cached in memory (unless `CacheFunc` opts out) and static assets flow through the filesystem middleware with compression + weak validators.
5. **Update corpus:** GitHub webhook or manual restart pulls new XML content, re-runs `xmlmodels.Parse`, clears caches, and future requests see the updated edition.
## Development workflow
- **Go server:** `go run .` (or `air`, etc.). Ensure `_cache/git` is writable; config defaults to `_cache` relative to the repo root.
- **Frontend assets:** `cd views && npm install && npm run dev` during template work, or `npm run build` to regenerate `views/assets`. The Go build embeds the compiled assets unless built with `-tags dev`.
- **Debugging:** Set `debug` to `true` in `config.dev.json`. That turns on verbose logs, template reloaders, and the WebSocket refresher, and disables static caching so CSS/JS changes land instantly.
- **Deployment:** The `Dockerfile` builds a static binary (`go build`) and exposes port 8085. Mount `_cache` as a volume (or bake it in) so git clones persist across container restarts.
Together these agents ensure that the scholarly corpus, textual enrichment logic, and presentation layer remain decoupled yet synchronized, making it straightforward to add new editorial features or publish new findings about Lenz's letters.


@@ -3,7 +3,6 @@ package controllers
import (
"strconv"
"github.com/Theodor-Springmann-Stiftung/lenz-web/helpers/functions"
"github.com/Theodor-Springmann-Stiftung/lenz-web/xmlmodels"
"github.com/gofiber/fiber/v2"
)
@@ -22,8 +21,16 @@ func GetLetter(c *fiber.Ctx) error {
}
np := lib.NextPrev(meta)
parsed := functions.ParseText(lib, meta)
letterData := lib.Letters.Item(letter)
if letterData == nil {
return c.SendStatus(fiber.StatusNotFound)
}
html := ""
if state := letterData.HTML.Data(); state != nil {
html = state.String()
}
tradition := lib.Traditions.Item(letter)
return c.Render("/brief/", fiber.Map{"meta": meta, "text": parsed, "tradition": tradition, "next": np.Next, "prev": np.Prev})
return c.Render("/brief/", fiber.Map{"meta": meta, "text": html, "tradition": tradition, "next": np.Next, "prev": np.Prev})
}


@@ -1,246 +0,0 @@
package functions
import (
"math/rand"
"strconv"
"strings"
"github.com/Theodor-Springmann-Stiftung/lenz-web/xmlmodels"
"github.com/Theodor-Springmann-Stiftung/lenz-web/xmlparsing"
)
const charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
func RandString(length int) string {
b := make([]byte, length)
for i := range b {
b[i] = charset[rand.Intn(len(charset))]
}
return string(b)
}
type Note struct {
Id string
Tokens Tokens
}
type LenzParseState struct {
Tokens Tokens
Notes []Note
Count []Note
LC int
PC string
CloseElement bool
Break bool
PageBreak bool
LineBreak bool
}
func (s *LenzParseState) String() string {
builder := strings.Builder{}
builder.WriteString(outToken{Name: "div", Classes: []string{"count"}, Type: Element}.String())
for _, c := range s.Count {
builder.WriteString(c.Tokens.String())
}
builder.WriteString(outToken{Name: "div", Classes: []string{"count"}, Type: EndElement}.String())
s.Tokens.Prepend(outToken{Name: "div", Classes: []string{"fulltext"}, Type: Element})
s.Tokens.AppendEndElement()
builder.WriteString(s.Tokens.String())
builder.WriteString(outToken{Name: "div", Classes: []string{"notes"}, Type: Element}.String())
for _, note := range s.Notes {
builder.WriteString(note.Tokens.String())
}
builder.WriteString(outToken{Name: "div", Classes: []string{"notes"}, Type: EndElement}.String())
return builder.String()
}
func (s *LenzParseState) AppendNote(note Note) {
s.Notes = append(s.Notes, note)
}
func ParseText(lib *xmlmodels.Library, meta *xmlmodels.Meta) string {
if lib == nil {
return ""
}
text := lib.Letters.Item(meta.Letter)
if text == nil {
return ""
}
return Parse(lib, meta, text.Content)
}
func TemplateParse(lib *xmlmodels.Library) func(letter *xmlmodels.Meta, s string) string {
return func(letter *xmlmodels.Meta, s string) string {
return Parse(lib, letter, s)
}
}
func Parse(lib *xmlmodels.Library, letter *xmlmodels.Meta, s string) string {
if len(s) == 0 {
return ""
}
ps := LenzParseState{CloseElement: true, PC: "1"}
parser := xmlparsing.NewParser(s)
for elem, err := range parser.Iterate() {
if err != nil {
return err.Error()
}
if elem.Type < 3 {
if elem.Type == xmlparsing.EndElement {
if elem.Name == "sidenote" {
ps.LineBreak = true
}
if ps.CloseElement {
ps.Tokens.AppendEndElement()
} else {
ps.CloseElement = true
}
continue
}
switch elem.Name {
case "insertion":
ps.Tokens.AppendDefaultElement(elem)
ps.Tokens.AppendDivElement("", "insertion-marker")
ps.Tokens.AppendEndElement()
case "sidenote":
id := RandString(8)
ps.Tokens.AppendDefaultElement(elem)
ps.Break = false
ps.Tokens.AppendCustomAttribute("aria-describedby", id)
if elem.Attributes["annotation"] != "" ||
elem.Attributes["page"] != "" ||
elem.Attributes["pos"] != "" {
note := Note{Id: id}
note.Tokens.AppendDivElement(id, "note-sidenote-meta")
ps.Tokens.AppendDivElement(id, "inline-sidenote-meta")
if elem.Attributes["page"] != "" {
note.Tokens.AppendDivElement("", "sidenote-page")
note.Tokens.AppendText(elem.Attributes["page"])
note.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "sidenote-page")
ps.Tokens.AppendText(elem.Attributes["page"])
ps.Tokens.AppendEndElement()
}
if elem.Attributes["annotation"] != "" {
note.Tokens.AppendDivElement("", "sidenote-note")
note.Tokens.AppendText(elem.Attributes["annotation"])
note.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "sidenote-note")
ps.Tokens.AppendText(elem.Attributes["annotation"])
ps.Tokens.AppendEndElement()
}
if elem.Attributes["pos"] != "" {
note.Tokens.AppendDivElement("", "sidenote-pos")
note.Tokens.AppendText(elem.Attributes["pos"])
note.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "sidenote-pos")
ps.Tokens.AppendText(elem.Attributes["pos"])
ps.Tokens.AppendEndElement()
}
note.Tokens.AppendEndElement() // sidenote-meta
ps.Tokens.AppendEndElement()
ps.AppendNote(note)
}
case "note":
id := RandString(8)
ps.Tokens.AppendLink("#"+id, "nanchor-note")
ps.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement(id, "note", "note-note")
case "nr":
ext := elem.Attributes["extent"]
if ext == "" {
ext = "1"
}
extno, err := strconv.Atoi(ext)
if err != nil {
extno = 1
}
ps.Tokens.AppendDefaultElement(elem)
for i := 0; i < extno; i++ {
ps.Tokens.AppendText("&nbsp;")
}
case "hand":
id := RandString(8)
idno, err := strconv.Atoi(elem.Attributes["ref"])
var person *xmlmodels.PersonDef
if err == nil {
person = lib.Persons.Item(idno)
}
hand := "N/A"
if person != nil {
hand = person.Name
}
note := Note{Id: id}
note.Tokens.AppendDivElement(id, "note-hand")
note.Tokens.AppendText(hand)
note.Tokens.AppendEndElement()
ps.AppendNote(note)
ps.Tokens.AppendDivElement(id, "inline-hand")
ps.Tokens.AppendText(hand)
ps.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "hand")
ps.Tokens.AppendCustomAttribute("aria-describedby", id)
case "line":
if val := elem.Attributes["type"]; val != "empty" {
ps.LC += 1
if ps.Break {
ps.Tokens.AppendEmptyElement("br", ps.PC+"-"+strconv.Itoa(ps.LC))
}
ps.Tokens.AppendDefaultElement(elem) // This is for indents, must be closed
} else {
ps.Tokens.AppendEmptyElement("br", "", "empty")
ps.CloseElement = false // Here Indents make no sense, so we dont open an element
}
ps.LineBreak = true
case "page":
ps.PC = elem.Attributes["index"]
ps.PageBreak = true
ps.CloseElement = false
default:
if !ps.Break && elem.Type == xmlparsing.CharData && strings.TrimSpace(elem.Data) != "" {
ps.Break = true
}
if ps.PageBreak && ps.PC != "1" && elem.Type == xmlparsing.CharData && strings.TrimSpace(elem.Data) != "" {
ps.PageBreak = false
note := Note{Id: ps.PC}
quality := "outside"
if !ps.LineBreak {
quality = "inside"
}
ps.Tokens.AppendDivElement("", "eanchor-page", "eanchor-page-"+quality)
ps.Tokens.AppendCustomAttribute("aria-describedby", ps.PC)
ps.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "page-counter", "page-"+quality)
ps.Tokens.AppendText(ps.PC)
ps.Tokens.AppendEndElement()
note.Tokens.AppendDivElement(ps.PC, "page", "page-"+quality)
note.Tokens.AppendText(ps.PC)
note.Tokens.AppendEndElement()
ps.Count = append(ps.Count, note)
strings.TrimLeft(elem.Data, " \t\n\r")
}
if ps.LineBreak && elem.Type == xmlparsing.CharData && strings.TrimSpace(elem.Data) != "" {
strings.TrimLeft(elem.Data, " \t\n\r")
ps.LineBreak = false
}
ps.Tokens.AppendDefaultElement(elem)
}
}
}
return ps.String()
}


@@ -9,7 +9,6 @@ import (
"github.com/Theodor-Springmann-Stiftung/lenz-web/config"
"github.com/Theodor-Springmann-Stiftung/lenz-web/controllers"
gitprovider "github.com/Theodor-Springmann-Stiftung/lenz-web/git"
"github.com/Theodor-Springmann-Stiftung/lenz-web/helpers/functions"
"github.com/Theodor-Springmann-Stiftung/lenz-web/server"
"github.com/Theodor-Springmann-Stiftung/lenz-web/templating"
"github.com/Theodor-Springmann-Stiftung/lenz-web/views"
@@ -53,7 +52,7 @@ func main() {
engine := templating.New(&views.LayoutFS, &views.RoutesFS)
engine.AddFuncs(lib.FuncMap())
engine.AddFunc("ParseGeneric", functions.TemplateParse(lib))
engine.AddFunc("ParseGeneric", xmlmodels.TemplateParse(lib))
storage := memory.New(memory.Config{
GCInterval: 24 * time.Hour,
})


@@ -3,15 +3,44 @@ package xmlmodels
import (
"encoding/json"
"encoding/xml"
"github.com/Theodor-Springmann-Stiftung/lenz-web/xmlparsing"
)
type Letter struct {
XMLName xml.Name `xml:"letterText"`
Letter int `xml:"letter,attr"`
Pages []Page `xml:"page"`
Hands []RefElement `xml:"hand"`
Content string `xml:",innerxml"`
Chardata string `xml:",chardata"`
XMLName xml.Name `xml:"letterText"`
Letter int `xml:"letter,attr"`
Pages []Page `xml:"page"`
Hands []RefElement `xml:"hand"`
HTML xmlparsing.Parsed[LenzTextHandler, *LenzParseState] `xml:"-"`
}
func (l *Letter) UnmarshalXML(dec *xml.Decoder, start xml.StartElement) error {
type alias struct {
XMLName xml.Name `xml:"letterText"`
Letter int `xml:"letter,attr"`
Pages []Page `xml:"page"`
Hands []RefElement `xml:"hand"`
Inner string `xml:",innerxml"`
}
var data alias
if err := dec.DecodeElement(&data, &start); err != nil {
return err
}
l.XMLName = data.XMLName
l.Letter = data.Letter
l.Pages = data.Pages
l.Hands = data.Hands
parsed, err := parseText(Get(), data.Inner)
if err != nil {
return err
}
l.HTML = parsed
return nil
}
func (l Letter) Keys() []any {

xmlmodels/letter_text.go (new file, 273 lines added)

@@ -0,0 +1,273 @@
package xmlmodels
import (
"math/rand"
"strconv"
"strings"
"github.com/Theodor-Springmann-Stiftung/lenz-web/xmlparsing"
)
const charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
func randString(length int) string {
b := make([]byte, length)
for i := range b {
b[i] = charset[rand.Intn(len(charset))]
}
return string(b)
}
type Note struct {
Id string
Tokens Tokens
}
type LenzParseState struct {
Tokens Tokens
Notes []Note
Count []Note
LC int
PC string
CloseElement bool
Break bool
PageBreak bool
LineBreak bool
Lib *Library
rendered string
}
func (s *LenzParseState) String() string {
if s == nil {
return ""
}
if s.rendered != "" {
return s.rendered
}
builder := strings.Builder{}
builder.WriteString(outToken{Name: "div", Classes: []string{"count"}, Type: Element}.String())
for _, c := range s.Count {
builder.WriteString(c.Tokens.String())
}
builder.WriteString(outToken{Name: "div", Classes: []string{"count"}, Type: EndElement}.String())
tokens := s.Tokens
tokens.Prepend(outToken{Name: "div", Classes: []string{"fulltext"}, Type: Element})
tokens.AppendEndElement()
builder.WriteString(tokens.String())
builder.WriteString(outToken{Name: "div", Classes: []string{"notes"}, Type: Element}.String())
for _, note := range s.Notes {
builder.WriteString(note.Tokens.String())
}
builder.WriteString(outToken{Name: "div", Classes: []string{"notes"}, Type: EndElement}.String())
s.rendered = builder.String()
return s.rendered
}
func (s *LenzParseState) AppendNote(note Note) {
s.Notes = append(s.Notes, note)
}
type LenzTextHandler struct {
Lib *Library
}
func (h LenzTextHandler) NewState() *LenzParseState {
return &LenzParseState{
CloseElement: true,
PC: "1",
Lib: h.Lib,
}
}
func (h LenzTextHandler) OnOpenElement(state *xmlparsing.ParseState[*LenzParseState], elem *xmlparsing.Token) error {
ps := state.Data()
switch elem.Name {
case "insertion":
ps.Tokens.AppendDefaultElement(elem)
ps.Tokens.AppendDivElement("", "insertion-marker")
ps.Tokens.AppendEndElement()
case "sidenote":
id := randString(8)
ps.Tokens.AppendDefaultElement(elem)
ps.Break = false
ps.Tokens.AppendCustomAttribute("aria-describedby", id)
if elem.Attributes["annotation"] != "" ||
elem.Attributes["page"] != "" ||
elem.Attributes["pos"] != "" {
note := Note{Id: id}
note.Tokens.AppendDivElement(id, "note-sidenote-meta")
ps.Tokens.AppendDivElement(id, "inline-sidenote-meta")
if elem.Attributes["page"] != "" {
note.Tokens.AppendDivElement("", "sidenote-page")
note.Tokens.AppendText(elem.Attributes["page"])
note.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "sidenote-page")
ps.Tokens.AppendText(elem.Attributes["page"])
ps.Tokens.AppendEndElement()
}
if elem.Attributes["annotation"] != "" {
note.Tokens.AppendDivElement("", "sidenote-note")
note.Tokens.AppendText(elem.Attributes["annotation"])
note.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "sidenote-note")
ps.Tokens.AppendText(elem.Attributes["annotation"])
ps.Tokens.AppendEndElement()
}
if elem.Attributes["pos"] != "" {
note.Tokens.AppendDivElement("", "sidenote-pos")
note.Tokens.AppendText(elem.Attributes["pos"])
note.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "sidenote-pos")
ps.Tokens.AppendText(elem.Attributes["pos"])
ps.Tokens.AppendEndElement()
}
note.Tokens.AppendEndElement()
ps.Tokens.AppendEndElement()
ps.AppendNote(note)
}
case "note":
id := randString(8)
ps.Tokens.AppendLink("#"+id, "nanchor-note")
ps.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement(id, "note", "note-note")
case "nr":
ext := elem.Attributes["extent"]
if ext == "" {
ext = "1"
}
extno, err := strconv.Atoi(ext)
if err != nil {
extno = 1
}
ps.Tokens.AppendDefaultElement(elem)
for i := 0; i < extno; i++ {
ps.Tokens.AppendText("&nbsp;")
}
case "hand":
id := randString(8)
idno, err := strconv.Atoi(elem.Attributes["ref"])
var person *PersonDef
if err == nil && ps.Lib != nil {
person = ps.Lib.Persons.Item(idno)
}
hand := "N/A"
if person != nil {
hand = person.Name
}
note := Note{Id: id}
note.Tokens.AppendDivElement(id, "note-hand")
note.Tokens.AppendText(hand)
note.Tokens.AppendEndElement()
ps.AppendNote(note)
ps.Tokens.AppendDivElement(id, "inline-hand")
ps.Tokens.AppendText(hand)
ps.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "hand")
ps.Tokens.AppendCustomAttribute("aria-describedby", id)
case "line":
if val := elem.Attributes["type"]; val != "empty" {
ps.LC += 1
if ps.Break {
ps.Tokens.AppendEmptyElement("br", ps.PC+"-"+strconv.Itoa(ps.LC))
}
ps.Tokens.AppendDefaultElement(elem)
} else {
ps.Tokens.AppendEmptyElement("br", "", "empty")
ps.CloseElement = false
}
ps.LineBreak = true
case "page":
ps.PC = elem.Attributes["index"]
ps.PageBreak = true
ps.CloseElement = false
default:
ps.Tokens.AppendDefaultElement(elem)
}
return nil
}
func (h LenzTextHandler) OnCloseElement(state *xmlparsing.ParseState[*LenzParseState], elem *xmlparsing.Token) error {
ps := state.Data()
if elem.Name == "sidenote" {
ps.LineBreak = true
}
if ps.CloseElement {
ps.Tokens.AppendEndElement()
} else {
ps.CloseElement = true
}
return nil
}
func (h LenzTextHandler) OnText(state *xmlparsing.ParseState[*LenzParseState], elem *xmlparsing.Token) error {
ps := state.Data()
trimmed := strings.TrimSpace(elem.Data)
if trimmed == "" {
return nil
}
if !ps.Break {
ps.Break = true
}
if ps.PageBreak && ps.PC != "1" {
ps.PageBreak = false
note := Note{Id: ps.PC}
quality := "outside"
if !ps.LineBreak {
quality = "inside"
}
ps.Tokens.AppendDivElement("", "eanchor-page", "eanchor-page-"+quality)
ps.Tokens.AppendCustomAttribute("aria-describedby", ps.PC)
ps.Tokens.AppendEndElement()
ps.Tokens.AppendDivElement("", "page-counter", "page-"+quality)
ps.Tokens.AppendText(ps.PC)
ps.Tokens.AppendEndElement()
note.Tokens.AppendDivElement(ps.PC, "page", "page-"+quality)
note.Tokens.AppendText(ps.PC)
note.Tokens.AppendEndElement()
ps.Count = append(ps.Count, note)
}
if ps.LineBreak {
ps.LineBreak = false
}
ps.Tokens.AppendDefaultElement(elem)
return nil
}
func (h LenzTextHandler) OnComment(*xmlparsing.ParseState[*LenzParseState], *xmlparsing.Token) error {
return nil
}
func (h LenzTextHandler) Result(state *xmlparsing.ParseState[*LenzParseState]) (string, error) {
return state.Data().String(), nil
}
func parseText(lib *Library, raw string) (xmlparsing.Parsed[LenzTextHandler, *LenzParseState], error) {
handler := LenzTextHandler{Lib: lib}
parsed := xmlparsing.NewParsed[LenzTextHandler, *LenzParseState](handler)
return parsed, parsed.ParseString(raw)
}
// TemplateParse exposes the legacy helper for Go templates (e.g. traditions).
func TemplateParse(lib *Library) func(letter *Meta, s string) string {
return func(_ *Meta, s string) string {
parsed, err := parseText(lib, s)
if err != nil {
return err.Error()
}
return parsed.Data().String()
}
}


@@ -107,77 +107,41 @@ func (l *Library) Parse(source xmlparsing.ParseSource, baseDir, commit string) e
l.prepare()
wg.Add(1)
go func() {
err := l.Persons.Serialize(&PersonDefs{}, filepath.Join(meta.BaseDir, REFERENCES_PATH), meta)
if err != nil {
metamu.Lock()
slog.Error("Failed to serialize persons:", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, REFERENCES_PATH))
metamu.Unlock()
}
wg.Done()
}()
parse := func(fn func() error, path string, label string) {
wg.Add(1)
go func() {
if err := fn(); err != nil {
metamu.Lock()
slog.Error("Failed to serialize "+label+":", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, path))
metamu.Unlock()
}
wg.Done()
}()
}
wg.Add(1)
go func() {
err := l.Places.Serialize(&LocationDefs{}, filepath.Join(meta.BaseDir, REFERENCES_PATH), meta)
if err != nil {
metamu.Lock()
slog.Error("Failed to serialize places:", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, REFERENCES_PATH))
metamu.Unlock()
}
wg.Done()
}()
// References must be ready before dependent documents (hands etc.) resolve correctly.
parse(func() error {
return l.Persons.Serialize(&PersonDefs{}, filepath.Join(meta.BaseDir, REFERENCES_PATH), meta)
}, REFERENCES_PATH, "persons")
parse(func() error {
return l.Places.Serialize(&LocationDefs{}, filepath.Join(meta.BaseDir, REFERENCES_PATH), meta)
}, REFERENCES_PATH, "places")
parse(func() error {
return l.AppDefs.Serialize(&AppDefs{}, filepath.Join(meta.BaseDir, REFERENCES_PATH), meta)
}, REFERENCES_PATH, "appdefs")
wg.Wait()
wg.Add(1)
go func() {
err := l.AppDefs.Serialize(&AppDefs{}, filepath.Join(meta.BaseDir, REFERENCES_PATH), meta)
if err != nil {
metamu.Lock()
slog.Error("Failed to serialize appdefs:", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, REFERENCES_PATH))
metamu.Unlock()
}
wg.Done()
}()
wg.Add(1)
go func() {
err := l.Letters.Serialize(&DocumentsRoot{}, filepath.Join(meta.BaseDir, LETTERS_PATH), meta)
if err != nil {
metamu.Lock()
slog.Error("Failed to serialize letters:", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, LETTERS_PATH))
metamu.Unlock()
}
wg.Done()
}()
wg.Add(1)
go func() {
err := l.Traditions.Serialize(&TraditionsRoot{}, filepath.Join(meta.BaseDir, TRADITIONS_PATH), meta)
if err != nil {
metamu.Lock()
slog.Error("Failed to serialize traditions:", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, TRADITIONS_PATH))
metamu.Unlock()
}
wg.Done()
}()
wg.Add(1)
go func() {
err := l.Metas.Serialize(&MetaRoot{}, filepath.Join(meta.BaseDir, META_PATH), meta)
if err != nil {
metamu.Lock()
slog.Error("Failed to serialize meta:", "error", err)
meta.FailedPaths = append(meta.FailedPaths, filepath.Join(meta.BaseDir, META_PATH))
metamu.Unlock()
}
wg.Done()
}()
// Remaining documents can be parsed once references are available.
parse(func() error {
return l.Letters.Serialize(&DocumentsRoot{}, filepath.Join(meta.BaseDir, LETTERS_PATH), meta)
}, LETTERS_PATH, "letters")
parse(func() error {
return l.Traditions.Serialize(&TraditionsRoot{}, filepath.Join(meta.BaseDir, TRADITIONS_PATH), meta)
}, TRADITIONS_PATH, "traditions")
parse(func() error {
return l.Metas.Serialize(&MetaRoot{}, filepath.Join(meta.BaseDir, META_PATH), meta)
}, META_PATH, "meta")
wg.Wait()


@@ -1,4 +1,4 @@
package functions
package xmlmodels
import (
"strings"
@@ -210,10 +210,6 @@ func (s *Tokens) AppendText(text string) {
})
}
func (s *Tokens) Append(token outToken) {
s.Out = append(s.Out, token)
}
func (s *Tokens) String() string {
builder := strings.Builder{}
for _, token := range s.Out {

xmlparsing/parsed.go (new file, 168 lines added)

@@ -0,0 +1,168 @@
package xmlparsing
import (
"iter"
"strings"
)
// ParserHandler describes the callbacks a Parsed type invokes while walking
// through the XML token stream.
type ParserHandler[S any] interface {
NewState() S
OnOpenElement(*ParseState[S], *Token) error
OnCloseElement(*ParseState[S], *Token) error
OnText(*ParseState[S], *Token) error
OnComment(*ParseState[S], *Token) error
}
// Parsed orchestrates converting raw XML into a handler-defined representation.
type Parsed[T ParserHandler[S], S any] struct {
handler T
state ParseState[S]
raw string
}
// NewParsed builds a Parsed wrapper with the provided handler.
func NewParsed[T ParserHandler[S], S any](handler T) Parsed[T, S] {
return Parsed[T, S]{handler: handler}
}
// ParseString feeds the handler with events generated from the supplied XML.
func (p *Parsed[T, S]) ParseString(xml string) error {
p.raw = xml
parser := NewParser(xml)
state := ParseState[S]{
state: p.handler.NewState(),
general: newGeneralState(parser),
}
for token, err := range parser.Iterate() {
if err != nil {
return err
}
if token == nil {
continue
}
state.general.observe(token)
switch token.Type {
case StartElement:
if err := p.handler.OnOpenElement(&state, token); err != nil {
return err
}
case EndElement:
if err := p.handler.OnCloseElement(&state, token); err != nil {
return err
}
case CharData:
// Skip empty whitespace blocks to mimic encoding/xml behaviour.
if strings.TrimSpace(token.Data) == "" {
continue
}
if err := p.handler.OnText(&state, token); err != nil {
return err
}
case Comment:
if err := p.handler.OnComment(&state, token); err != nil {
return err
}
default:
// Other token types are ignored for now.
}
}
p.state = state
return nil
}
// Raw returns the unprocessed XML.
func (p Parsed[T, S]) Raw() string {
return p.raw
}
// State exposes the accumulated ParseState.
func (p *Parsed[T, S]) State() *ParseState[S] {
return &p.state
}
// Data returns the handler-defined state value.
func (p *Parsed[T, S]) Data() S {
return p.state.state
}
// Handler exposes the handler instance for downstream consumers.
func (p *Parsed[T, S]) Handler() *T {
return &p.handler
}
// ParseState passes both handler-specific state and shared navigation helpers.
type ParseState[S any] struct {
state S
general *GeneralState
}
// Data returns the handler-owned state.
func (p *ParseState[S]) Data() S {
return p.state
}
// General exposes parser-wide helpers (tokens, peeking, etc.).
func (p *ParseState[S]) General() *GeneralState {
return p.general
}
// GeneralState tracks all past tokens and enables look-back/peek helpers.
type GeneralState struct {
tokens []*Token
parser *Parser
current *Token
}
func newGeneralState(parser *Parser) *GeneralState {
return &GeneralState{
parser: parser,
}
}
func (g *GeneralState) observe(token *Token) {
g.tokens = append(g.tokens, token)
g.current = token
}
// Tokens returns all tokens seen so far.
func (g *GeneralState) Tokens() []*Token {
return g.tokens
}
// Current returns the most recently processed token.
func (g *GeneralState) Current() *Token {
return g.current
}
// Previous returns up to n previously processed tokens (latest first).
func (g *GeneralState) Previous(n int) []*Token {
if n <= 0 || len(g.tokens) == 0 {
return nil
}
if n > len(g.tokens) {
n = len(g.tokens)
}
out := make([]*Token, 0, n)
for i := 0; i < n; i++ {
out = append(out, g.tokens[len(g.tokens)-1-i])
}
return out
}
// Peek exposes a cursor that yields upcoming tokens from the underlying parser.
func (g *GeneralState) Peek() iter.Seq2[*Token, error] {
if g.current == nil {
return func(yield func(*Token, error) bool) {
yield(nil, nil)
}
}
return g.parser.PeekFrom(g.current.Index + 1)
}