How To Parse Text To Yaml With Local LLM
Yesterday I had a brilliant idea: why not parse the wiki of my favorite tabletop roleplaying game into YAML via an LLM? I had tried the same with BeautifulSoup a couple of years ago, but the pages are very inconsistent, which makes them quite difficult to parse using traditional methods. A few example pages:
- https://dsa.ulisses-regelwiki.de/Kul_Auelfen.html
- https://dsa.ulisses-regelwiki.de/erw_zauber_sf.html?erw_zaubersf=Alchimieanalytiker
- https://dsa.ulisses-regelwiki.de/KSF_Alter_Adersin.html
However, my attempts to parse these pages with a local Mistral model (the one you get with ollama pull mistral) were not very successful. At first it insisted on writing more than just the YAML code, and later it had trouble with more complex pages (for example https://dsa.ulisses-regelwiki.de/zauber.html?zauber=Abvenenum). So I thought I had to give it some examples in the system prompt. One example helped a little, but when I included more, it sometimes just returned one of the examples I had given it instead of parsing the page.
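For the "more than just the YAML" problem, post-processing the reply may be easier than fighting the prompt. This is only a rough sketch: it assumes the model wraps the YAML in a fenced block (with or without a yaml tag) and otherwise falls back to the whole reply. The function name extract_yaml and the sample reply are made up for illustration.

```python
import re

# Three backticks, built indirectly so this snippet can itself live in a fenced block.
TICKS = "`" * 3

# Assumption: the model puts its YAML inside a fenced block, optionally tagged yaml/yml.
FENCE = re.compile(r"`{3}(?:yaml|yml)?[ \t]*\n(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_yaml(reply: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none is found."""
    match = FENCE.search(reply)
    text = match.group(1) if match else reply
    return text.strip()


# Hypothetical chatty reply, not real model output.
reply = (
    "Sure, here is the YAML:\n"
    + TICKS + "yaml\n"
    + "name: Abvenenum\n"
    + TICKS + "\n"
    + "Let me know if you need more!"
)

print(extract_yaml(reply))
```

The extracted text would still need to go through a YAML parser to confirm it is valid before it is saved.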
To give some idea: the bold text should become the keys in the YAML structure, and the text that follows each bold label is the value. Sometimes values need to be parsed a bit further, like separating page numbers from book names. I would give examples for all of that.
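To make the target concrete, here is a rough sketch of the mapping I have in mind, using only Python's standard html.parser. The sample HTML is invented, the real pages are much less regular, and the "book name, then Seite, then page numbers" format is an assumption on my part. The names BoldFieldParser, parse_fields and split_publication are hypothetical.

```python
import re
from html.parser import HTMLParser


class BoldFieldParser(HTMLParser):
    """Collect {bold label: text that follows it} pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_bold = False
        self._key = None
        self._key_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self._in_bold = True
            self._key_parts = []

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self._in_bold:
            self._in_bold = False
            # Drop a trailing colon that sits inside the bold label.
            key = "".join(self._key_parts).strip().rstrip(":").strip()
            self._key = key or None
            if self._key:
                self.fields[self._key] = ""

    def handle_data(self, data):
        if self._in_bold:
            self._key_parts.append(data)
        elif self._key is not None:
            # Everything up to the next bold label belongs to the current key.
            self.fields[self._key] += data


def parse_fields(html: str) -> dict:
    parser = BoldFieldParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and drop a leading colon that sits outside the bold label.
    return {k: " ".join(v.split()).lstrip(": ").strip() for k, v in parser.fields.items()}


# Assumed format: "<book name> Seite <page or page range>".
PUBLICATION = re.compile(r"(?P<book>.+?)\s+Seite\s+(?P<pages>\d+(?:\s*[-–]\s*\d+)?)")


def split_publication(value: str) -> dict:
    match = PUBLICATION.fullmatch(value.strip())
    if not match:
        return {"book": value, "pages": None}
    return {"book": match.group("book"), "pages": match.group("pages")}


# Invented sample, loosely modelled on the wiki pages.
SAMPLE_HTML = (
    "<p><strong>Probe:</strong> KL/MU/IN</p>"
    "<p><strong>Publikation:</strong> Regelwerk Seite 289</p>"
)

fields = parse_fields(SAMPLE_HTML)
fields["Publikation"] = split_publication(fields["Publikation"])
print(fields)
```

On the real pages this breaks down quickly, which is exactly why I was hoping an LLM could handle the irregular cases.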
Any idea what model to use for that or how to improve results?