Mastodon @Mastodon

I'm trying to create a Table of Contents for some of my WordPress blog posts.

But if I write a plugin to change [toc] to an HTML list, I get caught in an infinite loop.
The plugin is part of the content and so gets called recursively.

This seems like a knotty problem, so I'm using you all as a rubber-duck.

(No, I don't want to use someone else's plugin.)

#WordPress #MarkDown

Mar 24, 2025, 10:37 AM · Web

3 boosts · 4 favorites

**Terence Eden** @Edent · Mar 24

Ah, I can get the raw content, remove the shortcode, render the markdown.

Then, hopefully, extract the document structure. Let's see if that works.

**Terence Eden** @Edent · Mar 24

OK, using ->getElementsByTagName("h2"); I can get all the 2nd level headings.

Need to find a way to grab any sub-headings as well, but that's a start.

**Terence Eden** @Edent · Mar 24

Getting all the h2-h6 isn't a problem. But getting them nested in order is.

I basically need to get PHP's DOMDocument to give me an outline of the page structure. Seems like the sort of thing that should be built in - but I can't find it.

All the examples I can find are based on regex.

**Terence Eden** @Edent · Mar 24

OK, XPATH gets me part of the way there.

$xpath->query("//h2 | //h3 | //h4 | //h5 | //h6");

That gives a list of nodes in order that they appear on the page.

So now I have to do some stack thinking about whether the *next* node in the list is at a lower level than the one before it.
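
A minimal sketch of the "flat list in document order" step, written in Python rather than PHP so it stays self-contained. It is not the code from the thread: `HeadingCollector` and `collect_headings` are hypothetical names, and the standard-library `html.parser` stands in for the XPath query.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h2-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open h2-h6.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def collect_headings(html):
    """Return [(level, text), ...] for every h2-h6 in the given HTML."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

The result is only a flat list; deciding which heading is a child of which is a separate step.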

**Terence Eden** @Edent · Mar 24

Well the stack works… as long as I don't have any dangling headers.

## Heading
### Subsection
### Another Sub
## Yet another heading

All works.

But this doesn't

## Heading
#### Incorrectly nested
### Should be under the heading

Becomes

1. Heading
1.1 Incorrectly nested
2. Should be under

Probably good enough for my needs.
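
For comparison, here is one possible stack-based nesting, sketched in Python. It is not the implementation from the thread, and `nest_headings` and `numbered` are hypothetical names. The rule assumed here is "pop until the top of the stack is a strictly shallower heading", which places a skipped-level heading under the nearest shallower one.

```python
def nest_headings(headings):
    """Turn a flat [(level, title), ...] list into a nested outline.

    Each node is a dict: {"level": int, "title": str, "children": [...]}.
    """
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recently added node
    for level, title in headings:
        # Pop until the top of the stack is a strictly shallower heading.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def numbered(nodes, prefix=""):
    """Flatten the outline into '1', '1.1', ... labels for inspection."""
    lines = []
    for i, node in enumerate(nodes, start=1):
        label = f"{prefix}{i}"
        lines.append(f"{label} {node['title']}")
        lines.extend(numbered(node["children"], prefix=label + "."))
    return lines
```

Under this rule the mis-nested example comes out as 1, 1.1, 1.2 rather than 1, 1.1, 2, because the h3 is attached to the h2 once the h4 has been popped.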

**Terence Eden** @Edent · Mar 24

Bugger. I'm going to have to make this recursive, aren't I?

**Terence Eden** @Edent · Mar 25

Hey HTML and Semantic Data nerds!

What's the "best" markup for a Table of Contents?

I'm guessing a <nav> holding a <ul> with lots of <li>?

I can't find any Schema.org metadata for explicitly saying "this is a table of contents".

#WebDev #SemanticWeb #HTML

**Terence Eden** @Edent · Mar 25

This seems both semantically and syntactically valid.

```html
<nav>
<menu>
<li><h2>Table of Contents</h2>
<menu>
<li><a href=#1>Equipment</a>
<li><a href=#2>Experiments</a>
<menu>
<li><a href=#3>Test A</a>
<li><a href=#4>Test B</a>
</menu>
<li><a href=#5>Conclusion</a>
</menu>
</menu>
</nav>
```
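
Markup of this shape could be produced by a small recursive renderer. The Python sketch below is an illustration only: `render_menu` and the `id` / `title` / `children` keys are assumptions, not anything from the original plugin. It leaves `<li>` end tags out, as the markup above does.

```python
from html import escape


def render_menu(nodes, indent=0):
    """Render a nested outline as <menu> markup, one output line per item.

    Each node is a dict with "id", "title" and "children" keys
    (hypothetical shape). <li> end tags are omitted on purpose.
    """
    pad = "  " * indent
    lines = [f"{pad}<menu>"]
    for node in nodes:
        href = escape("#" + node["id"])
        title = escape(node["title"])
        lines.append(f'{pad}  <li><a href="{href}">{title}</a>')
        if node["children"]:
            # The nested <menu> stays inside the still-open <li>.
            lines.extend(render_menu(node["children"], indent + 2))
    lines.append(f"{pad}</menu>")
    return lines
```

Joining the returned lines with newlines and wrapping them in `<nav>` gives the overall structure shown above.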

**Terence Eden** @Edent · Mar 25

Right! Written up and scheduled.
Thanks for being brilliant rubber ducks!

A table of contents describing how to create a table of contents.

**DamonHD** @DamonHD · Mar 24

@Edent Typically a placeholder should be there for the TOC until it is computed.

Alternatively/also, can you wrap the putative TOC in a tag that will get it ignored, such as <aside>?

**Benjamin** @BenjaminNelan · Mar 24

@Edent
if( ! isset($hasGeneratedToc) ){
generateToc();
$hasGeneratedToc = true;
}

i am genius /s

**Richard Bairwell** @rbairwell@mastodon.org.uk · Mar 24

@Edent 3 variations on a theme: a) set a flag when building the toc, if flag encountered skip. b) replace toc with a placeholder at start of loop and replace it at the end so it doesn't run. c) have toc scan for headings but skip headings that match "contents" or have an id of toc.

**Kevin** @kevin@fedi.kevinisageek.org · Mar 24

@Edent give the ToC element (div?) an ID and have the plugin ignore that ID when iterating over the contents

**Terence Eden** @Edent · Mar 24

@kevin I think you misunderstand. When get_the_content() is called, it implicitly runs the plugin code - which then calls get_the_content() again.

**Kevin** @kevin@fedi.kevinisageek.org · Mar 24

@Edent ah well, in that case, defer to someone who isn't talking out of their arse

**Tom Walker** @tomw · Mar 25 *

@Edent @kevin A little late here and I don't know if this is useful to you, but get_the_content() doesn't run shortcodes. the_content() does.

When using get_the_content() you need to then explicitly run do_shortcode() or apply_filters() on the returned text. This means you can opt in or out of shortcodes running in particular parts of your page.

**tante** @tante@tldr.nettime.org · Mar 24

@Edent leave placeholder in the content, generate the TOC outside of it in a variable, replace placeholder with variable when done

**Rachel Lawson** @rachel@norfolk.social · Mar 24

@Edent how do other people’s plugins work? That’s usually the first place I look when thinking how to do something in Drupal - look at the source code for similar modules

**Daniel Durrans** @dan@durrans.com · Mar 24

@Edent "write the theme tune, sing the theme tune".

**Stewart Haines** @slesh@theblower.au · Mar 24 *

@Edent

querySelectorAll("h2, h3, h4") might be useful

https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll

**Terence Eden** @Edent · Mar 24

@slesh that's JS. I'm doing this in PHP.

**Derick Rethans** @derickr@phpc.social · Mar 24

@Edent I recently wrote an example on how to get a structure... let me see if I can find it.

**Derick Rethans** @derickr@phpc.social · Mar 24

@Edent https://www.php.net/manual/en/example.xml-structure.php — although that's element structures, not document structure

**Terence Eden** @Edent · Mar 24

@derickr interesting. Gives me something to go on.

**Jez Higgins** @jezhiggins@mastodon.me.uk · Mar 24

@Edent You could recurse down. For each h2, call getElementsByTagName("h3") and so on (assuming your tags are all cleanly nested!). At each stage, you know where in the hierarchy you are, rather than trying to rebuild structure from the flat list the XPath query gives you.

**Terence Eden** @Edent · Mar 24

@jezhiggins yeah, that's the worry I have - not everything will be so clean.

**Hugo Mills** @darkling@mstdn.social · Mar 24

@Edent You could always make it recursive instead.

**al3xsh** @al3xsh@sigmoid.social · Mar 24

@darkling @Edent

**Andreas Gohr** @splitbrain@fedi.splitbrain.org · Mar 24

@darkling @Edent alternatively look here: https://mastodon.social/@Edent/114218302810183175

**Owen Blacker** @owenblacker@dataare.cool · Mar 24

@Edent I was wondering how long before this would be an inevitable toot, tbh

**mal3aby** @mal3aby@mastodon.smears.org · Mar 24

@Edent I *think* you can do it without recursion, with some care - though how clean that comes out may depend on what results you want in these cases, exactly...

**Tony Finch** @fanf@mendeddrum.org · Mar 24 *

@Edent i’ve done two nonrecursive versions of this :-)

first one i was generating markdown for the toc so i just multiplied the heading depth by 4, added that many spaces minus 6 before a * list bullet

current one is generating lists with equivalents of <ul></ul>, so i keep a nesting counter, and before each toc entry i have while depth > heading { emit </ul>; depth -= 1; } while depth < heading { emit <ul>; depth += 1 } then repeat the first loop at the end of the toc
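
The nesting-counter version described above might look roughly like this in Python. It is a sketch under assumptions: `toc_tokens` and `top_level` are hypothetical, heading levels are mapped to a list depth with an offset, and each entry is emitted as a plain `<li>`. The nested `<ul>` comes out as a sibling of the preceding `<li>`, which follows the description but is not how strict HTML wants lists nested.

```python
from html import escape


def toc_tokens(headings, top_level=2):
    """Emit a flat token list of <ul>/<li> markup from (level, title) pairs.

    `depth` counts the <ul> elements currently open; a heading at
    `top_level` sits at depth 1.
    """
    out = []
    depth = 0
    for level, title in headings:
        target = max(1, level - top_level + 1)
        while depth > target:   # heading is shallower: close lists
            out.append("</ul>")
            depth -= 1
        while depth < target:   # heading is deeper: open lists
            out.append("<ul>")
            depth += 1
        out.append(f"<li>{escape(title)}</li>")
    # Repeat the first loop at the end to close whatever is still open.
    while depth > 0:
        out.append("</ul>")
        depth -= 1
    return out
```

No recursion or explicit stack is needed; the counter is enough to keep every opened `<ul>` matched by a `</ul>`, even when a heading level is skipped.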

**Third spruce tree on the left** @tezoatlipoca@mas.to · Mar 25

@Edent

<pulls up lawn chair> I really want to know too.

**david turgeon** @dt@mastodon.top · Mar 25

@Edent ideally use <menu> instead of <ul>. i guess you can also add role="menu" but i believe it's overkill if you use <menu>.

**Terence Eden** @Edent · Mar 25

@dt good shout!

**Caz Mockett** @cazmockett · Mar 25

@Edent surely it should be an ordered list? <ol> ?

**Matt Round** @mattround@crispsandwi.ch · Mar 25

@Edent Maybe <ol>s instead as order matters?

**Leigh Dodds** @ldodds@mastodon.me.uk · Mar 25

@Edent Dublin Core has a tableOfContents property https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/tableOfContents/ if you wanted to flag it in some way.

DCMI · Jan 20, 2020 · Table Of Contents: "A list of subunits of the resource."

**Oliver Geer** @WebCoder49@floss.social · Mar 25

@Edent The semantic HTML looks very good (I notice others have been suggesting using ol - that might be better than menu) but there are a few syntactic changes I would suggest:
1. Closing the <li> tags: there are currently only opening tags.
2. Nesting lists by placing the nested lists inside their <li> elements: as per https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ul#nesting_a_list. I have got this wrong quite a lot and had to look it up!

**Oliver Geer** @WebCoder49@floss.social · Mar 25

@Edent With the syntactical corrections:

```html
<nav>
   <menu>
      <li><h2>Table of Contents</h2>
         <menu>
            <li><a href=#1>Equipment</a></li>
            <li>
               <a href=#2>Experiments</a>
               <menu>
                  <li><a href=#3>Test A</a></li>
                  <li><a href=#4>Test B</a></li>
               </menu>
            </li>
            <li><a href=#5>Conclusion</a></li>
         </menu>
      </li>
   </menu>
</nav>
```

**Terence Eden** @Edent · Mar 25

@WebCoder49 I'm not sure that's correct. The <li> elements implicitly close.
https://html.spec.whatwg.org/multipage/grouping-content.html#the-li-element
The validator doesn't complain about it.

**Oliver Geer** @WebCoder49@floss.social · Mar 25

@Edent Thanks for giving me that knowledge - I didn't know about that! It definitely makes your code look a lot cleaner than my verbose code.

**Terence Eden** @Edent · Mar 25

@WebCoder49 yeah, it's one of those weird little things which hark back to the original spec. The <p> element is the same.

**Phil Gyford** @philgyford · Mar 25

@Edent @WebCoder49 It's odd how I used to use <li> and <p> without end tags back in HTML 2, but since the years when XHTML was the thing to do I haven't been able to return to that. It always looks wrong to me. I'm scarred forever.

**Kerr** @kerr@mastodon.scot · Mar 25

@Edent @WebCoder49 Interesting that that validates. My reading of the doc is that it is only valid to omit the closing tag if the next tag was another <li> or the closing tag of the parent element. So putting links in would require the closing tag. But if it validates it’s valid.

**scmbradley** @Scmbradley@mathstodon.xyz · Mar 24

@Edent this feels like something where wordpress should expose an API to make this easier? You can't be the first person to want to do something like this...

**ppk** @ppk@front-end.social · Mar 24

@Edent Here is an extremely old JavaScript that does what you want. It uses sourceIndex (a property that I forgot exists) to order the headings.

https://quirksmode.org/dom/getElementsByTagNames.html

This script was considered a paragon of modern JavaScripting in 2004 or thereabout, when I wrote it.

**ppk** @ppk@front-end.social · Mar 24

@Edent Oh no, it's not sourceIndex (which was IE only) but compareDocumentPosition, another method I forgot exists.

**Gregory Marler** @gregorymarler@en.osm.town · Mar 25

@Edent if the internet was a good rubber-duck for you, will you be returning the favour by writing up your problem and solution?

I'm trying to get back into Wordpress stuff after ignoring it for about 10 years. Expect I might hit some similar frustration soon (so many years of $software make it hard to find good/correct help about $software).

**Terence Eden** @Edent · Mar 25

@gregorymarler me? Write a blog post? Well, there's a first time for everything
