Terence Eden

I'm trying to create a Table of Contents for some of my WordPress blog posts.

But if I write a plugin to change [toc] to an HTML list, I get caught in an infinite loop.
The plugin is part of the content and so gets called recursively.

This seems like a knotty problem, so I'm using you all as a rubber-duck.

(No, I don't want to use someone else's plugin.)

Ah, I can get the raw content, remove the shortcode, render the markdown.

Then, hopefully, extract the document structure. Let's see if that works.

OK, using ->getElementsByTagName("h2"); I can get all the 2nd level headings.

Need to find a way to grab any sub-headings as well, but that's a start.

Getting all the h2-h6 isn't a problem. But getting them nested in order is.

I basically need to get PHP's DOMDocument to give me an outline of the page structure. Seems like the sort of thing that should be built in - but I can't find it.

All the examples I can find are based on regex (🤮).

OK, XPATH gets me part of the way there.

$xpath->query("//h2 | //h3 | //h4 | //h5 | //h6");

That gives a list of nodes in the order that they appear on the page.
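
A minimal sketch of that query in context, assuming the rendered post is available as an HTML string (the sample markup and the `$headings` array shape are made up for illustration):

```php
<?php
// Hypothetical sample markup; a real post would supply its own rendered HTML.
$html = '<h2>Equipment</h2><p>Some text</p><h2>Experiments</h2>'
      . '<h3>Test A</h3><h3>Test B</h3><h2>Conclusion</h2>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$headings = [];
// The union of the five paths comes back as one node list in document order.
foreach ($xpath->query('//h2 | //h3 | //h4 | //h5 | //h6') as $node) {
    $headings[] = [
        'level' => (int) substr($node->nodeName, 1), // "h3" becomes 3
        'text'  => trim($node->textContent),
    ];
}
```

The result is still a flat list; each entry only records its level, so the nesting has to be rebuilt afterwards.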

So now I have to do some stack thinking about whether the *next* node in the list is at a lower level than the one before it.

Well the stack works… as long as I don't have any dangling headers.

## Heading
### Subsection
### Another Sub
## Yet another heading

All works.

But this doesn't

## Heading
#### Incorrectly nested
### Should be under the heading

Becomes

1. Heading
1.1 Incorrectly nested
2. Should be under

Probably good enough for my needs.
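
One way a stack could cope with the dangling case, sketched under assumptions: `TocNode` and `headings_to_tree()` are hypothetical names, and the input is a flat list of `['level' => int, 'text' => string]` entries in page order. This is not necessarily the code used in the thread. Each heading is attached to the nearest earlier heading with a shallower level, so the skipped-level `####` and the following `###` both end up under the `##`.

```php
<?php
final class TocNode
{
    public int $level;
    public string $text;
    /** @var TocNode[] */
    public array $children = [];

    public function __construct(int $level, string $text)
    {
        $this->level = $level;
        $this->text = $text;
    }
}

/** Turn a flat, page-ordered heading list into a tree of TocNode objects. */
function headings_to_tree(array $headings): array
{
    $root = new TocNode(0, '');   // level 0 sits above every real heading
    $stack = [$root];

    foreach ($headings as $h) {
        $node = new TocNode($h['level'], $h['text']);
        // Pop until the top of the stack is shallower than the new heading.
        while (end($stack)->level >= $node->level) {
            array_pop($stack);
        }
        end($stack)->children[] = $node;
        $stack[] = $node;
    }
    return $root->children;
}
```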

Bugger. I'm going to have to make this recursive, aren't I?

Hey HTML and Semantic Data nerds!

What's the "best" markup for a Table of Contents?

I'm guessing a <nav> holding a <ul> with lots of <li>?

I can't find any Schema.org metadata for explicitly saying "this is a table of contents".

This seems both semantically and syntactically valid.

```html
<nav>
<menu>
<li><h2>Table of Contents</h2>
<menu>
<li><a href=#1>Equipment</a>
<li><a href=#2>Experiments</a>
<menu>
<li><a href=#3>Test A</a>
<li><a href=#4>Test B</a>
</menu>
<li><a href=#5>Conclusion</a>
</menu>
</menu>
</nav>
```

Right! Written up and scheduled.
Thanks for being brilliant rubber 🦆

@Edent Typically a placeholder should be there for the TOC until it is computed.

Alternatively/also, can you wrap the putative TOC in a tag that will get it ignored, such as <aside>?

@Edent
if( ! isset($hasGeneratedToc) ){
generateToc();
$hasGeneratedToc = true;
}

i am genius /s

@Edent 3 variations on a theme: a) set a flag when building the toc; if the flag is encountered, skip. b) replace toc with a placeholder at the start of the loop and swap it back at the end so it doesn't run. c) have toc scan for headings but skip headings that match "contents" or have an id of toc.

@Edent give the ToC element (div?) an ID and have the plugin ignore that ID when iterating over the contents

@kevin I think you misunderstand. When get_the_content() is called, it implicitly runs the plugin code - which then calls get_the_content() again.

@Edent ah well, in that case, defer to someone who isn't talking out of their arse

@Edent @kevin A little late here and I don't know if this is useful to you, but get_the_content() doesn't run shortcodes. the_content() does.

When using get_the_content() you need to then explicitly run do_shortcode() or apply_filters() on the returned text. This means you can opt in or out of shortcodes running in particular parts of your page.

@Edent leave placeholder in the content, generate the TOC outside of it in a variable, replace placeholder with variable when done
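
A rough sketch of the placeholder idea in plain PHP, with no WordPress calls: `insert_toc()` and the `$buildToc` callback are hypothetical names, and `[toc]` is assumed to be the marker. The builder only ever sees the content with the marker stripped, so it has nothing to trigger a second pass.

```php
<?php
/**
 * Replace the first marker with a generated ToC.
 * $buildToc receives the content with every marker removed.
 */
function insert_toc(string $content, callable $buildToc, string $marker = '[toc]'): string
{
    $pos = strpos($content, $marker);
    if ($pos === false) {
        return $content;   // nothing to do
    }

    $before = substr($content, 0, $pos);
    // Drop any further markers so only one ToC is ever emitted.
    $after = str_replace($marker, '', substr($content, $pos + strlen($marker)));

    $toc = $buildToc($before . $after);
    return $before . $toc . $after;
}
```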

@Edent how do other people’s plugins work? That’s usually the first place I look when thinking how to do something in Drupal - look at the source code for similar modules

@Edent "write the theme tune, sing the theme tune".

@slesh that's JS. I'm doing this in PHP.

@Edent I recently wrote an example on how to get a structure... let me see if I can find it.

@derickr interesting. Gives me something to go on.

@Edent You could recurse down. For each h2, call getElementsByTagName("h3") and so on (assuming your tags are all cleanly nested!). At each stage, you know where in the hierarchy you are, rather than trying to rebuild structure from the flat list the XPath query gives you.

@jezhiggins yeah, that's the worry I have - not everything will be so clean.

@Edent You could always make it recursive instead.

@Edent I was wondering how long before this would be an inevitable toot, tbh 😢

@Edent I *think* you can do it without recursion, with some care - though how clean that comes out may depend on what results you want in these cases, exactly...

@Edent i’ve done two nonrecursive versions of this :-)

first one i was generating markdown for the toc so i just multiplied the heading depth by 4, added that many spaces minus 6 before a * list bullet

current one is generating lists with equivalents of <ul></ul>, so i keep a nesting counter, and before each toc entry i have while depth > heading { emit </ul>; depth -= 1; } while depth < heading { emit <ul>; depth += 1 } then repeat the first loop at the end of the toc
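
The nesting-counter approach described above might look roughly like this in PHP. It is a sketch, not the poster's code: `toc_from_flat_list()` and the `['level', 'text']` input shape are assumptions. The `<li>` end tags are left out on purpose so that a following `<ul>` lands inside the still-open item; a skipped heading level will still put one `<ul>` directly inside another.

```php
<?php
/** Emit nested <ul> markup from a flat, page-ordered heading list. */
function toc_from_flat_list(array $headings, int $topLevel = 2): string
{
    $out = '';
    $depth = $topLevel - 1;   // no list open yet

    foreach ($headings as $h) {
        $level = max($topLevel, $h['level']);
        while ($depth > $level) { $out .= '</ul>'; $depth--; }
        while ($depth < $level) { $out .= '<ul>';  $depth++; }
        // No </li>: the next <li> or </ul> closes it implicitly.
        $out .= '<li>' . htmlspecialchars($h['text']);
    }

    // Repeat the closing loop at the end of the ToC.
    while ($depth > $topLevel - 1) { $out .= '</ul>'; $depth--; }
    return $out;
}
```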

@Edent

<pulls up lawn chair> I really want to know too.

@Edent ideally use <menu> instead of <ul>. i guess you can also add role="menu" but i believe it's overkill if you use <menu>.

@Edent surely it should be an ordered list? <ol> ?

@Edent Maybe <ol>s instead as order matters?

@Edent The semantic HTML looks very good (I notice others have been suggesting using ol - that might be better than menu) but there are a few syntactic changes I would suggest:
1. Closing the <li> tags: there are currently only opening tags.
2. Nesting lists by placing the nested lists inside their <li> elements: as per developer.mozilla.org/en-US/do. I have got this wrong quite a lot and had to look it up!

@Edent With the syntactical corrections:

```html
<nav>
   <menu>
      <li><h2>Table of Contents</h2>
         <menu>
            <li><a href=#1>Equipment</a></li>
            <li><a href=#2>Experiments</a>
               <menu>
                  <li><a href=#3>Test A</a></li>
                  <li><a href=#4>Test B</a></li>
               </menu>
            </li>
            <li><a href=#5>Conclusion</a></li>
         </menu>
      </li>
   </menu>
</nav>
```

@WebCoder49 I'm not sure that's correct. The <li> elements implicitly close.
html.spec.whatwg.org/multipage
The validator doesn't complain about it.

@Edent Thanks for giving me that knowledge - I didn't know about that! It definitely makes your code look a lot cleaner than my verbose code.

@WebCoder49 yeah, it's one of those weird little things which hark back to the original spec. The <p> element is the same.

@Edent @WebCoder49 It's odd how I used to use <li> and <p> without end tags back in HTML 2, but since the years when XHTML was the thing to do I haven't been able to return to that. It always looks wrong to me. I'm scarred forever.

@Edent @WebCoder49 Interesting that that validates. My reading of the doc is that it is only valid to omit the closing tag if the next tag was another <li> or the closing tag of the parent element. So putting links in would require the closing tag. But if it validates it’s valid. 🤷🏻‍♂️

@Edent this feels like something where wordpress should expose an API to make this easier? You can't be the first person to want to do something like this...

@Edent Here is an extremely old JavaScript that does what you want. It uses sourceIndex (a property that I forgot exists) to order the headings.

quirksmode.org/dom/getElements

This script was considered a paragon of modern JavaScripting in 2004 or thereabout, when I wrote it.

@Edent Oh no, it's not sourceIndex (which was IE only) but compareDocumentPosition, another method I forgot exists.

@Edent if the internet was a good rubber-duck for you, will you be returning the favour by writing up your problem and solution?

I'm trying to get back into Wordpress stuff after ignoring it for about 10 years. Expect I might hit some similar frustration soon (so many years of $software make it hard to find good/correct help about $software).

@gregorymarler me? Write a blog post? Well, there's a first time for everything 😆