Seeking help! No matter how many keywords/names you search related to this story, the article doesn't appear on Google
This was a front-page A1 story I wrote for WaPo on how the smear campaigns and abuse that women journalists endure are a press freedom issue. Can someone explain why the article does not appear on Google? https://www.washingtonpost.com/investigations/2023/02/14/women-journalists-global-violence/
@taylorlorenz it's because there's a "noindex" tag in the head. This is telling search engines not to display the article in search results
@fromjason @taylorlorenz Also “noarchive.” It’s been specifically barred from the Internet Archive.
EDIT: That said, this may not be unique to this one WaPo story.
@fromjason @taylorlorenz Based on the dates in the Wayback Machine, the blacklisting happened sometime before October 2023; the last archived date for the story was June 2023.
EDIT: That may be a widespread thing on the site though. I found other stories with “noarchive.”
But yeah @taylorlorenz, what @fromjason found is likely your answer. It looks like an internal block. (Possibly a legal thing? Though you probably have a better handle on that than us.)
@ernie
@fromjason @taylorlorenz
'noarchive' is quite standard for a paywalled site...
@fromjason @taylorlorenz
But Brave search finds the article. Because they're going rogue?
@ShutterbugDoug @taylorlorenz yeah. The tag is mostly a "request". Search engines can choose to honor or ignore it.
@ShutterbugDoug @fromjason @taylorlorenz
That means Brave is either caching old content or ignoring the noindex altogether, both of which are bad behaviour.
@fromjason @taylorlorenz Or they crawled it before the noindex tag was added.
@fromjason @taylorlorenz Wow, WaPo putting noindex on their past articles that are inconvenient to their owner's far-right interests??
What was this #WashingtonPost slogan again? "Democracy dies in indexes"? "Democracy dies from A to Z"? I keep forgetting it…
@fromjason @taylorlorenz
And this means the Washington Post has set this page to not be indexed by search engines, even if other indexed pages link to it. It means someone at the Washington Post doesn't want this page to show up in search engines. Google and Bing are just following instructions. Perhaps WaPo has a policy regarding older content, perhaps it was posted with a wrong tag, perhaps an editor accidentally marked the wrong page or folder to noindex. Don't be too quick to assume malice.
@kerfuffle @taylorlorenz I don't think anything I said assumed malice
@fromjason I didn't mean to imply that you did. Several others in the replies to @taylorlorenz's original post did though, and your reply was the first I stumbled on that pointed at a better explanation.
@fromjason
@taylorlorenz
Correct, the article's HTML head has an extra 'robots' meta tag containing 'noindex', which blocks it from search engines. This tag comes on top of the 'robots' meta tag containing 'noarchive' (and some other directives) that all WP articles have. So some process at the WP (human or automated) added this extra noindex tag for some reason.
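For anyone who wants to check a page themselves, here is a rough Python sketch of the kind of check described above. It only uses the standard library's `html.parser`. The `RobotsMetaParser` class, the `robots_directives` helper, and the `SAMPLE` markup are all made up for illustration; the sample is not the actual WaPo page source, just a stand-in with one 'noarchive' tag and one extra 'noindex' tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # A content attribute may hold several comma-separated directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return all robots meta directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Illustrative markup only; NOT copied from the real article.
SAMPLE = """<html><head>
<meta name="robots" content="noarchive">
<meta name="robots" content="noindex">
</head><body></body></html>"""

directives = robots_directives(SAMPLE)
print(directives)
print("noindex present:", "noindex" in directives)
```

To try it on a real page, you would paste the page source (or whatever your HTTP client returns) into `robots_directives` instead of `SAMPLE`. Note this only looks at meta tags; a site can also send the same directives in an `X-Robots-Tag` HTTP response header, which this sketch does not inspect.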
It's difficult to reconcile "democracy dies in darkness" with "content=noindex, content=noarchive".
@mhoye @taylorlorenz maybe the real democracy was the articles we scrubbed along the way
JK I don't know why they did this. Perhaps it was an act of God. Or maybe the index grinch.