Making Dash SEO-friendly

I’ve created a simple package that converts and includes an app’s layout to an HTML string (included in the app.index_string)

The approach is simple:

$ pip install dashseo

from dashseo import htmlify
from dash import Dash, html

app = Dash()
app.layout = html.Div([
    html.H1("Hello, world!")
])

# You only need to add this:
app.index_string = htmlify(app.layout)

app.run()

One issue: This duplicates the content of the page. The current hacky workaround is to wrap the second copy inside a hidden div (but that’s also no good for search engines).

Looking for better ways to hide the duplicate content (maybe through a clientside callback?) Would love any suggestions on how to solve this.

Thanks

2 Likes

Hello @eliasdabbas,

What is the end goal and what data do you want the SEO to be able to scrub?

Can you use meta tags?

Thanks @jinnyzor

The idea is to have Dash apps readable, indexable, and findable by search engines.
An important example is the Dash docs website. It is built with Dash and was “empty” for search engines, even though it contained massive useful content. Then this approach was implemented a few years ago, and people can now search for content on the docs website.

Here’s another example containing ~200 URLs with dynamic content for each country. But for search engines, this is essentially nothing. Google has a bunch of “Loading…” pages.

Meta tags are good but for search engines to understand the pages’ content, they need to read the visible text on the pages.

Hope that makes sense. Happy to elaborate/clarify more.
Thanks.

1 Like

While your approach works, on a simple layout, do you have an example of a multi page app and using this?

If you look here, you can see a little bit about how the header will look in the request:

My thinking is that for each page in the directory, you could create a navigation if the request header has googlebot or the like in it.

At this point, you could switch your default page to be just an index of sites you want the bot to crawl, and with a description on the home page of what they would find.

Then for each other page, you do the same thing, except you wouldn’t need to return the index of all other pages. You could do this with your script where it creates a duplicate of the data that you want to render, but instead of putting it in the index of the application, place it as a route for the flask server.

Again, this is just my initial take on it, and making this happen will be interesting for sure. :grin:

Thanks again @jinnyzor. Appreciate the tips.

I found a new better approach, without any duplication.

It basically inserts the converted HTML string into the dash.dash._app_entry attribute, which becomes part of the normal page. No duplication, nothing hacky. The page source would now look like this:

Screenshot 2024-01-18 at 12.08.36 PM

It still doesn’t work with Pages yet. I’m exploring different options.

What version of dash are you using?

@AnnMarieW might have some thoughts on this actually.

I’m using the latest 2.14.2.

I remember @AnnMarieW was heavily involved with Pages, and I’m sure we can have great ideas on how best to implement this with multi page apps.

Just for reference, most of the work here depends on the convert_to_html function that has already been in use to convert Dash docs pages to HTML.

Package repo: https://github.com/eliasdabbas/dashseo

Thanks!

I’m certainly no SEO expert, but it seems like the search engines find Dash sites. Take this site for example:

Here are 2 searches where it shows up #1 on Google.

The first one is even ahead of the official Bootstrap docs :laughing: That’s probably because it has search terms from the meta tag description – which is very easy to update when you use Pages.

The second search is even finding some random text on a page



Great achievements! :slight_smile:

Yes, of course Google can find those pages.

It’s just extremely limited what they can understand about them. These are very specific terms with little competition. It would be really difficult to rank for things like “hotels in new york”, or “build a website” with pages that only have meta descriptions.

When Google can read a full article, it can understand much more deeply what the page is about, what topics it covers, and how to evaluate it.
Links are another critical factor in determining the meaning of page (what other pages link to/from it), and how credible it is (does it get linked to from reputable websites?)

More than happy to elaborate more if needed :slight_smile:

The current approach I’m using is converting the layout of an app to an HTML string and inserting it into _app_entry.

What do you think is the best approach to insert this dynamically in a multi page app. When a user navigates to/page_1 display it’s layout (converted to HTML)?

Thanks

1 Like

I’m still struggling to understand the issue. Sure, there is not much competition for the search terms I entered, but the second search result was from content deep within one of the pages and the answer was highlighted as a “featured snippet”. That content was not in the meta descriptions. So I’m not sure it’s necessary to put an html string in _app_entry.

But regardless, I’m not sure how you would do this - maybe some type of clientside callback that listens to the dcc.Location component? I wish I could be more helpful.

I’m not sure you should muddy up the page with actual data, you can use the header in the request to cater a different page when being scrubbed with search engines.

They could potentially have implications with things like screen readers and other accessibility problems.

Indeed, in the second example it uses text from the page, and not from the meta description.

Your suggested solution with a clientside callback listening to dcc.Location sounds interesting.

Let me double check everything, and I’ll let you know if I have a question.

Thanks again!

1 Like

Hey @eliasdabbas,

Where did you land with this? I’m very interested. Our site https://myijack.com has our SEO-focused product marketing pages server-side-rendered with Flask and Jinja, and our interactive data-focused pages live in Dash behind a login, so they’re not crawled by search engines.

I prefer the Dash experience for interactivity, so I’d love to bring the whole site under the Dash umbrella if we could find a way to server-side-render and freeze our “marketing” pages.

Cheers!

I created a templates/index.html that looks like:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta name="description" content="Your app description here">
        <meta name="keywords" content="keyword1, keyword2, keyword3">
        <title>{%title%}</title>
        {%favicon%}
        {%css%}
        {%metas%}
        
        <!-- Open Graph / Facebook -->
        <meta property="og:type" content="website">
        <meta property="og:url" content="https://yourwebsite.com/">
        <meta property="og:title" content="Your App Title">
        <meta property="og:description" content="Your app description here">
        <meta property="og:image" content="https://yourwebsite.com/path/to/image.jpg">

        <!-- Twitter -->
        <meta property="twitter:card" content="summary_large_image">
        <meta property="twitter:url" content="https://yourwebsite.com/">
        <meta property="twitter:title" content="Your App Title">
        <meta property="twitter:description" content="Your app description here">
        <meta property="twitter:image" content="https://yourwebsite.com/path/to/image.jpg">

        <!-- Google tag (gtag.js) -->
        <script async src="https://www.googletagmanager.com/gtag/js?id=G-6WYY9JHMP2"></script>
        <script>
          window.dataLayer = window.dataLayer || [];
          function gtag(){dataLayer.push(arguments);}
          gtag('js', new Date());

          gtag('config', 'G-6WYY9JHMP2');
        </script>

        <!-- Structured Data -->
        <script type="application/ld+json">
        {
          "@context": "https://schema.org",
          "@type": "WebApplication",
          "name": "Your App Name",
          "description": "Your app description here",
          "url": "https://yourwebsite.com",
          "applicationCategory": "Your App Category",
          "operatingSystem": "Web"
        }
        </script>
    </head>
    <body>
        <main>
            {%app_entry%}
        </main>
        <footer>
            {%config%}
            {%scripts%}
            {%renderer%}
            <p>&copy; 2024 Your Company Name. All rights reserved.</p>
        </footer>
    </body>
</html>

and this has worked well for my project. I’ve built on top of dmc docs which is a genius code base imo, but it turns .md into pages with a markdown.py of:

import logging
from pathlib import Path
from typing import Optional

import dash
import dash_mantine_components as dmc
import frontmatter
from markdown2dash import Admonition, BlockExec, Divider, Image, create_parser
from pydantic import BaseModel

from lib.constants import PAGE_TITLE_PREFIX
from lib.directives.kwargs import Kwargs
from lib.directives.source import SC
from lib.directives.toc import TOC

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

directory = "docs"

# read all markdown files
files = Path(directory).glob("**/*.md")


class Meta(BaseModel):
    name: str
    description: str
    endpoint: str
    package: str = "dash_pydantic_form"
    category: Optional[str] = None
    icon: Optional[str] = None


def make_endpoint(name):
    return "-".join(name.lower().split())


directives = [Admonition(), BlockExec(), Divider(), Image(), Kwargs(), SC(), TOC()]
parse = create_parser(directives)

for file in files:
    logger.info("Loading %s..", file)
    metadata, content = frontmatter.parse(file.read_text())
    metadata = Meta(**metadata)
    logger.info("Type of content: %s", type(content))

    layout = parse(content)

    # add heading and description to the layout
    section = [
        dmc.Title(metadata.name, order=2, className="m2d-heading"),
        dmc.Text(metadata.description, className="m2d-paragraph"),
    ]
    layout = section + layout

    # register with dash
    dash.register_page(
        metadata.name,
        metadata.endpoint,
        name=metadata.name,
        title=PAGE_TITLE_PREFIX + metadata.name,
        description=metadata.description,
        layout=layout,
        category=metadata.category,
        icon=metadata.icon,
    )

so when I create new folders in the docs/folder/____.md I structure it like so:

---
name: Full Calendar
description: Use Full Calendar Component to create a dash calendar.
endpoint: /pip/full_calendar_component
package: full_calendar_component
icon: line-md:calendar
---

and the markdown.py will do all the magic of setting up the description for each page, creating the url and changing the title name which translates directly into better SEO if you look up individual pages on google.

1 Like

Note - the dmc docs uses Pages to set the meta tags :slight_smile:

1 Like

@seanrez I think you are right, Dash is perfect for interactivity and Flask is perfect for the backend, allowing full flexibility in managing everything else.

I ended up integrating Dash and Flask:

  • A main Flask app is the “website”
  • A bunch of Dash apps are initialized with server=False and then integrated as blueprints into the main app
  • Using app.index() each app’s body will be included wherever you want in the main app, as if you are embedding content into a div
  • The app.index_string is minimized using only its essential components, so you don’t end up with two HTML pages within one another
  • No iframes or anything hacky
  • No compromise on Flask, and no forcing Dash into doing stuff it is not meant to do
  • I have this in this repo showing in four phases how you can achieve the full integration
  • A live implementation example can be found here https://adver.tools/

Would love to know what you think.

1 Like