Easy way to convert a short text between Dash HTML object representation and HTML raw string?

In different parts of my Dash app I need the same text both as a raw HTML string and as a Dash HTML object.

For example, the text

An important undertaking

I need (in the title attribute of the Graph component) as HTML raw string:

An <b>important</b> undertaking

and (in the parameter for the Dash Summary object) as Dash object:

html.Span(['An ', html.B('important'), ' undertaking'])

Is there an easy way to convert between the two?

Thanks for any pointers to functions and methods I might have overlooked…

Note: This question stems from another plotly forum thread: How to write html in a component string without introducing line breaks

For some time I’ve been thinking of writing a function that converts jinja2 templates to Dash HTML objects.

So here is a first attempt at the HTML string to Dash object direction, using the standard library module html.parser (someone should definitely redo this with beautifulsoup, which is going to be far more forgiving of your HTML code and less likely to have bugs):

import inspect
from html.parser import HTMLParser
from dash import html  # Dash >= 2.0; on older versions: import dash_html_components as html


class DashHTMLParser(HTMLParser):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._stack = []
        self.dash_object = None

    @staticmethod
    def get_dash_tag_class(tag):
        tag_title = tag.title()
        if not hasattr(html, tag_title):
            raise ValueError(f'Cannot find Dash HTML tag {tag_title}')

        return getattr(html, tag_title)

    def handle_starttag(self, tag, attrs):
        dash_tag_class = self.get_dash_tag_class(tag)

        # Convert Attributes to Dash Attributes
        dash_attrs = {}
        if attrs:
            named_dash_attrs = list(inspect.signature(dash_tag_class.__init__).parameters)[1:-1]
            lower_named_dash_attrs = {n.lower(): n for n in named_dash_attrs}
            for attr_name, attr_value in attrs:
                lower_attr_name = attr_name.lower()
                if lower_attr_name == 'class':
                    dash_attrs['className'] = attr_value
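                elif lower_attr_name == 'style':
                    # Parse "key: value; key: value" into a dict, tolerating
                    # trailing semicolons and colons inside values (e.g. URLs)
                    style_dict = {}
                    for style in attr_value.split(';'):
                        if not style.strip():
                            continue
                        style_key, style_value = style.split(':', 1)
                        style_dict[style_key.strip()] = style_value.strip()
                    dash_attrs['style'] = style_dict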
                elif lower_attr_name in ('n_clicks', 'n_clicks_timestamp'):
                    dash_attrs[lower_attr_name] = int(attr_value)
                elif lower_attr_name in lower_named_dash_attrs:
                    dash_attrs[lower_named_dash_attrs[lower_attr_name]] = attr_value
                else:
                    dash_attrs[attr_name] = attr_value
        
        # Create the real tag
        dash_tag = dash_tag_class(**dash_attrs)
        self._stack.append(dash_tag)

    def handle_endtag(self, tag):
        dash_tag_class = self.get_dash_tag_class(tag)
        dash_tag = self._stack.pop()
        if type(dash_tag) is not dash_tag_class:
            raise ValueError(f'Malformed HTML: unexpected closing tag </{tag}>')

        # Final Tag
        if not self._stack:
            self.dash_object = dash_tag
            return

        # Set Children to always be a list
        if type(self._stack[-1].children) is not list:
            self._stack[-1].children = []

        # Append tag on to parent tag
        self._stack[-1].children.append(dash_tag)

    def handle_data(self, data):
        # Set Children to always be a list
        if type(self._stack[-1].children) is not list:
            self._stack[-1].children = []

        # Append the text data to the parent tag
        self._stack[-1].children.append(data)


def html_to_dash(html_string):
    parser = DashHTMLParser()
    parser.feed(html_string)
    return parser.dash_object

You use it like so:

print(html_to_dash('<span>An <b>important</b> undertaking</span>'))

And you see the output is:

Span(['An ', B(['important']), ' undertaking'])

I’m not sure how useful you will find this code. There are definitely bugs in it and improvements that can be made. It is also very strict about what it considers to be correct HTML (I consider this a feature). But hopefully this points you in the right direction!

Edit: Fixed a minor bug in the code

Thanks @Damian. I will try it out.

It works! This is invaluable. Thanks again!

One thing I noticed, though, is that the function only works when the HTML string starts with an HTML tag. E.g.

'<span>An <b>important</b> undertaking</span>'

succeeds, while

'An <b>important</b> undertaking'

fails with an “IndexError: list index out of range”. So for the moment I wrap the HTML string in a span tag just to be on the safe side.

Yes, I wrote this in a way that only takes fully enclosed HTML fragments, so everything needs to be wrapped in some kind of tag, e.g.:

'<span>An <b>important</b> undertaking</span>'

or:

'<div>An <b>important</b> undertaking</div>'

It just made the code a little simpler to write, and I kind of like forcing valid HTML fragments. I wish it would give you a more meaningful error message but that’s actually quite a bit of extra work to do in general.
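If you would rather not wrap every string by hand, the workaround could be automated with a small pre-check. This is only a sketch using the standard library: the names _TopLevelScanner and ensure_single_root are made up for illustration, and it assumes the fragment has no unclosed void tags such as a bare <br>. It returns the fragment unchanged when it already has exactly one root element, and wraps it in a span otherwise.

```python
from html.parser import HTMLParser


class _TopLevelScanner(HTMLParser):
    """Records how many root elements a fragment has and whether any
    text sits outside of all tags. (Hypothetical helper.)"""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.root_count = 0
        self.loose_text = False

    def handle_starttag(self, tag, attrs):
        # A start tag seen at depth 0 opens a new root element
        if self.depth == 0:
            self.root_count += 1
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)

    def handle_data(self, data):
        # Any text outside of all tags would trip up a strict parser
        if self.depth == 0:
            self.loose_text = True


def ensure_single_root(fragment, wrapper='span'):
    """Return the fragment as-is if it has exactly one root element and
    no loose text; otherwise wrap it in <wrapper>...</wrapper>."""
    scanner = _TopLevelScanner()
    scanner.feed(fragment)
    scanner.close()  # flush any text the parser is still buffering
    if scanner.root_count == 1 and not scanner.loose_text:
        return fragment
    return f'<{wrapper}>{fragment}</{wrapper}>'
```

The result can then be passed to html_to_dash, e.g. html_to_dash(ensure_single_root('An <b>important</b> undertaking')).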

Also I just reread the code and fixed a minor bug when using attributes in your tags, I changed:

lower_attr_name = attr_name

to:

lower_attr_name = attr_name.lower()

This allows case-insensitive attribute names, so a tag like <b ClasS="myclass"> still works.
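For the opposite direction (Dash object to raw HTML string), which the original question also asks about, a minimal sketch might look like the one below. The function name dash_to_html is made up for illustration. It assumes the tag name is simply the lowercased class name (html.Span becomes span), it only looks at the children property, and it ignores all other attributes such as className or style, so treat it as a starting point rather than a complete converter.

```python
from html import escape


def dash_to_html(node):
    """Render a Dash-like component tree as an HTML string.

    Sketch only: handles children, ignores every other attribute.
    """
    if node is None:
        return ''
    if isinstance(node, (list, tuple)):
        # A list of children is rendered by joining each child's HTML
        return ''.join(dash_to_html(child) for child in node)
    if isinstance(node, (str, int, float)):
        # Escape text so characters like < and & stay literal
        return escape(str(node), quote=False)
    # Assumption: the tag name is the lowercased class name
    tag = type(node).__name__.lower()
    children = getattr(node, 'children', None)
    return f'<{tag}>{dash_to_html(children)}</{tag}>'
```

With Dash installed, dash_to_html(html.Span(['An ', html.B('important'), ' undertaking'])) should give back the span-wrapped string from the examples above. If you need the unwrapped form for the Graph title, you could call it on the children list instead of on the Span itself.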