How to highlight (set different colors for) specific words in long text? Tags do not work

numam · December 5, 2022, 9:08am

Hello everyone!

I have been trying to highlight specific words in a long text, but I have not succeeded. Is there a way to do this in Dash?
I thought I could wrap the words I want to highlight in tags and then use CSS to define a class that assigns the chosen color. For instance, to highlight the words inflated and surf_thickness in a red font, I would use tags like this:

Executing commands from fv_snaps_<tag>surf_thickness</tag>_disptmpl_<tag>inflated</tag>_thresh_1.0-3.5_lh.txt

However, this does not work. Any suggestions about how to achieve this?

Thank you for your help!

jinnyzor · December 5, 2022, 10:31am

Hello @numam,

In order to change colors in a string, you’ll need to use span to wrap the text you want to change.

edit: Rofl spam… gotta love autocorrect…

numam · December 5, 2022, 6:43pm

Thank you so much for your answer @jinnyzor!!! It did the trick. I owe you another one! You are the best! thank you so much!!!

I tried it on a very small snippet of the data I need to highlight, which is several thousand lines of text (I am highlighting error logs of MRI data I am processing). When I try to apply the code to all the data, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/cl/sd_8grvj2b1d1pk5cb96kdp00000gn/T/ipykernel_2625/1762203074.py in <module>
      4 app.layout = dbc.Container([
      5     html.Div([
----> 6        eval(allHighlightedTXT)
      7     ], style={"color":"white", 'font-size': '12pt'})
      8 ])

ValueError: source code string cannot contain null bytes

Do you know how I can fix this? When I check the text (printing it in Jupyter Notebook), everything looks correct. I suspect the error is related to invisible characters in the text, such as whatever encodes the white space, which I cannot see in Jupyter. If anyone has seen this before and knows how to solve it, please let me know.

Here is the sample code for how I did it, in case it benefits others. Maybe there is a better way, but this is how I applied your suggestion:

# NEGATIVE LOGS HIGHLIGHTING 
"""
First layer wraps keywords with '' and second layer wraps keywords with the html.Span component
The Span components are categorized using className (positiveLogs and negativeLogs)
Use a CSS file in the assets folder for creating the classes positiveLogs and negativeLogs so you can choose the color and text
effects for each set of keywords. 
"""
textToBeHighlighted = "Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txtEExecuting commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt "

# NEGATIVE LOGS HIGHLIGHTING 
firstLayerKeywords = ["txt", "error", "failed", "can't", "doesn't","could not open", "not reading","unable", "not found","mgz", "mgh"]
nfirstLayerPos = find_MultipleKeywordsInLog(firstLayerKeywords, textToBeHighlighted)
nfirstLayerHighlights= highlightKeywords(nfirstLayerPos, textToBeHighlighted, start_highlight="''", end_highlight="''")

secondLayerKeywords = ["'txt'", "'error'", "'failed'", "'can't'", "'doesn't'","'could not open'", "'not reading'","'unable'", "'not found'","'mgz'", "'mgh'"]
nSecLayerPos = find_MultipleKeywordsInLog(secondLayerKeywords, nfirstLayerHighlights)
nSecLayerHighlights = highlightKeywords(nSecLayerPos, nfirstLayerHighlights, start_highlight=",html.Span(", end_highlight=",className= 'negativeLogs effect-shine'),")

# POSITIVE LOGS HIGHLIGHTING 
PositiveFirstLayerKeywords = ["snap", "disptmpl"]
pfirstLayerPos = find_MultipleKeywordsInLog(PositiveFirstLayerKeywords, nSecLayerHighlights)
pfirstLayerHighlights= highlightKeywords(pfirstLayerPos ,nSecLayerHighlights, start_highlight="''", end_highlight="''")

PositiveSecLayerKeywords = ["'snap'", "'disptmpl'"]
pSecLayerPos = find_MultipleKeywordsInLog(PositiveSecLayerKeywords, pfirstLayerHighlights)
pSecLayerHighlights = highlightKeywords(pSecLayerPos, pfirstLayerHighlights, start_highlight=",html.Span(", end_highlight=",className= 'positiveLogs effect-shine'),")

allHighlightedTXT = ''.join(("html.Div(['",pSecLayerHighlights,"'])"))

app = JupyterDash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME], external_scripts=[
        {'src': 'assets/spanText.js'}])

app.layout = dbc.Container([
    html.Div([
       eval(allHighlightedTXT)
    ], style={"color":"white", 'font-size': '12pt'})
])

app.run_server(mode ="external", debug=True)

## FUNCTIONS USED ABOVE FOR FINDING AND HIGHLIGHTING THE KEYWORDS (I COULD NOT ATTACH THEM TO THIS POST, SO HERE THEY ARE)

import regex as re 

def find_MultipleKeywordsInLog(keywords, text):
    # GET ALL THE INDXS FOR ALL THE KEYWORDS AND PLACE THEM IN LIST AS PAIRS
    kwPositions = []
    for kw in keywords:
        matches = re.finditer(kw, text, flags=re.I)
        for match in matches:
            start, end = match.span()
            kwPositions += [[start, end]]
    # SORT THE LIST IN ASCENDING ORDER
    kwPositions.sort(key=lambda y: y[0])
    
    # PUT THE INDICES IN ONE LIST 
    keywordIndxs = []
    for i in range(0,len(kwPositions)):
        keywordIndxs += kwPositions[i][0], kwPositions[i][1]
    return keywordIndxs


def highlightKeywords(kwPositions, text, start_highlight="<mark>", end_highlight="</mark>"):
    entireTextWithhighlights = ""
    for i, (start, end) in enumerate(zip([None] + kwPositions, kwPositions + [None])):
        if i % 2:  # odd segments are highlighted
            entireTextWithhighlights += start_highlight + text[start:end] + end_highlight
        else:      # even segments are not
            entireTextWithhighlights += text[start:end]
    return entireTextWithhighlights
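
On the "source code string cannot contain null bytes" error above: one way to see characters that printing hides is to look at the repr() of the text and search for NUL characters directly. The following is only a minimal sketch with a made-up sample string and hypothetical helper names; it does not explain where the NUL characters in the real logs come from (a file read with the wrong encoding is one possible cause, but that is an assumption).

```python
def find_null_bytes(text):
    """Return the index of every NUL character in text."""
    return [i for i, ch in enumerate(text) if ch == "\x00"]


def strip_null_bytes(text):
    """Return a copy of text with all NUL characters removed."""
    return text.replace("\x00", "")


# Made-up log line containing one NUL character (not taken from the real logs).
sample = "could not open\x00 file.mgz"

print(repr(sample))             # repr() shows the escape that print() hides
print(find_null_bytes(sample))  # positions of the NUL characters

cleaned = strip_null_bytes(sample)
```

Stripping the characters only removes the symptom; if they appear between every pair of letters, checking the encoding used to open the file is probably the better fix.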

jinnyzor · December 5, 2022, 6:51pm

Typically, you’ll want to stay away from “eval”, as it executes the string as Python code…

Say I pass a string of text:

import shutil
shutil.rmtree()

Pointed at a real path, this would delete your files.

You are better off writing a function that takes the text and replaces your strings directly.
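
To illustrate the risk, here is a minimal sketch (not from the thread) contrasting eval() with the standard library's ast.literal_eval(). eval() runs any expression it is given, while literal_eval() accepts only plain literals such as strings, numbers, and lists. The example strings are made up and deliberately harmless.

```python
import ast

# eval() executes whatever expression the string contains.
evaluated = eval("len('abc')")

# literal_eval() parses literals only, so a keyword list is fine...
keywords = ast.literal_eval("['snap', 'disptmpl']")

# ...but anything containing a function call is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(evaluated, keywords, rejected)
```

literal_eval() would not help with a string that contains html.Span(...) calls, since those are function calls too, which is another reason to build the components directly instead of building source code as text.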

numam · December 5, 2022, 7:12pm

Thank you for letting me know. This is my first time constructing a string that I want to execute, and I did not know that using eval() is not good practice. I will find a different solution.

jinnyzor · December 5, 2022, 7:31pm

Give this a try and see if it works for you.


from dash import html
import dash_bootstrap_components as dbc
from jupyter_dash import JupyterDash

# NEGATIVE AND POSITIVE LOGS HIGHLIGHTING
"""
Each keyword is split out of the text and replaced with an html.Span component.
The Span components are categorized using className (positiveLogs and negativeLogs).
Use a CSS file in the assets folder to create the classes positiveLogs and negativeLogs so you can choose the color and text
effects for each set of keywords.
"""
textToBeHighlighted = ["Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txtEExecuting commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt "]

negativeWords = ["txt", "error", "failed", "can't", "doesn't", "could not open", "not reading", "unable",
                 "not found", "mgz", "mgh"]
positiveWords = ["snap", "disptmpl"]


for i in negativeWords:
    newArray = []
    for x in textToBeHighlighted:
        if isinstance(x, str):
            for y in range(len(x.split(i))):
                newArray.append(x.split(i)[y])
                if y != len(x.split(i))-1:
                    newArray.append(html.Span(i, className='negativeLogs effect-shine'))
        else:
            newArray.append(x)

    textToBeHighlighted = newArray

for i in positiveWords:
    newArray = []
    for x in textToBeHighlighted:
        if isinstance(x, str):
            for y in range(len(x.split(i))):
                newArray.append(x.split(i)[y])
                if y != len(x.split(i))-1:
                    newArray.append(html.Span(i, className='positiveLogs effect-shine'))
        else:
            newArray.append(x)

    textToBeHighlighted = newArray

app = JupyterDash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME], external_scripts=[
    {'src': 'assets/spanText.js'}])

app.layout = dbc.Container([
    html.Div(textToBeHighlighted, style={"color": "white", 'font-size': '12pt'})
])

app.run_server(mode="external", debug=True)
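
The two split loops above match keywords case-sensitively, whereas the earlier regex version used re.I. As a hedged alternative, the sketch below does the same job in a single case-insensitive pass using only the standard library. The function name split_by_keywords and the (segment, class name) output format are assumptions made for illustration; they are not part of Dash or of the code posted in this thread.

```python
import re


def split_by_keywords(text, keyword_classes):
    """Split text into (segment, class_name) pairs.

    keyword_classes maps each keyword to a CSS class name.
    class_name is None for segments that are not keywords.
    """
    if not keyword_classes:
        return [(text, None)] if text else []

    # Try longer keywords first so "not found" is preferred over a shorter
    # keyword that starts at the same position.
    keywords = sorted(keyword_classes, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    lookup = {k.lower(): cls for k, cls in keyword_classes.items()}

    segments = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], None))
        segments.append((match.group(), lookup[match.group().lower()]))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], None))
    return segments


segments = split_by_keywords(
    "fv_snaps_lh.txt",
    {"snap": "positiveLogs effect-shine", "txt": "negativeLogs effect-shine"},
)
print(segments)
```

In a Dash layout, each pair could then be turned into a child with something like `seg if cls is None else html.Span(seg, className=cls)`, and the resulting list passed to html.Div as in the code above.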

numam · December 5, 2022, 8:27pm

@jinnyzor You just blew my mind. Such a succinct and elegant solution. You are my hero. I did not have a clue that I could do that with split! I had only ever used it to split strings based on symbols, not characters. It worked!
But when I ran one of the log text files through the code, it broke down. I got the error “Maximum call stack size exceeded”, and the array does not look right. Here is a snippet:

['2', '0', '2', '2', '-', '1', '2', '-', '0', '3', ' ', '0', '1', ':', '4', '3', ':', '0', '7', ':', ' ', 'E', 'x', 't', 'r', 'a', 'c', 't', 'i', 'o', 'n', ' ', 'o', 'f', ' ', 'S', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', ' ', 'S', 't', 'a', 'r', 't', 'e', 'd', ' ', '.', '.', '.', '\n', 'C', 'h', 'e', 'c', 'k', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 's', 'u', 'b', 'j', 'e', 'c', 't', ' ', 'l', 'i', 's', 't', '.', '.', '.', '\n', 's', 'u', 'b', '-', 'D', 'Z', '1', '7', 'B', 'H', '-', 's', 'e', 's', '-', '1', '_', 'r', 'u', 'n', 'l', 'i', 's', 't', '4', 'Q', 'C', '.', 't', 'x', 't', ' ', 'S', 'u', 'b', 'j', 'e', 'c', 't', ' ', 'L', 'i', 's', 't', ' ', 's', 'u', 'c', 'c', 'e', 's', 'f', 'u', 'l', 'l', 'y', ' ', 'M', 'a', 'd', 'e', '\n', 'C', 'h', 'e', 'c', 'k', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 'T', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'i', 'n', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 's', 'p', 'a', 'c', 'e', ' ', '.', '.', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'R', 'H', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 't', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'r', 'u', 'n', '-', '7', ' ', 'w', 'a', 's', ' ', 'f', 'o', 'u', 'n', 'd', '.', ' ', 'I', 't', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'u', 's', 'e', 'd', ' ', 'f', 'o', 'r', ' ', 's', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'L', 'H', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 't', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'r', 'u', 'n', '-', '7', ' ', 'w', 'a', 's', ' ', 'f', 'o', 'u', 'n', 'd', '.', ' ', 'I', 't', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'u', 's', 'e', 'd', ' ', 'f', 'o', 'r', ' ', 's', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', '.', '\n', 'C', 'h', 'e', 'c', 'k', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 'T', 'h', 'i', 'c', 'k', 'n', 'e', 's', 
's', ' ', 'f', 'i', 'l', 'e', ' ', 'i', 'n', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 's', 'p', 'a', 'c', 'e', ' ', '.', '.', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'R', 'H', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 't', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'r', 'u', 'n', '-', '5', ' ', 'w', 'a', 's', ' ', 'f', 'o', 'u', 'n', 'd', '.', ' ', 'I', 't', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'u', 's', 'e', 'd', ' ', 'f', 'o', 'r', ' ', 's', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'L']

I think I can troubleshoot this, but here is the text file I am trying to run through it, in case you would like to check what is going on: Box

jinnyzor · December 5, 2022, 8:30pm

You need to wrap the text file's contents in a list:

with open(file, 'r') as f:
    textToBeHighlighted = [f.read()]
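
The reason this matters can be shown with a small sketch (the sample string is made up). The highlighting loops iterate over textToBeHighlighted, and iterating over a bare string yields one character at a time, which is what produced the array of single letters above. Wrapping the string in a list yields the whole text as one item.

```python
text = "snap ok"

# Iterating over a string visits each character separately.
from_string = [piece for piece in text]

# Iterating over a one-element list visits the whole string once.
from_list = [piece for piece in [text]]

print(from_string)
print(from_list)
```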

numam · December 5, 2022, 8:33pm

I came to tell you that doing that would make it work perfectly and you already knew!!! You are the best. Thank you so much!!

snehilvj · December 6, 2022, 3:50am

Here’s a simple way using dmc.Highlight if you only want to highlight words in one color.
docs: Dash Mantine Components - Highlight

import dash_mantine_components as dmc

text = """Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txtEExecuting commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt"""

words_to_highlight = ["snap", "disptmpl"]

app.layout = dmc.Highlight(text, highlight=words_to_highlight, highlightColor="green")

numam · December 7, 2022, 3:33am

Thank you for the reply @snehilvj ! I did not know this component existed
