How to highlight (set different colors for) specific words in long text? Tags do not work

Hello everyone!

I have been trying to highlight specific words in a long text, but I have not succeeded. Is there a method for doing this in Dash?
I thought that I could add tags around the words I want to highlight and then use CSS to create a new class that assigns any chosen color. For instance, if I wanted to highlight the words inflated and surf_thickness in a red font, I would use tags in the following way:

Executing commands from fv_snaps_<tag>surf_thickness</tag>_disptmpl_<tag>inflated</tag>_thresh_1.0-3.5_lh.txt

However, this does not work. Any suggestions about how to achieve this?

Thank you for your help! :slight_smile:

Hello @numam,

To change colors within a string, you’ll need to wrap the text you want to change in an html.Span.
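As a rough sketch of the idea (not necessarily how the original answer did it), the text can be cut into plain chunks and chunks to highlight. The helper name `split_for_spans` is made up for this illustration, and it assumes a non-empty list of keywords.

```python
import re

def split_for_spans(text, words):
    """Cut text into (chunk, is_highlighted) pairs. Hypothetical helper."""
    # A single capture group makes re.split keep the matched keywords,
    # so the pieces alternate: plain, keyword, plain, keyword, ...
    pattern = re.compile("(" + "|".join(re.escape(w) for w in words) + ")")
    pieces = pattern.split(text)
    # Odd positions are the keywords; empty pieces are dropped.
    return [(piece, i % 2 == 1) for i, piece in enumerate(pieces) if piece]

sample = "fv_snaps_surf_thickness_disptmpl_inflated_thresh"
for chunk, is_highlighted in split_for_spans(sample, ["surf_thickness", "inflated"]):
    print(chunk, is_highlighted)
```

In a layout, each highlighted chunk would become something like `html.Span(chunk, style={"color": "red"})`, the plain chunks stay as strings, and the whole list is passed as the children of an `html.Div`.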


edit: Rofl spam… gotta love autocorrect…


Thank you so much for your answer @jinnyzor!!! It did the trick. I owe you another one! You are the best! thank you so much!!! :slight_smile: :mechanical_arm:

I tried it on a very small snippet of the data I need to highlight; the full set is several thousand log texts (I am highlighting error logs from MRI data I am processing). When I try to apply the code to all the data, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/cl/sd_8grvj2b1d1pk5cb96kdp00000gn/T/ipykernel_2625/1762203074.py in <module>
      4 app.layout = dbc.Container([
      5     html.Div([
----> 6        eval(allHighlightedTXT)
      7     ], style={"color":"white", 'font-size': '12pt'})
      8 ])

ValueError: source code string cannot contain null bytes

Do you know how I can fix this? When I check the text (printing it in Jupyter Notebook), everything looks correct. I suspect the error is related to whitespace or other invisible characters in the text, which I cannot see in Jupyter. If you or anyone else has seen this before and knows how to solve it, please let me know :slight_smile:
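The error message refers to literal NUL characters ("\x00") in the string handed to eval. They are invisible when printed but show up with repr(). A minimal sketch for checking and removing them, assuming the log text is already loaded as a str; `strip_null_bytes` and the sample string are made up for illustration:

```python
def strip_null_bytes(text):
    """Return the text without NUL characters, plus how many were removed."""
    removed = text.count("\x00")
    return text.replace("\x00", ""), removed

# Made-up sample with one hidden NUL character in the middle.
sample = "fv_snaps\x00_lh.txt"

# repr() shows escape sequences that print() would hide.
print(repr(sample))

cleaned, removed = strip_null_bytes(sample)
print(repr(cleaned), removed)
```

If the count is large, it may be worth checking how the log files were written or read (for example, the encoding used when opening them) rather than only stripping the characters.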

Here is the sample code for how I did it, in case it benefits others. Maybe there is a better way, but this is how I applied your suggestion:

# NEGATIVE LOGS HIGHLIGHTING 
"""
First layer wraps keywords with '' and second layer wraps keywords with the html.Span component.
The Span components are categorized using className (positiveLogs and negativeLogs).
Use a CSS file in the assets folder to create the classes positiveLogs and negativeLogs so you can choose the color and text
effects for each set of keywords.
"""
textToBeHighlighted = "Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txtEExecuting commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt "

# NEGATIVE LOGS HIGHLIGHTING 
firstLayerKeywords = ["txt", "error", "failed", "can't", "doesn't","could not open", "not reading","unable", "not found","mgz", "mgh"]
nfirstLayerPos = find_MultipleKeywordsInLog(firstLayerKeywords, textToBeHighlighted)
nfirstLayerHighlights= highlightKeywords(nfirstLayerPos, textToBeHighlighted, start_highlight="''", end_highlight="''")

secondLayerKeywords = ["'txt'", "'error'", "'failed'", "'can't'", "'doesn't'","'could not open'", "'not reading'","'unable'", "'not found'","'mgz'", "'mgh'"]
nSecLayerPos = find_MultipleKeywordsInLog(secondLayerKeywords, nfirstLayerHighlights)
nSecLayerHighlights = highlightKeywords(nSecLayerPos, nfirstLayerHighlights, start_highlight=",html.Span(", end_highlight=",className= 'negativeLogs effect-shine'),")

# POSITIVE LOGS HIGHLIGHTING 
PositiveFirstLayerKeywords = ["snap", "disptmpl"]
pfirstLayerPos = find_MultipleKeywordsInLog(PositiveFirstLayerKeywords, nSecLayerHighlights)
pfirstLayerHighlights= highlightKeywords(pfirstLayerPos ,nSecLayerHighlights, start_highlight="''", end_highlight="''")

PositiveSecLayerKeywords = ["'snap'", "'disptmpl'"]
pSecLayerPos = find_MultipleKeywordsInLog(PositiveSecLayerKeywords, pfirstLayerHighlights)
pSecLayerHighlights = highlightKeywords(pSecLayerPos, pfirstLayerHighlights, start_highlight=",html.Span(", end_highlight=",className= 'positiveLogs effect-shine'),")

allHighlightedTXT = ''.join(("html.Div(['",pSecLayerHighlights,"'])"))

app = JupyterDash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME], external_scripts=[
        {'src': 'assets/spanText.js'}])

app.layout = dbc.Container([
    html.Div([
       eval(allHighlightedTXT)
    ], style={"color":"white", 'font-size': '12pt'})
])

app.run_server(mode ="external", debug=True)

## FUNCTIONS USED ABOVE FOR FINDING AND HIGHLIGHTING THE KEYWORDS (I COULD NOT CREATE ATTACHMENTS TO THIS POST FOR THEM BUT HERE THEY ARE)

import re  # the standard-library module is enough here; the third-party regex package is not required

def find_MultipleKeywordsInLog(keywords, text):
    # GET ALL THE INDXS FOR ALL THE KEYWORDS AND PLACE THEM IN LIST AS PAIRS
    kwPositions = []
    for kw in keywords:
        matches = re.finditer(kw, text, flags=re.I)
        for match in matches:
            start, end = match.span()
            kwPositions += [[start, end]]
    # SORT THE LIST IN ASCENDING ORDER
    kwPositions.sort(key=lambda y: y[0])
    
    # PUT THE INDICES IN ONE FLAT LIST
    keywordIndxs = []
    for start, end in kwPositions:
        keywordIndxs += [start, end]
    return keywordIndxs


def highlightKeywords(kwPositions, text, start_highlight="<mark>", end_highlight="</mark>"):
    entireTextWithhighlights = ""
    for i, (start, end) in enumerate(zip([None] + kwPositions, kwPositions + [None])):
        if i % 2:  # odd segments are highlighted
            entireTextWithhighlights += start_highlight + text[start:end] + end_highlight
        else:      # even segments are not
            entireTextWithhighlights += text[start:end]
    return entireTextWithhighlights


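Typically, you’ll want to stay away from eval, because it executes whatever string it is given as Python code…

Say the text contains something like this (some_directory is just a placeholder):

__import__('shutil').rmtree('some_directory')

If that ends up being evaluated, it would delete that directory and your files in it.

You are better off building a function that takes the string of text and replaces your strings.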
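One possible shape for such a function, as a sketch only: it makes a single case-insensitive pass over the text and labels each chunk with a CSS class name, without evaluating anything. The name `classify_chunks` is invented for this example, the class names are taken from the post above, and plain ASCII keywords are assumed.

```python
import re

def classify_chunks(text, classes):
    """Split text into (chunk, css_class_or_None) pairs. Hypothetical helper.

    classes maps a CSS class name to the list of keywords that should get it.
    """
    lookup = {}
    for css_class, words in classes.items():
        for word in words:
            lookup.setdefault(word.lower(), css_class)
    # Try longer keywords first so they win over shorter ones at the same position.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)

    chunks, pos = [], 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            chunks.append((text[pos:match.start()], None))  # plain text before the keyword
        chunks.append((match.group(), lookup[match.group().lower()]))
        pos = match.end()
    if pos < len(text):
        chunks.append((text[pos:], None))  # trailing plain text
    return chunks

chunks = classify_chunks(
    "File not found: fv_snaps.TXT",
    {"negativeLogs": ["txt", "not found"], "positiveLogs": ["snap"]},
)
for chunk, css_class in chunks:
    print(repr(chunk), css_class)
```

Each pair with a class name could then be turned into `html.Span(chunk, className=css_class)`, while pairs with None stay as plain strings in the children list.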


Thank you for letting me know :grimacing: It is my first time dealing with having constructed a string that I want to execute. I did not know that using eval() is not good practice. I will find a different solution.


Give this a try and see if it works for you. :slight_smile:


# Imports assumed from the rest of the thread (they were not shown in the original post)
from dash import html
import dash_bootstrap_components as dbc
from jupyter_dash import JupyterDash

# NEGATIVE LOGS HIGHLIGHTING
"""
Each keyword is wrapped in an html.Span component.
The Span components are categorized using className (positiveLogs and negativeLogs).
Use a CSS file in the assets folder to create the classes positiveLogs and negativeLogs so you can choose the color and text
effects for each set of keywords.
"""
textToBeHighlighted = ["Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txtEExecuting commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt "]

negativeWords = ["txt", "error", "failed", "can't", "doesn't", "could not open", "not reading", "unable",
                      "not found", "mgz", "mgh"]
positiveWords = ["snap", "disptmpl"]


def wrap_keyword(parts, keyword, class_name):
    # Split every plain-string part on the keyword and put a Span between the pieces.
    # Parts that are already Span components are passed through untouched.
    new_parts = []
    for part in parts:
        if isinstance(part, str):
            pieces = part.split(keyword)
            for idx, piece in enumerate(pieces):
                new_parts.append(piece)
                if idx != len(pieces) - 1:
                    new_parts.append(html.Span(keyword, className=class_name))
        else:
            new_parts.append(part)
    return new_parts


for word in negativeWords:
    textToBeHighlighted = wrap_keyword(textToBeHighlighted, word, 'negativeLogs effect-shine')

for word in positiveWords:
    textToBeHighlighted = wrap_keyword(textToBeHighlighted, word, 'positiveLogs effect-shine')

app = JupyterDash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME], external_scripts=[
    {'src': 'assets/spanText.js'}])

app.layout = dbc.Container([
    html.Div(textToBeHighlighted, style={"color": "white", 'font-size': '12pt'})
])

app.run_server(mode="external", debug=True)

@jinnyzor You just blew my mind :exploding_head: Such a succinct and elegant solution. You are my hero. I did not have a clue that I could do that with split!!! I always used it to split strings on symbols, not on whole words :exploding_head: It worked!
But when I ran one of the log text files through the code, it broke down. I got an error: “Maximum call stack size exceeded”, and the array does not look right. Here is a snippet:

['2', '0', '2', '2', '-', '1', '2', '-', '0', '3', ' ', '0', '1', ':', '4', '3', ':', '0', '7', ':', ' ', 'E', 'x', 't', 'r', 'a', 'c', 't', 'i', 'o', 'n', ' ', 'o', 'f', ' ', 'S', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', ' ', 'S', 't', 'a', 'r', 't', 'e', 'd', ' ', '.', '.', '.', '\n', 'C', 'h', 'e', 'c', 'k', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 's', 'u', 'b', 'j', 'e', 'c', 't', ' ', 'l', 'i', 's', 't', '.', '.', '.', '\n', 's', 'u', 'b', '-', 'D', 'Z', '1', '7', 'B', 'H', '-', 's', 'e', 's', '-', '1', '_', 'r', 'u', 'n', 'l', 'i', 's', 't', '4', 'Q', 'C', '.', 't', 'x', 't', ' ', 'S', 'u', 'b', 'j', 'e', 'c', 't', ' ', 'L', 'i', 's', 't', ' ', 's', 'u', 'c', 'c', 'e', 's', 'f', 'u', 'l', 'l', 'y', ' ', 'M', 'a', 'd', 'e', '\n', 'C', 'h', 'e', 'c', 'k', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 'T', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'i', 'n', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 's', 'p', 'a', 'c', 'e', ' ', '.', '.', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'R', 'H', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 't', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'r', 'u', 'n', '-', '7', ' ', 'w', 'a', 's', ' ', 'f', 'o', 'u', 'n', 'd', '.', ' ', 'I', 't', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'u', 's', 'e', 'd', ' ', 'f', 'o', 'r', ' ', 's', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'L', 'H', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 't', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'r', 'u', 'n', '-', '7', ' ', 'w', 'a', 's', ' ', 'f', 'o', 'u', 'n', 'd', '.', ' ', 'I', 't', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'u', 's', 'e', 'd', ' ', 'f', 'o', 'r', ' ', 's', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', '.', '\n', 'C', 'h', 'e', 'c', 'k', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 'T', 'h', 'i', 'c', 'k', 'n', 'e', 's', 
's', ' ', 'f', 'i', 'l', 'e', ' ', 'i', 'n', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 's', 'p', 'a', 'c', 'e', ' ', '.', '.', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'R', 'H', ' ', 'f', 's', 'a', 'v', 'e', 'r', 'a', 'g', 'e', ' ', 't', 'h', 'i', 'c', 'k', 'n', 'e', 's', 's', ' ', 'f', 'i', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'r', 'u', 'n', '-', '5', ' ', 'w', 'a', 's', ' ', 'f', 'o', 'u', 'n', 'd', '.', ' ', 'I', 't', ' ', 'w', 'i', 'l', 'l', ' ', 'b', 'e', ' ', 'u', 's', 'e', 'd', ' ', 'f', 'o', 'r', ' ', 's', 'n', 'a', 'p', 's', 'h', 'o', 't', 's', '.', '\n', 'E', 'x', 'i', 's', 't', 'i', 'n', 'g', ' ', 'L']

I think I can troubleshoot this but here is the textfile I am trying to run through it. In case you would like to check out what is going on. Box


You need to wrap the contents of the text file in a list:

with open(file, 'r') as f:
    textToBeHighlighted = [f.read()]
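The reason, as far as I can tell: looping over a bare string yields one character at a time, so every character ends up as its own list entry, which matches the array in the post above. A small sketch of the difference, with `as_parts` being a made-up helper name and the sample text shortened from the log:

```python
def as_parts(source):
    """Return a list of parts; a bare string becomes a one-element list."""
    if isinstance(source, str):
        return [source]
    return list(source)

log_text = "Extraction of Snapshots Started ...\n"  # shortened sample line

# Iterating over the str directly gives single characters.
print([x for x in log_text][:5])

# Wrapped in a list, the loop sees the whole text as one part.
print(as_parts(log_text))
```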

I came to tell you that doing that would make it work perfectly and you already knew!!! You are the best. Thank you so much!!


Here’s a simple way using dmc.Highlight if you only want to highlight words in one color.
docs: Dash Mantine Components - Highlight

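from dash import Dash
import dash_mantine_components as dmc

app = Dash(__name__)  # assumed setup; the original snippet did not show how app was created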

text = """Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt Executing commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_rh.txtEExecuting commands from fv_snaps_surf_thickness_disptmpl_inflated_thresh_1.0-3.5_lh.txt"""

words_to_highlight = ["snap", "disptmpl"]

app.layout = dmc.Highlight(text, highlight= words_to_highlight, highlightColor="green")



Thank you for the reply @snehilvj ! I did not know this component existed :slight_smile:
