I'm reading an Excel file into a DataFrame in which one column holds one or more values per cell, delimited by spaces. I need to search that column for a string of text and, if it is found, return the complete value between its delimiters (not the entire cell value).
Input would look something like this –
import pandas as pd
df = pd.DataFrame({'command': ['abc123', 'abcdef', 'hold', 'why'],
'name': ['/l1/l2/good/a/b/c /l1/l2/bad/b /l1/l2/fred/x /la/lb/sure/blah',
'/l1/l2/fred/a/b/c /l1/l2/bad/b /l1/l2/fred/x',
'/l2/l3/betty/a/b/c /l1/l2/bad/b /l1/l2/fred/x /l1/l2/good/ha',
'/la/lb/sure/a/b/c'],
'date': ['2020-05', '2020-05', '2020-05', '2020-06']})
Printing it returns –
command date name
0 abc123 2020-05 /l1/l2/good/a/b/c /l1/l2/bad/b /l1/l2/fred/x /la/lb/sure/blah
1 abcdef 2020-05 /l1/l2/fred/a/b/c /l1/l2/bad/b /l1/l2/fred/x
2 hold 2020-05 /l2/l3/betty/a/b/c /l1/l2/bad/b /l1/l2/fred/x /l1/l2/good/ha
3 why 2020-06 /la/lb/sure/a/b/c
I'm using the following to filter for rows containing the strings I'm after –
terms = ['/l1/l2/good', '/la/lb/sure']
df = df[df['name'].str.contains('|'.join(terms))]
which returns –
command date name
0 abc123 2020-05 /l1/l2/good/a/b/c /l1/l2/bad/b /l1/l2/fred/x /la/lb/sure/blah
2 hold 2020-05 /l2/l3/betty/a/b/c /l1/l2/bad/b /l1/l2/fred/x /l1/l2/good/ha
3 why 2020-06 /la/lb/sure/a/b/c
What I would like returned is –
command date name
0 abc123 2020-05 /l1/l2/good/a/b/c /la/lb/sure/blah
2 hold 2020-05 /l1/l2/good/ha
3 why 2020-06 /la/lb/sure/a/b/c
I tried splitting on the space delimiter, but I was then unable to loop through the resulting values to filter them as needed.
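For illustration, here is a minimal sketch of the split-and-filter idea described above. It is one possible approach, not necessarily the best one. The helper name keep_matching is hypothetical, the match is a plain substring test rather than the regex used by str.contains, and the last row is assumed to be '/la/lb/sure/a/b/c', as in the outputs shown.

```python
import pandas as pd

# Sample data; the last 'name' value is assumed to be '/la/lb/sure/a/b/c'.
df = pd.DataFrame({
    'command': ['abc123', 'abcdef', 'hold', 'why'],
    'name': ['/l1/l2/good/a/b/c /l1/l2/bad/b /l1/l2/fred/x /la/lb/sure/blah',
             '/l1/l2/fred/a/b/c /l1/l2/bad/b /l1/l2/fred/x',
             '/l2/l3/betty/a/b/c /l1/l2/bad/b /l1/l2/fred/x /l1/l2/good/ha',
             '/la/lb/sure/a/b/c'],
    'date': ['2020-05', '2020-05', '2020-05', '2020-06'],
})
terms = ['/l1/l2/good', '/la/lb/sure']


def keep_matching(cell, terms):
    """Hypothetical helper: keep only the space-delimited values
    that contain at least one of the search terms (substring match)."""
    return ' '.join(v for v in cell.split() if any(t in v for t in terms))


result = df.copy()
# Rewrite each cell so it holds only the matching values.
result['name'] = result['name'].apply(lambda cell: keep_matching(cell, terms))
# Drop rows where no value matched (the cell is now an empty string).
result = result[result['name'] != '']

print(result)
```

With the sample data, this keeps rows 0, 2 and 3 and reduces each name cell to its matching values only.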
Thanks.
question from:
https://stackoverflow.com/questions/65861404/python-parse-string-from-multi-valued-column