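Use Series.str.contains with a character class of the allowed values, anchored with ^ for the start and $ for the end of the string, so that only rows made up entirely of those characters are kept: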
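import pandas as pd

file = [['id', 'genome'],
        ['0', 'ATGTTTGTTTTT'],
        ['1', 'ATGTTTGTXXXX'],
        ['2', 'ATGDD2GTTTTT']]
# the first row holds the column names, the remaining rows are data
df = pd.DataFrame(file[1:], columns=file[0])
print(df)

# keep only rows whose genome consists solely of A, C, T, G or N
df = df[df['genome'].str.contains('^[ACTGN]+$')]
print(df)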
  id        genome
0  0  ATGTTTGTTTTT
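As a variant, here is a minimal sketch assuming pandas 1.1 or later, where Series.str.fullmatch is available. It requires the whole string to match, so the ^ and $ anchors can be dropped. The extra row with a missing genome and the names mask, valid and invalid are illustrative additions, not part of the original data.

```python
import pandas as pd

# Same sample data as above, plus one hypothetical row with a missing genome.
df = pd.DataFrame(
    {
        "id": ["0", "1", "2", "3"],
        "genome": ["ATGTTTGTTTTT", "ATGTTTGTXXXX", "ATGDD2GTTTTT", None],
    }
)

# fullmatch only succeeds when the entire string matches the pattern,
# so no ^ or $ anchors are needed.
# na=False treats missing values as non-matching instead of returning NaN.
mask = df["genome"].str.fullmatch("[ACTGN]+", na=False)

valid = df[mask]     # rows containing only A, C, T, G or N
invalid = df[~mask]  # rejected rows, handy for inspecting what was dropped

print(valid)
print(invalid)
```

Keeping the mask in its own variable makes it easy to look at the rejected rows before discarding them.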