clean() function would do mask job It include two parameters:
- text: you processed text
- replace_with: the replace type:include
identifier
andcompliance
by default, it would process card
,cnaddress
,cnid
,cnname
,credential
,email
,name
,phone
,url
mode, you can use add_detector() or remove_detector() to add or remove mode in it.
import desensitize
de=desensitize.Desensitize()
#de.remove_detector('email')
text=u"13725557496 contact Joe Duffy at [email protected] 370304197709200630 4401250222189922 王猛住在上海市陆家嘴汤臣一品"
de.clean(text)
#['', 'a16a00b1f7d5e4e1b20a5b7517e17463', ' contact ', 'Joe', ' ', 'Duffy', ' at ', '[email protected]', ' ', '370***********0630', ' ', '************9922', ' ', '王*', '住在', '上海市***', '汤臣一品']
import pandas as pd
df=pd.read_csv("test.csv")
df = df.applymap(str)
pd.save_csv("test1.csv")