There are lots of times when you want to get the inverse of an operation – like finding all of the rows not like some criteria. This is possible, but hard to search for, especially because this functionality is hiding behind the “tilde (~)” operator.
Here’s how it works:
import pandas
What if you have a column that is supposed to be a date-as-a-number, but your parse-this-column-as-a-date code keeps barfing? Now you can filter by a doesn’t-match-this-format criteria.
myDataFrame = pandas.DataFrame([("20010901"),("18670701"),("10660106"),("20010901"),("Thursday")], columns=["A"], index=[1,2,3,4,5])
Return a dataframe where this format doesn’t match
weirdos = myDataFrame[~(myDataFrame['A'].str.match('\d\d\d\d\d\d\d\d'))]
weirdos
is now a DataFrame which includes the rows where it doesn’t match the format, because of the tilde (~) beginning the definition.
I have also used this is get subsets of non-conforming DataFrames – I know what it is supposed to be like, but it is too hard to know all of the ways that your data may not be like that.
I found this when I was looking for the NOT equivalent of .isin
– which unfortunately doesn’t exist. That’s the problem with Huffman coding your operators into single characters – you can’t easily search for them if you don’t know what they’re called.