Using the Tilde as a Boolean NOT in Pandas

There are lots of times when you want to get the inverse of an operation – like finding all of the rows not like some criteria. This is possible, but hard to search for, especially because this functionality is hiding behind the “tilde (~)” operator.

Here’s how it works:

import pandas

What if you have a column that is supposed to be a date-as-a-number, but your parse-this-column-as-a-date code keeps barfing? Now you can filter by a doesn’t-match-this-format criteria.

myDataFrame = pandas.DataFrame([("20010901"),("18670701"),("10660106"),("20010901"),("Thursday")], columns=["A"], index=[1,2,3,4,5])

Return a dataframe where this format doesn’t match

weirdos = myDataFrame[~(myDataFrame['A'].str.match('\d\d\d\d\d\d\d\d'))]

weirdos is now a DataFrame which includes the rows where it doesn’t match the format, because of the tilde (~) beginning the definition.

I have also used this is get subsets of non-conforming DataFrames – I know what it is supposed to be like, but it is too hard to know all of the ways that your data may not be like that.

I found this when I was looking for the NOT equivalent of .isin – which unfortunately doesn’t exist. That’s the problem with Huffman coding your operators into single characters – you can’t easily search for them if you don’t know what they’re called.