Using the Tilde as a Boolean NOT in Pandas

There are lots of times when you want the inverse of an operation, such as finding all of the rows that do not meet some criterion. This is possible, but hard to search for, because the functionality hides behind the tilde (~) operator.

Here’s how it works:

import pandas

What if you have a column that is supposed to be a date-as-a-number, but your parse-this-column-as-a-date code keeps barfing? Now you can filter by a doesn’t-match-this-format criterion.

myDataFrame = pandas.DataFrame(["20010901", "18670701", "10660106", "20010901", "Thursday"], columns=["A"], index=[1, 2, 3, 4, 5])

Return a DataFrame containing the rows where the format doesn’t match:

weirdos = myDataFrame[~(myDataFrame['A'].str.match(r'\d{8}'))]

weirdos is now a DataFrame containing only the rows that don’t match the format, because the tilde (~) in front of the condition inverts the boolean mask.
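Here is a minimal, self-contained sketch of the same filter, with two extra wrinkles. It assumes pandas 1.1 or later (for str.fullmatch), and the names frame and is_date_like are just illustrative. Unlike str.match, which only anchors at the start of the string, str.fullmatch requires the whole value to be eight digits, and na=False treats missing values as non-matching so the tilde has a clean True/False mask to flip.

```python
import pandas

frame = pandas.DataFrame(
    {"A": ["20010901", "18670701", "10660106", "200109011", "Thursday", None]},
    index=[1, 2, 3, 4, 5, 6],
)

# True where the whole value is exactly eight digits; missing values count as False.
is_date_like = frame["A"].str.fullmatch(r"\d{8}", na=False)

# The tilde flips every True/False in the mask, so this keeps the non-matching rows.
weirdos = frame[~is_date_like]

print(weirdos)
```

In this sketch the nine-digit value, “Thursday”, and the missing value all land in weirdos.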

I have also used this to get subsets of non-conforming rows: I know what the data is supposed to look like, but it is too hard to anticipate all of the ways that your data may not look like that.

I found this when I was looking for the NOT equivalent of .isin, which unfortunately doesn’t exist as its own method; you negate .isin with the tilde instead. That’s the problem with Huffman coding your operators into single characters: you can’t easily search for them if you don’t know what they’re called.
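A small sketch of the “not in” filter, built by putting the tilde in front of .isin. The column values and variable names here are made up for illustration.

```python
import pandas

frame = pandas.DataFrame({"A": ["red", "green", "blue", "green"]})

# Rows whose value is in the list...
wanted = frame[frame["A"].isin(["green", "blue"])]

# ...and, with the tilde, rows whose value is NOT in the list.
unwanted = frame[~frame["A"].isin(["green", "blue"])]

print(unwanted)
```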

f-Strings

One of the delights of coming back to Python in an intensive way after many years is discovering some of the new ways to do the usual things that I hadn’t learned before.

In this case, I am finding the use of “f-strings” (introduced in Python 3.6, described in PEP 498) to be quite delightful.

import sys
print(f"Warning: {sys.argv[2]} exists!")

The previous syntax for format strings always slipped away from my memory, but these seem a lot stickier, and they give me a lot of freedom to just keep writing.
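For comparison, here is a rough sketch of the same message written in the two older styles and as an f-string. The name and size values are invented for the example.

```python
name = "report.csv"
size = 1234.5

# Older styles: %-formatting and str.format
old_percent = "Warning: %s exists (%.1f bytes)!" % (name, size)
old_format = "Warning: {} exists ({:.1f} bytes)!".format(name, size)

# f-string: expressions and format specs go inline, inside the braces
new_fstring = f"Warning: {name} exists ({size:.1f} bytes)!"

print(new_fstring)
```

All three produce the same string; the f-string just keeps each value next to the spot where it appears.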

Python is not the greatest language in the universe, but it can allow a lot of fluency that I never experienced in other languages.

Keeping Secrets in Code

The problem of keeping secrets (usernames, passwords, API keys, etc.) in code that you write is a pretty old one. For a long time I didn’t have a solution that I liked, especially when putting code up on GitHub.

Until now. I am putting things like that in a “secrets” file, or in environment variables, which are easy to access from your code, but don’t show up in your code repository. Here’s an example in Python of keeping a “secrets” file that the script can access, and then yoinking its contents into a dictionary for easy reference:

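import os

def getSecrets():
  # SECRETFILE is an environment variable holding the path to the secrets file
  SECRETFILE = os.environ["SECRETFILE"]
  with open(SECRETFILE, "r") as scrts:
    # Each non-blank line looks like: name=====value
    return dict(line.strip().split("=====", 1) for line in scrts if line.strip())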

secrets = getSecrets()

api_url = secrets["Test API URL"]
api_key = secrets["Test API Key"]

""" The file itself would look like this:

Test API Key=====zEV}pF_vn4g35Ye:
Test API URL=====https://example.com
..."""
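The environment-variable route mentioned above can be sketched in a similar way. This is only one possible shape: the helper get_secret and the variable name EXAMPLE_API_KEY are hypothetical, and the setdefault line exists only so the sketch runs on its own.

```python
import os

def get_secret(name, default=None):
    """Return environment variable `name`, falling back to `default`; raise if neither is available."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Set here only so the sketch runs on its own; normally the shell or deployment sets it.
os.environ.setdefault("EXAMPLE_API_KEY", "not-a-real-key")

api_key = get_secret("EXAMPLE_API_KEY")
```

Raising early when a variable is missing tends to give a clearer error than a failed API call later on.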