iLoveTux

A Blog about Python, Linux and everything else that makes life fun.

How I built an Enhanced Single-File Python Interpreter in Under 100 Lines of Code

ilovetux | 25 October, 2015 13:32

Introduction

Well, I'll start off by saying that the title of this blog post is a bit misleading: I will not actually be writing a Python interpreter, compiler, lexer or parser. Instead, I will share a method which has allowed me to perform minor miracles in certain (very restricted) IT environments.


We will write a simple Python script which acts like a Python interpreter inasmuch as, if you call it and pass in a Python source file, it will execute it. This will either sound too good to be true or it will sound like a childish solution. Either way, let's go!

If you want to check this project out, just head over to the github repo. Also, the source (as it stands now 10-16-2015) is listed at the end.

Base Functionality

Let's get a skeleton going for our project. I start a lot of projects out like this:

import os
import sys
import code
import runpy
import atexit
import argparse
 
__version__ = "moonpy 0.6.0"
 
def main(argv=None):
    pass
 
def parse_args(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    return parser.parse_args(argv)
 
if __name__ == "__main__":
    main()



I've included the imports, so if you are familiar with these libraries it should be pretty obvious how I'm going to go about making an interpreter. If you are not, no worries: just read along, as you're about to see some very cool libraries at work.

Note the line in parse_args() where I set argv to sys.argv[1:] if no argv was passed in. This is important and took me a long time to figure out. Apparently, if you do not pass anything to argparse.ArgumentParser.parse_args, it is smart enough to strip the first argument (the script's name), but if you explicitly pass in a list of arguments, it does not. For testing purposes, I wanted to pass in lists instead of modifying sys.argv every time, so this is one way to support that.
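To make that concrete, here is a minimal sketch of the same pattern. The -i and file arguments are borrowed from later in this post purely for illustration. It shows that an explicit list is parsed exactly as given, so a leading script name would be treated as an ordinary argument.

```python
import argparse
import sys


def parse_args(argv=None):
    # Fall back to the real command line (minus the script name) only when
    # the caller did not hand us a list of arguments.
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", action="store_true")
    parser.add_argument("file", nargs="?", default="-")
    return parser.parse_args(argv)


# An explicit list is taken as-is: nothing is stripped from the front.
args = parse_args(["-i", "script.py"])
not_stripped = parse_args(["moon.py"])
```

In a test you can now call parse_args() with whatever list you like and never touch sys.argv.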

Now, I am going off of the specs laid out in this document, so if you want to follow along you should open that page in a new tab. At first, we will not support all of the arguments, only the most commonly used ones.

First up is:

When called with standard input connected to a tty device, it prompts for commands and executes them until an EOF (an end-of-file character, you can produce that with Ctrl-D on UNIX or Ctrl-Z, Enter on Windows) is read.

You can read that as: if there are no arguments, start a REPL (Read, Eval, Print Loop).

Adding a REPL

Well, I've actually coded a few REPLs in my day, but there is no need to do this if you just want an interactive Python interpreter, as the code module (which is in the standard library) provides an interactive console. Since there are no other options to support yet, the logic might look bare, but don't worry, it'll flesh out very soon.

import os
import sys
import code
import runpy
import atexit
import argparse
 
__version__ = "moonpy 0.6.0"
 
def interact(console=None):
    """Launch an enhanced, interactive Python interpreter"""
    sys.argv = [""]
    if not console:
        console = code.InteractiveConsole(locals={"exit": sys.exit})
    console.interact("MoonPy - m-o-o-n that spells Python")
 
def main(argv=None):
    interact()
 
def parse_args(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    args = parser.parse_args(argv)
    return args
 
if __name__ == "__main__":
    main()



If you look at the parse_args() function, you will see that I added a bit, but that's just to get it to run; we will go over it later. First, let's look at the new interact() function.

The interact() function accepts a console argument, but if none is provided it will create one. Because Python presents sys.argv as a single-item list [''] when called in this form (without arguments), we take care of that in the first line of the function. Next, we create a new code.InteractiveConsole() if none was provided. Finally, we tell the console to interact() with the user, presenting a custom banner. That's all we have to do to add an interactive mode to our program; the code module handles all of the hard parts.

The main() function is very simple for now, as we don't yet support any other modes of execution: it just invokes the interact() function. It will soon fill up with logic to handle our various features.
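If you want to poke at the console without actually sitting at a prompt, here is a small sketch. make_console() is a hypothetical helper of mine, not part of moonpy. It builds a console with a pre-filled namespace and feeds it one line with push(), which is roughly what interact() does with each line you type.

```python
import code
import sys


def make_console(extra_locals=None):
    """Build an InteractiveConsole whose namespace has exit() plus any extras."""
    namespace = {"exit": sys.exit}
    namespace.update(extra_locals or {})
    return code.InteractiveConsole(locals=namespace)


console = make_console({"answer": 21})
# push() runs one complete line in the console's namespace,
# as if it had been typed at the prompt.
console.push("doubled = answer * 2")
```

Afterwards the result is sitting in console.locals, the same dictionary we handed in.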

Next up on our list is:

When called with a file name argument or with a file as standard input, it reads and executes a script from that file.

Let's get started on that!

Adding file support

So we need to allow a file to be specified; that file needs to default to sys.stdin, and it should also be stdin if a - is supplied as the argument. This is pretty simple to implement. While we are at it, we will also add support for -i, which should drop you into an interactive interpreter after executing the file. Take a look below:

def run_path(args):
    """Run a path (script, zip file, or dir) just like python <script>"""
    sys.argv.pop(0)
    _locals = runpy.run_path(args.file, run_name="__main__")
    if args.i:
        interact(code.InteractiveConsole(locals=_locals))
 
def main(argv=None):
    args = parse_args(argv)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            pass
    else:
        run_path(args)
 
def parse_args(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", action="store_true")
    parser.add_argument("file", type=str, default="-", nargs="?")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    args.file = sys.stdin if args.file == "-" else args.file
    return args



So, let's first look at the changes to the parse_args() function. We added three arguments: -i, file and args. These all support the script functionality. The -i option will drop us into an interactive interpreter after executing the file. args collects the remainder of the arguments, which is important because we don't want our program to try to interpret them. Finally, the file argument collects the file to execute. Since we don't want to actually open the file ourselves, we accept strs and default to -, which we change to sys.stdin before returning the parsed arguments.

In the main() function, we now have two big if statements. Our control flow goes like this:

1. if file is stdin there are two possibilities
    1. user wants an interactive interpreter
    2. user piped in code like cat example.py | python moon.py
2. if file is not sys.stdin, then it is a source file to be executed

The first possibility of file being sys.stdin, when the user wants an interactive interpreter, we already handled. The second (the user piped in source) we will handle in the next section.

The big change here is when the user wants to execute a script (or module); in that case we call run_path(), which we defined above.

The run_path() function does three things:

1. Removes our script's name from sys.argv so the script being executed sees its own name as the first argument
2. Runs the script, capturing the module globals it leaves behind (at module level this is what locals() returns)
3. If -i was provided, drops the user into an interactive interpreter with the same environment as the script had

The big feature here comes courtesy of the runpy.run_path function, which gives us a ton of functionality for free. Namely, it:

1. Runs a Python source file (e.g. sample.py)
2. Runs a module in a directory (e.g. src/module/), as long as a __main__.py is present
3. Runs a module contained in a zip file (if run with Python 2.6+). This is an incredibly useful feature which I just learned about while working on this project; I can't wait to use it in one of my future projects.
4. Returns the resulting module globals (the value of locals() at module level), so we can initialize the interactive interpreter with them
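Here is a minimal sketch of that last point. run_script_text() is a hypothetical helper for illustration only: it writes some source to a temporary file, runs it with runpy.run_path, and hands back the dictionary of names the script defined.

```python
import os
import runpy
import tempfile


def run_script_text(source):
    """Write source to a temporary .py file, run it via runpy, return its globals."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        # run_name="__main__" makes the usual `if __name__ == "__main__":` guard fire.
        return runpy.run_path(path, run_name="__main__")
    finally:
        os.remove(path)


result = run_script_text(
    "greeting = 'hello'\n"
    "is_main = (__name__ == '__main__')\n"
)
```

The returned dictionary is exactly what run_path() in moonpy passes on to code.InteractiveConsole when -i is given.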

Next, let's handle the case when a user pipes in source code!

Supporting piped (stdin) source

Check out the following code:

def run_stdin(stdin):
    sys.argv[0] = ""; sys.path.insert(0, "")
    _locals = {"exit": sys.exit, "__file__": stdin.name}
    interpreter = code.InteractiveInterpreter(locals=_locals)
    [interpreter.runsource(line) for line in stdin]
    sys.exit(0)
 
def main(argv=None):
    """Business logic. respond to argv after parsing"""
    args = parse_args(argv)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            run_stdin(args.file)
    else:
        run_path(args)



First we add the run_stdin() function, which will handle stdin for us; then we add the call to run_stdin() to our main function.

The run_stdin() function performs the following steps:

1. We change sys.argv[0] to an empty string and insert an empty string as the first entry of sys.path
2. We set up some locals for the interpreter:
    1. we need exit() to be defined
    2. we need __file__ to point to stdin.name
3. We set up an interpreter with _locals
4. We run through the lines of stdin and have the interpreter execute them. (I have a sub-optimal way of executing these within a list comprehension, sub-optimal because it generates a list which we immediately discard, but we have to make some sacrifices to maintain the 100 lines of code requirement.)
5. We exit
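One caveat with step 4: each line is handed to runsource() on its own, so, as far as I can tell, a multi-line construct such as a for loop with an indented body will not be compiled as one unit. The sketch below is an alternative rather than what moonpy does: it compiles the whole piped source at once and executes it in a namespace. run_source() and its parameters are hypothetical names.

```python
def run_source(source, namespace=None, filename="<stdin>"):
    """Compile source as a single unit, execute it, and return the namespace."""
    if namespace is None:
        namespace = {"__name__": "__main__"}
    # "exec" mode accepts any number of statements, including indented blocks.
    exec(compile(source, filename, "exec"), namespace)
    return namespace


piped = "total = 0\nfor i in range(4):\n    total += i\n"
ns = run_source(piped)
```

In run_stdin() the source argument would simply be stdin.read(), at the cost of one or two extra lines in the file.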

Next, let's add support for -m!

Adding support for -m

So, while Python (and moonpy) supports executing a module by passing the directory name in as file, there is another use case which we need to support, and that is -m. The -m option has very similar functionality, except that instead of taking the directory name it operates on the module name (and searches sys.path for it).

Check out the following code:

def run_m(args):
    """Run a module, just like python -m <module_name>"""
    sys.argv = sys.argv[sys.argv.index("-m"):]
    sys.argv.remove(args.m)
    if not len(sys.argv): sys.argv.append("")
    runpy.run_module(args.m, run_name="__main__", alter_sys=True)
    sys.exit(0)
 
def main(argv=None):
    """Business logic. respond to argv after parsing"""
    args = parse_args(argv)
    if args.m:
        run_m(args)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            run_stdin(args.file)
    else:
        run_path(args)
 
def parse_args(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", action="store_true")
    parser.add_argument("-m", type=str, nargs="?")
    parser.add_argument("file", type=str, default="-", nargs="?")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    args.file = sys.stdin if args.file == "-" else args.file
    return args



First, we define a function called run_m which will handle the -m for us. It performs the following actions:

  1. Adjusts sys.argv (some of this is handled by runpy.run_module)
    1. sys.argv is adjusted by removing all arguments up to -m, and the actual value passed in to -m is removed
    2. because we pass alter_sys=True, runpy.run_module then overwrites sys.argv[0] with the module's __file__; note that the run_m() listing itself never touches sys.path
  2. The module is run
  3. We exit


In the main() function we call run_m() if args.m is present. The function calls sys.exit(0) so execution will stop at that point.
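The sys.argv juggling in run_m() is easier to follow as a pure function. This is only a sketch of the same slicing logic, with argv_for_module() being a hypothetical name, so you can try it on plain lists without touching the real sys.argv.

```python
def argv_for_module(argv, module_name):
    """Return the argv a module run with -m should see, before runpy sets argv[0]."""
    trimmed = list(argv[argv.index("-m"):])  # drop everything before -m
    trimmed.remove(module_name)              # drop the module name itself
    return trimmed or [""]                   # mirror run_m(): never leave argv empty


example = argv_for_module(
    ["moon.py", "-i", "-m", "mymod", "--flag", "x"], "mymod")
```

Here example ends up as ["-m", "--flag", "x"], and runpy.run_module(..., alter_sys=True) would then replace that leading "-m" with the module's file name.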

Now we need to add support for -c!

Adding support for -c

One of the most useful options Python provides is the ability to specify a Python command using the -c option. This allows us to do things like python -c "import sys; print sys.path" (a trivial example).

We can add this functionality with the following code changes:

def run_c(args):
    """Run a command, just like python -c <command>"""
    sys.argv = sys.argv[sys.argv.index("-c"):]; sys.argv.remove(args.c)
    sys.path.insert(0, os.path.abspath("."))
    interpreter = code.InteractiveConsole()
    interpreter.runsource(args.c)
    if args.i:
        interpreter.locals["exit"] = sys.exit; interact(interpreter)
    sys.exit(0)
 
def main(argv=None):
    """Business logic. respond to argv after parsing"""
    args = parse_args(argv)
    if "-m" in sys.argv and "-c" in sys.argv:
        func = run_m if sys.argv.index("-m") < sys.argv.index("-c") else run_c
        func(args)
    elif args.c: run_c(args)
    elif args.m: run_m(args)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            run_stdin(args.file)
    else:
        run_path(args)
 
def parse_args(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", action="store_true")
    parser.add_argument("-c", type=str, nargs="?")
    parser.add_argument("-m", type=str, nargs="?")
    parser.add_argument("file", type=str, default="-", nargs="?")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    args.file = sys.stdin if args.file == "-" else args.file
    return args



We added some more logic to main here, mainly because Python will honor whichever comes first (out of -m or -c) and argparse does not act this way, so we have to work around it. Read the elif statements carefully, because they were collapsed onto one line (I love the 100 lines thing).

The run_c() function does several things. First, it sets sys.argv to ["-c", ...], where ... is everything except the actual command to be executed. This is intended to match the behavior of Python. Next, the current directory is added to sys.path. Then it sets up an interpreter and runs the command which was passed in. Finally, it drops us into an interactive interpreter if -i was provided.
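The "whichever comes first wins" rule from main() can be sketched as a tiny helper. first_mode() is a hypothetical name, and like the code above it is naive: it only looks at positions in the list, so a -m that is really an argument to the -c command would still be counted.

```python
def first_mode(argv):
    """Return "-m" or "-c", whichever appears first in argv, or None if neither does."""
    positions = [(argv.index(flag), flag) for flag in ("-m", "-c") if flag in argv]
    # min() compares the (index, flag) tuples, so the lowest index wins.
    return min(positions)[1] if positions else None
```

With this in place, main() could look the winner up in a dictionary such as {"-m": run_m, "-c": run_c} instead of chaining elif statements.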

Next up is adding support for -V/--version!

Adding -V/--version support.

This feature is very easy to implement. Check out the following code changes:

__version__ = "moonpy 0.6.0"
 
def main(argv=None):
    """Business logic. respond to argv after parsing"""
    args = parse_args(argv)
    if args.version: print __version__; sys.exit(0)
    if "-m" in sys.argv and "-c" in sys.argv:
        func = run_m if sys.argv.index("-m") < sys.argv.index("-c") else run_c
        func(args)
    elif args.c: run_c(args)
    elif args.m: run_m(args)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            run_stdin(args.file)
    else:
        run_path(args)
 
def parse_args(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", action="store_true")
    parser.add_argument("-V", "--version", action="store_true")
    parser.add_argument("-c", type=str, nargs="?")
    parser.add_argument("-m", type=str, nargs="?")
    parser.add_argument("file", type=str, default="-", nargs="?")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    args.file = sys.stdin if args.file == "-" else args.file
    return args



First, at the top of the file we set up a __version__ variable. This is easy to do, and many other Python tools will respect it. Perhaps in the future I will add a more dynamic way to determine the version of moonpy, but for now this is only one line to change for each release, which isn't too bad.

Next in the main() function, we add a line which, if args.version is True, prints __version__ and exits.

Finally we add the option to our command line parser.
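As an aside, argparse also ships a built-in "version" action which prints a string and exits on its own. The sketch below is an alternative to the store_true approach above, not what moonpy does; build_parser() and version_exit_code() are hypothetical helpers that just make the behavior easy to observe.

```python
import argparse

__version__ = "moonpy 0.6.0"


def build_parser():
    parser = argparse.ArgumentParser()
    # action="version" prints the given string and exits as soon as -V is seen.
    parser.add_argument("-V", "--version", action="version", version=__version__)
    return parser


def version_exit_code(argv):
    """Parse argv and report the exit status if argparse decided to exit."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return None
```

The trade-off is that you lose the explicit line in main(), which some people find easier to follow.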

Next up we are going to enhance the Python interpreter with cross-session history support and tab-completion. This is mainly so I can say "an enhanced Python interpreter in 100 lines of code"!

Adding history support and tab-completion

So it appears that Python provides support for history, but only within the current session (unless you provide a startup script, like this stackoverflow answer suggests), and it does not provide tab-completion (the linked answer does provide tab-completion). In order to claim that my project has an enhanced Python interpreter, I am providing this functionality by default.

To see how this is done, check out the code changes below:

def interact(console=None):
    """Launch an enhanced, interactive Python interpreter"""
    set_up_history()
    sys.argv = [""]
    if not console:
        console = code.InteractiveConsole(locals={"exit": sys.exit})
    console.interact("MoonPy - m-o-o-n that spells Python")
 
def set_up_history():
    """Taken from https://docs.python.org/2/library/readline.html#example"""
    try: import readline
    except ImportError: import pyreadline as readline
    else: import rlcompleter; readline.parse_and_bind("tab: complete")
    histfile = os.path.join(os.path.expanduser("~"), ".pyhist")
    try: readline.read_history_file(histfile)
    except IOError: pass
    atexit.register(readline.write_history_file, histfile)



This is a little difficult to read (100 lines of code, remember), but it essentially boils down to this:

  1. When interact() is called it now calls set_up_history()
  2. set_up_history() tries to import some things which will enable history
  3. Then set_up_history() calls a couple of functions (readline.parse_and_bind() and read_history_file()) which actually do the work of setting up tab-completion and cross-session history, and registers readline.write_history_file with atexit so the history is saved when we exit


Now, let's add support for a site-packages directory!

Adding support for site-packages

So, with this feature we are departing a little from the way Python handles things. If you don't know, the site-packages directory is where your site-specific or third-party modules are installed. The location of this directory is different depending on your platform, but with moonpy we will require that this directory be in the same directory as the moonpy executable itself. This is meant to simplify things a bit.

In order to facilitate this feature, we need to make the following changes to our main function:

def main(argv=None):
    """Business logic. respond to argv after parsing"""
    here = os.path.dirname(os.path.abspath(__file__))
    if "site-packages" in os.listdir(here):
        sys.path.insert(0, os.path.abspath(os.path.join(here, "site-packages")))
    args = parse_args(argv)
    if args.version: print __version__; sys.exit(0)
    if "-m" in sys.argv and "-c" in sys.argv:
        func = run_m if sys.argv.index("-m") < sys.argv.index("-c") else run_c
        func(args)
    elif args.c: run_c(args)
    elif args.m: run_m(args)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            run_stdin(args.file)
    else:
        run_path(args)



That's it! Basically if the directory exists, it is prepended to the path.
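If you want to test that logic in isolation, here is a sketch of it pulled out into a function. add_local_site_packages() and its path parameter are hypothetical; the parameter only exists so you can pass in a throwaway list instead of the real sys.path.

```python
import os
import sys


def add_local_site_packages(here, path=None):
    """Prepend <here>/site-packages to path (sys.path by default) if it exists."""
    path = sys.path if path is None else path
    candidate = os.path.join(os.path.abspath(here), "site-packages")
    # isdir() also rejects a plain file that happens to be named site-packages.
    if os.path.isdir(candidate) and candidate not in path:
        path.insert(0, candidate)
        return True
    return False
```

In main() the call would be add_local_site_packages(os.path.dirname(os.path.abspath(__file__))).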

Well, that's it! Everything we set out to do is done.

Conclusion

OK, so this project isn't done by any means. I will try to write a blog post for each significant change, but I cannot promise anything. This (and all of my other blog posts) is part of a github project. If you notice any mistakes, omissions or typos, please open an issue in the issue tracker.

For your convenience, the code is listed below in its entirety:

#!/usr/bin/env python
import os
import sys
import code
import runpy
import atexit
import argparse
 
__version__ = "moonpy 0.6.0"
 
def set_up_history():
    """Taken from https://docs.python.org/2/library/readline.html#example"""
    try: import readline
    except ImportError: import pyreadline as readline
    else: import rlcompleter; readline.parse_and_bind("tab: complete")
    histfile = os.path.join(os.path.expanduser("~"), ".pyhist")
    try: readline.read_history_file(histfile)
    except IOError: pass
    atexit.register(readline.write_history_file, histfile)
 
def run_path(args):
    """Run a path (script, zip file, or dir) just like python <script>"""
    sys.argv.pop(0)
    _locals = runpy.run_path(args.file, run_name="__main__")
    if args.i:
        interact(code.InteractiveConsole(locals=_locals))
 
def run_m(args):
    """Run a module, just like python -m <module_name>"""
    sys.argv = sys.argv[sys.argv.index("-m"):]
    sys.argv.remove(args.m)
    if not len(sys.argv): sys.argv.append("")
    runpy.run_module(args.m, run_name="__main__", alter_sys=True)
    sys.exit(0)
 
def run_c(args):
    """Run a command, just like python -c <command>"""
    sys.argv = sys.argv[sys.argv.index("-c"):]; sys.argv.remove(args.c)
    sys.path.insert(0, os.path.abspath("."))
    interpreter = code.InteractiveConsole()
    interpreter.runsource(args.c)
    if args.i:
        interpreter.locals["exit"] = sys.exit; interact(interpreter)
    sys.exit(0)
 
def run_stdin(stdin):
    sys.argv[0] = ""; sys.path.insert(0, "")
    _locals = {"exit": sys.exit, "__file__": stdin.name}
    interpreter = code.InteractiveInterpreter(locals=_locals)
    [interpreter.runsource(line) for line in stdin]
    sys.exit(0)
 
def interact(console=None):
    """Launch an enhanced, interactive Python interpreter"""
    set_up_history()
    sys.argv = [""]
    if not console:
        console = code.InteractiveConsole(locals={"exit": sys.exit})
    console.interact("MoonPy - m-o-o-n that spells Python")
 
def main(argv=None):
    """Business logic. respond to argv after parsing"""
    here = os.path.dirname(os.path.abspath(__file__))
    if "site-packages" in os.listdir(here):
        sys.path.insert(0, os.path.abspath(os.path.join(here, "site-packages")))
    args = parse_args(argv)
    if args.version: print __version__; sys.exit(0)
    if "-m" in sys.argv and "-c" in sys.argv:
        func = run_m if sys.argv.index("-m") < sys.argv.index("-c") else run_c
        func(args)
    elif args.c: run_c(args)
    elif args.m: run_m(args)
    if args.file is sys.stdin:
        if args.file.isatty():
            interact()
        else:
            run_stdin(args.file)
    else:
        run_path(args)
 
def parse_args(argv=None):
    """parse argv"""
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", action="store_true")
    parser.add_argument("-V", "--version", action="store_true")
    parser.add_argument("-c", type=str, nargs="?")
    parser.add_argument("-m", type=str, nargs="?")
    parser.add_argument("file", type=str, default="-", nargs="?")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    args.file = sys.stdin if args.file == "-" else args.file
    return args
 
if __name__ == "__main__":
    main()



There you go, a Python interpreter in 96 lines of code!

See you next time, Happy Coding!

Implementing grep in Python

ilovetux | 24 October, 2015 04:19

Hello, and welcome to the first installment of my blog. Today we will start building a clone of grep in pure Python with no third-party extensions. We will limit the number of features that we implement today in the name of space and time. Specifically, we will implement the following features:

  1. pass arbitrary search pattern as first positional argument
  2. make pattern act as a regular expression (this will differ slightly from grep's regular expressions because we will use Python's regex engine)
  3. pass arbitrary number of files to search (if your shell supports glob patterns like bash this will mean that this script will as well)
  4. Specify -i for case-insensitive search
  5. Specify -n to print line numbers for the matches
  6. Specify -H to print filename for the matches


Note: For numbers 4 and 5 we will not be implementing the color output just yet; stay tuned if you are interested in that.

Note: If you want to see a semi-production ready version of this script, checkout [the project on github](https://github.com/ilovetux/greppy). Also, this is available to be installed from [pypi](https://pypi.python.org/pypi/greppy) with a simple `pip install greppy`.

Basic Functionality

Let's define the basic structure of our program and include the imports we are going to need.


import re
import sys
import atexit
import argparse
 
 
def main(args):
    pass
 
 
def parse_args():
    pass
 
 
if __name__ == "__main__":
    main(parse_args())

That's basically it. We will put the main (business) logic in our function main and we will put the command line parsing logic in our parse_args function. Now for our basic functionality.

We will allow pattern to be specified positionally, meaning that the first non-optional argument passed to our script will be the pattern for which to search. We can do that by changing our parse_args function like so:


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("pattern", type=str)
    return parser.parse_args()


Now when we invoke our script, the first string passed in will be the pattern to search for. Next, we need to accept files. This one is a little tricky, because all of the following conditions have to be met:

  1. If no files are passed, we need to default to sys.stdin
  2. If more than one file is passed, we need to search all of them
  3. If any files are provided, we need to make sure that they are closed after execution


We can accomplish all of these with the following changes to our script:

def main(args):
 
    atexit.register(lambda files: [f.close() for f in files], args.files)
 
    for file_in in args.files:
        pass
 
 
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args()


So, now we are really cooking: most of our desired functionality is implemented (at least the tricky parts). Let's explain what's going on here. In parse_args, we made the following changes:

  • We added a two-line statement which adds an argument called files.
    • The parameter type=argparse.FileType("r") says that anything passed in should be a valid filename, that the file should exist (ensured by opening in read "r" mode) and that the file should be open and ready for us to read.
    • The parameter nargs="*" says that we should accept 0+ arguments (meaning any number even zero)
      • These arguments will be passed to us as a list because of specifying nargs="*".
    • The parameter default=[sys.stdin] says that if no arguments are passed we are to default to a list of one item (stdin).


Now, because argparse will not close our files for us, we had to add the line atexit.register(lambda files: [f.close() for f in files], args.files), which tells Python to loop through the files and close them when the script exits. We also added a skeleton of a loop in our main function. This simply iterates through the files passed in (or sys.stdin); this is where our main logic will go.
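To see what argparse actually hands back, here is a small sketch using the same two arguments. I've given parse_args() an optional argv parameter (an assumption on my part, the version above always reads the real command line) so it can be called with a plain list.

```python
import argparse
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args(argv)


# No file names given: args.files falls back to the default list.
no_files = parse_args(["needle"])
```

Passing real file names after the pattern gives you a list of already-opened file objects instead, which is why we have to arrange for them to be closed.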

So, we have enough right now to implement our basic search functionality, so let's do that. We can have a decently working grep clone with the following changes:

def main(args):
    atexit.register(lambda files: [f.close() for f in files], args.files)
 
    for file_in in args.files:
        for line in file_in:
            if args.pattern in line:
                print line.strip()


We can now accept a pattern and a list of files (or pipe input from another command), and our script will scan each line of input and print the line if pattern is found in the line. Awesome! This is actually enough functionality to cover most of grep's use-cases, but there are a few options which are common-enough edge cases that they will be sorely missed if they are not included in our grep clone. Let's get started on those.

Adding case-insensitive search

It is very common to need to search for a string regardless of case, so let's get started on that.

This is a very simple change to implement. First let's change our parse_args function to accept a -i option:

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args()


Now when we look at args.ignore_case it will be either True or False (read boolean) as to whether to ignore the case when matching. Now that our users can specify this option let's implement it in our main function:

def main(args):
    atexit.register(lambda files: [f.close() for f in files], args.files)
    args.pattern = args.pattern.lower() if args.ignore_case else args.pattern
 
    for file_in in args.files:
        for line in file_in:
            # compare against a lowered copy so the original line is what gets printed
            haystack = line.lower() if args.ignore_case else line
            if args.pattern in haystack:
                print line.strip()


So, we added two lines to our function to offer a -i option. The first line we added changes args.pattern to all lower case if `args.ignore_case` is True. The second line does the same for every line of text we are matching against. By ensuring that both the pattern and our line are all lower-case, we can be sure that a match will occur regardless of case.
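The same idea as a stand-alone function, in case you want to play with it. contains() is a hypothetical helper; it only compares lowered copies, so the caller still has the original line to print.

```python
def contains(pattern, line, ignore_case=False):
    """Plain substring test, optionally ignoring case."""
    if ignore_case:
        # Lower both sides for the comparison only; neither argument is modified.
        return pattern.lower() in line.lower()
    return pattern in line
```

For example, contains("tux", "I Love TUX") is False, but it becomes True once ignore_case=True is passed.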

Let's move onto adding regular expression support.

Adding RegEx support

Changing our script to treat pattern as a regular expression is fairly straightforward. We simply need to change our main function to the following:

def main(args):
    atexit.register(lambda files: [f.close() for f in files], args.files)
 
    for file_in in args.files:
        for line in file_in:
            flags = re.IGNORECASE if args.ignore_case else 0
            if re.search(args.pattern, line, flags):
                print line.strip()


Notice that I was able to replace the two lines dealing with case sensitivity with just one line, because we are now using Python's regular expression engine instead of simple string matching. We also had to change our if statement to use the re.search function.

This still acts as it did before, if you provide a sub-string it will still match because sub-strings are valid regular expressions, but now we will have to escape any special characters if we want them to match literally.
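Here is a quick sketch of what that escaping looks like in practice. line_matches() and its literal parameter are hypothetical, they are not options of our script; re.escape() is the standard library function that does the escaping for you.

```python
import re


def line_matches(pattern, line, ignore_case=False, literal=False):
    """Return True if pattern is found in line, as a regex or as literal text."""
    flags = re.IGNORECASE if ignore_case else 0
    if literal:
        # Backslash-escape every special character so "." only matches a real dot.
        pattern = re.escape(pattern)
    return re.search(pattern, line, flags) is not None
```

As a regular expression, "a.c" matches "abc" because the dot stands for any character; with literal=True it only matches text that really contains "a.c".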

Adding line numbers

A great addition to all of our current functionality is the ability to print the line number of the file along with the line. This is really helpful for large files, where we want to know where certain string(s) are located without visually inspecting the file. First we need to add an option to our ArgumentParser to allow our users to specify that they want to print the line numbers. We can do that by changing our parse_args function to the following:

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("-n", "--line-number", action="store_true")
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args()


Now that we can tell when the user wants the line numbers displayed, we can go ahead and add the functionality. We can do this with one additional line (and a simple change to an existing line). We need to change our main function to the following:

def main(args):
    atexit.register(lambda files: [f.close() for f in files], args.files)
 
    for file_in in args.files:
        for index, line in enumerate(file_in, 1):  # start at 1 so the first line is line 1, not 0
            flags = re.IGNORECASE if args.ignore_case else 0
            if re.search(args.pattern, line, flags):
                line = "{}:{}".format(index, line.strip()) if args.line_number else line
                print line.strip()


That's it. Line numbers will now be printed in front of each matching line whenever -n is given.
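For a feel of what enumerate contributes here, the following sketch runs the same kind of loop over an in-memory file. It is written for Python 3 (print() as a function, unlike the Python 2 listings in this post), the function name numbered_matches is hypothetical, and it counts from 1 by passing a start value to enumerate, which counts from 0 by default.

```python
import io
import re


def numbered_matches(pattern, file_in):
    # enumerate(..., 1) pairs each line with a 1-based line number.
    for number, line in enumerate(file_in, 1):
        if re.search(pattern, line):
            yield "{}:{}".format(number, line.strip())


fake_file = io.StringIO("alpha\nbeta\nalphabet\n")
for result in numbered_matches("alpha", fake_file):
    print(result)
# 1:alpha
# 3:alphabet
```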

We have one last change for this tutorial, and that is the ability to add the filename to each line of output. Let's get to it.

Adding filenames

So when users are scanning multiple files, it can be really helpful to add the filename to each line of output. This is another very easy change. First, let's add the argument to our ArgumentParser:

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("-n", "--line-number", action="store_true")
    parser.add_argument("-H", "--with-filename", action="store_true")
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args()
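
A quick way to check the new flag is to hand the parser an explicit argument list instead of letting it read sys.argv. The sketch below assumes a variant of parse_args that accepts an optional argv parameter; that parameter is an addition for this example, since the function above takes no arguments. It is written for Python 3.

```python
import argparse
import sys


def parse_args(argv=None):
    # With argv=None, argparse falls back to sys.argv[1:].
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("-n", "--line-number", action="store_true")
    parser.add_argument("-H", "--with-filename", action="store_true")
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args(argv)


# No file names are given, so args.files falls back to the default.
args = parse_args(["-n", "-H", "error"])
print(args.pattern)        # error
print(args.line_number)    # True
print(args.with_filename)  # True
print(args.ignore_case)    # False
```

Note that argparse turns the dashes in --with-filename into an underscore, which is why the attribute is args.with_filename.
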


Now, we can simply add one more line to add the filename to our output:

def main(args):
    atexit.register(lambda files: [f.close() for f in files], args.files)
 
    for file_in in args.files:
        for index, line in enumerate(file_in, 1):  # start at 1 so the first line is line 1, not 0
            flags = re.IGNORECASE if args.ignore_case else 0
            if re.search(args.pattern, line, flags):
                line = "{}:{}".format(index, line) if args.line_number else line
                line = "{}:{}".format(file_in.name, line) if args.with_filename else line
                print line.strip()


That's it, all of our functionality for this tutorial is complete. For your convenience, the complete code listing is below:

import re
import sys
import atexit
import argparse
 
 
def main(args):
    atexit.register(lambda files: [f.close() for f in files], args.files)
 
    for file_in in args.files:
        for index, line in enumerate(file_in, 1):  # start at 1 so the first line is line 1, not 0
            flags = re.IGNORECASE if args.ignore_case else 0
            if re.search(args.pattern, line, flags):
                line = "{}:{}".format(index, line.strip()) if args.line_number else line
                line = "{}:{}".format(file_in.name, line.strip()) if args.with_filename else line
                print line.strip()
 
 
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("-n", "--line-number", action="store_true")
    parser.add_argument("-H", "--with-filename", action="store_true")
    parser.add_argument("pattern", type=str)
    parser.add_argument(
        "files", type=argparse.FileType("r"), default=[sys.stdin], nargs="*")
    return parser.parse_args()
 
 
if __name__ == "__main__":
    main(parse_args())
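
If you want to try the finished logic without creating any files, one option is to pull the loop out of main into a generator and feed it in-memory files. This is only a sketch under a few assumptions: grep_lines and NamedStringIO are hypothetical names that do not appear in the script, the code targets Python 3, and it numbers lines from 1.

```python
import io
import re


class NamedStringIO(io.StringIO):
    """In-memory text file with a .name attribute, like a real file object."""

    def __init__(self, text, name):
        super().__init__(text)
        self.name = name


def grep_lines(pattern, files, ignore_case=False, line_number=False,
               with_filename=False):
    # Same matching rules as main(), but yielding strings instead of printing.
    flags = re.IGNORECASE if ignore_case else 0
    for file_in in files:
        for index, line in enumerate(file_in, 1):
            if re.search(pattern, line, flags):
                line = line.strip()
                if line_number:
                    line = "{}:{}".format(index, line)
                if with_filename:
                    line = "{}:{}".format(file_in.name, line)
                yield line


files = [NamedStringIO("Tux\nlinux\n", "a.txt"),
         NamedStringIO("BSD\nTUX again\n", "b.txt")]
for out in grep_lines("tux", files, ignore_case=True,
                      line_number=True, with_filename=True):
    print(out)
# a.txt:1:Tux
# b.txt:2:TUX again
```

Returning strings rather than printing them keeps the matching logic easy to check in isolation; main would then only need to print whatever the generator yields.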

Have a good day! See you next time!

Hello, World!

ilovetux | 05 October, 2015 04:12

I am iLoveTux, an IT Consultant and Entrepreneur by trade, but a programmer at heart. Sometimes I am able to practice my craft at work, but other times I use other skills I have acquired simply to make myself more marketable. It's ok though because I love learning new things, but even more than that I love facing challenges head-on and coming out on top.

(NOTE: This blog mirrors my blog at https://ilovetux.github.io)

After several failed attempts at starting and continuing a blog, I am starting this one. I think, however, that this one will work because I am using my most beloved service to host it. GitHub and I go back like lawn chairs. Although I've only recently made this account, I've had others, and I even spent a good deal of time as simply a consumer of content on GitHub. It is my goal here to provide some useful information, some humorous content and some insightful tips and tricks.

I am first and foremost a Python developer. I also dabble in Unix/Linux system administration, IBM DataPower development and administration, and web development (I love making great APIs and Single Page Web Apps to consume them [although my UI design could use some work]). I also never shy away from database (SQL and NoSQL alike) design, implementation and consumption. I love wrangling data. I've used Splunk for the longest time and now I am learning to love the Anaconda Python Distribution for bringing together what just might be the premier collection of software for analyzing, displaying and otherwise just making sense of data.

Well, enough about me (for now). I will head off to gather my thoughts and put together some good blog posts for you guys.

 