Programming for Geographical Information Analysts – Core Skills

The major element, code-wise, this week is the brief introduction to GUIs. This is something to explore in your own time but, by way of essentials, the core of a GUI using tkinter is creating a "root" window and setting it running.

import tkinter

root = tkinter.Tk() # Main window.
w = tkinter.Canvas(root, width=200, height=200)
w.pack() # Layout
w.create_rectangle(0, 0, 200, 200, fill="blue")
tkinter.mainloop() # Wait for interactions.

GUIs are asynchronous code; they don't run straight through to completion, but wait for the user to initiate new code executions. This is known as "Event Based Programming", as the code waits for events to respond to. In Python, this process is based on callbacks: you pass one function into another, with the expectation that at some point the passed function will be run. In the code below, for example, the menu has the label "Run model" added, and the function run is "bound" to it as its "command". When the label is clicked, the run function is called.

import tkinter

def run():
    pass # Do nothing, but could, for example, run the model.

root = tkinter.Tk()
menu = tkinter.Menu(root)
root.config(menu=menu)
model_menu = tkinter.Menu(menu)
menu.add_cascade(label="Model", menu=model_menu)
model_menu.add_command(label="Run model", command=run)
tkinter.mainloop()
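The callback idea can be seen without a GUI at all. The following is a minimal sketch, not tkinter itself: the Menu class and its click method are hypothetical stand-ins invented for illustration, and click simulates the event that a real user would generate.

```python
class Menu:
    """A hypothetical stand-in for a GUI menu: it stores callbacks by label."""

    def __init__(self):
        self._commands = {}

    def add_command(self, label, command):
        # Store the function object itself; it is not called here.
        self._commands[label] = command

    def click(self, label):
        # Simulate the user's event by calling the stored function now.
        return self._commands[label]()


def run():
    # Stands in for running the model.
    return "model run"


menu = Menu()
menu.add_command(label="Run model", command=run)  # Note: run, not run().
result = menu.click("Run model")
print(result)
```

The key point is that run is passed without brackets: writing command=run() would call the function immediately and store its return value instead of the function.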

Here's the full code integrating both: gui.py.

GUI design is hard, and needs careful thought and testing. At every stage when designing the GUI, think "is it obvious what this does?"; make all elements as simple as possible. Users learn by trying stuff - they rarely read manuals, so think carefully about what the default behavior of any function should be; hide complex functionality and the options to change defaults in 'Options' menus.

Most of all, consult and test. When you think you have a part that users are interested in up and running, test its 'usability'. Sit your users down with the software and get them to play with it. It's useful to set them common tasks to do. See how long they take to do things, and what they do that's unexpected. Some companies use one-way mirrored windows so they can watch users without distracting them.

Remember, users don't make mistakes - it's your fault if you lead them the wrong way!

The second major element is making and processing webpages.

First up, we need to understand how webpages are written, and their general structure.

The typical webpage source looks like:

<HTML>
<HEAD>
<TITLE>My first webpage</TITLE>
</HEAD>
<BODY>
<P>
This is some text<BR />
and a <A href="http://www.bbc.co.uk">link</A>
</P>
<IMG src="brushedsteel.jpg" />
</BODY>
</HTML>

If you're going to do any web scraping, you need to become familiar with HTML. You can find a basic tutorial here.

When we scrape the web, we need to identify components in our webpage we want to target. We may, for example, have a table like that below:

<HTML>
<BODY>
<TABLE id="datatable">
<TR> <TH>A</TH><TH>B</TH> </TR>
<TR> <TD>1</TD><TD>2</TD> </TR>
<TR> <TD>3</TD><TD>4</TD> </TR>
</TABLE>
</BODY>
</HTML>

and want to pull out all of the second column data.

To do this, we need to identify the BODY, and look inside this to find the TABLE, then look in this to find each TR (table row), then in each row to find the second TD (table datacell).

This process is called navigating the "Document Object Model" (or "DOM"). We treat the webpage as if it were objects nested inside each other.
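The steps above (BODY, then TABLE, then each TR, then the second TD) can be sketched with Python's standard library. This is an illustration under stated assumptions rather than a general scraper: it uses xml.etree.ElementTree, an XML parser, which only works here because the example table is well-formed markup, and which keeps tag names case-sensitive. Real webpages are often not well-formed, so a dedicated HTML parser is usually needed. The second_column function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# The example table, written as well-formed markup so an XML parser accepts it.
PAGE = """
<HTML>
<BODY>
<TABLE id="datatable">
<TR> <TH>A</TH><TH>B</TH> </TR>
<TR> <TD>1</TD><TD>2</TD> </TR>
<TR> <TD>3</TD><TD>4</TD> </TR>
</TABLE>
</BODY>
</HTML>
"""


def second_column(markup):
    """Return the text of the second TD in each table row."""
    html = ET.fromstring(markup.strip())  # The outermost object.
    body = html.find("BODY")              # Look inside HTML for BODY.
    table = body.find("TABLE")            # Look inside BODY for TABLE.
    values = []
    for tr in table.findall("TR"):        # Each table row in turn.
        tds = tr.findall("TD")
        if len(tds) >= 2:                 # The header row has TH cells only.
            values.append(tds[1].text)
    return values


print(second_column(PAGE))
```

Each find call looks only at the direct children of the object it is called on, which mirrors the idea of objects nested inside each other.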

In addition to the main tag name, tags can have "attributes". You can see this in the first page above: the A ("anchor") link tag has an href (hypertext reference) attribute showing where to link to, and the IMG ("image") tag has a src (source) attribute giving the image filename.

There are two attributes all tags can have that help with scraping. One is the class attribute. This can be in multiple tags of the same name to divide them up into types. The other is the id attribute. This has to be unique to a specific tag. So you might see, for example:

<TD class=topRow id=topLeft>1</TD>
<TD class=topRow id=topRight>4</TD>

This helps when navigating the DOM; we can ask for all TDs of class topRow or a specific TD with id topRight.
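As a rough sketch of selecting by class and by id, the code below uses html.parser from Python's standard library. This is a swapped-in technique: html.parser reports tags one at a time as it reads them rather than building a DOM, so the CellCollector class (a name invented for this example) records each TD's attributes and text itself before filtering.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Record the id, class and text of every TD encountered."""

    def __init__(self):
        super().__init__()
        self.cells = []        # One dict per TD, in document order.
        self._current = None   # The TD currently being read, if any.

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == "td":
            attributes = dict(attrs)
            self._current = {"id": attributes.get("id"),
                             "class": attributes.get("class"),
                             "text": ""}

    def handle_data(self, data):
        # Only keep text that falls inside a TD.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._current is not None:
            self.cells.append(self._current)
            self._current = None


ROW = "<TD class=topRow id=topLeft>1</TD> <TD class=topRow id=topRight>4</TD>"

collector = CellCollector()
collector.feed(ROW)
collector.close()

# All TDs of class topRow, then the single TD with id topRight.
top_row = [c["text"] for c in collector.cells if c["class"] == "topRow"]
top_right = [c["text"] for c in collector.cells if c["id"] == "topRight"]
print(top_row)
print(top_right)
```

The class filter can match several cells, while the id filter should match at most one, because an id has to be unique within a page.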

As you can see, we've given our TABLE an id as an example.


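import requests
import bs4  # Beautiful Soup; assumed here, as find_all is one of its methods.

page = requests.get("http://www.geog.leeds.ac.uk/courses/computing/practicals/python/web/scraping-intro/table.html")
content = page.text
soup = bs4.BeautifulSoup(content, 'html.parser')  # Parse the HTML text.
table = soup.find(id="datatable")  # The TABLE with the id we gave it.
trs = table.find_all('tr')
for tr in trs:
    print(tr)  # Do something with the "tr" variable.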

Note that even though the tags are in upper case in the file, when they are parsed, they get lowercased, so that's what we search for.
