Category Archives: Python

Hysteria with PYROmania (special URIs)

I needed to modify code written with Pyro so that objects on localhost could be exposed remotely. I had never worked with Pyro before, so I was in some hysteria. I was ready to try, though.

I could not find a definitive guide to using the special names in the URI (I did come across a couple of references later). Love bites.

If you look at Pyro servers, the URI from the daemon is a very long string like the following:

PYRO://127.0.0.1:62100/c0a8006516bc7752e7526becdb059ce9

That is a rather long URI, and the trailing identifier changes on every service startup (much like a GUID or a time-based ID). So on the client side, is this my URI? Is it too late for love?

Well, no. Here is a quick guide to the special URI strings, so you too can be a Pyro Animal.

To look an object up through a name server, you can use:

PYRONAME://<hostname>[:<port>]/<objectname>

For straight remote access (otherwise called the regular method):

PYROLOC://<hostname>[:<port>]/<objectname>
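
To make the two formats concrete, here is a minimal sketch that only builds and splits the URI strings. It does not import or call Pyro at all. The helpers `build_uri` and `split_uri`, the host `server01`, and the object name `inventory` are made up for illustration; the port is the one from the daemon URI above.

```python
import re

# Hypothetical helpers, not part of Pyro: they only handle the URI text.
_URI_RE = re.compile(r"^(PYRONAME|PYROLOC)://([^:/]+)(?::(\d+))?/(.+)$")


def build_uri(scheme, hostname, objectname, port=None):
    """Return a special URI string; the port part is optional."""
    location = hostname if port is None else "%s:%d" % (hostname, port)
    return "%s://%s/%s" % (scheme, location, objectname)


def split_uri(uri):
    """Split a special URI into (scheme, hostname, port or None, objectname)."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError("not a PYRONAME/PYROLOC URI: %r" % uri)
    scheme, hostname, port, objectname = match.groups()
    return scheme, hostname, (int(port) if port else None), objectname


# Made-up host and object name, purely for illustration.
print(build_uri("PYROLOC", "server01", "inventory", port=62100))
print(split_uri("PYRONAME://server01/inventory"))
```

The point is that the client only needs a host, an optional port, and a stable object name, not the long generated identifier.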

So, it isn’t as bad as it initially seems. No Foolin’.  Finally, Armageddon It.

Comparing huge files using Python

I had the need to compare two huge files.

None of the file compare tools that I tried could handle large files, so I decided to write my own compare utility.

Comparing large files is not a common case. I could have solved my issue by loading each file into a database, then running the excellent RedGate DataCompare against the two tables. Heck, for that matter, I could have loaded both into one table and done a GROUP BY. But I had several files, and loading the database would have been tedious.
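
For reference, the "one table and a GROUP BY" idea could look roughly like the sketch below, using Python's built-in sqlite3 module. This is an illustration under assumptions, not what I actually ran: the table name `rows`, the columns `src` and `line`, and the helper `rows_in_only_one_file` are all made up, and an in-memory database would not cope with truly huge files (a file-backed database would be needed). It also ignores row order and duplicate rows within the same file.

```python
import sqlite3


def rows_in_only_one_file(rows1, rows2):
    """Return (line, source) pairs for lines that appear in only one input."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rows (src INTEGER, line TEXT)")
    # Tag every line with the file it came from (1 or 2).
    conn.executemany("INSERT INTO rows VALUES (1, ?)", ((r,) for r in rows1))
    conn.executemany("INSERT INTO rows VALUES (2, ?)", ((r,) for r in rows2))
    cursor = conn.execute(
        "SELECT line, MIN(src) FROM rows "
        "GROUP BY line HAVING COUNT(DISTINCT src) = 1 "
        "ORDER BY line"
    )
    result = cursor.fetchall()
    conn.close()
    return result


# Tiny made-up inputs standing in for the two files.
print(rows_in_only_one_file(["a", "b", "c"], ["a", "c", "d"]))
```

With several large files to get through, setting this up for each pair is the tedious part, which is why I went with a script instead.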

Use cases could include replacing an ETL process with a new process or upgrading a web service that generates large datasets. For me, it was a system migration of an ETL process.

The script was done ‘quick and dirty’, but really came in handy.

The script opens the two files and starts reading them, keeping a configurable look-ahead buffer of lines from the second file (currently set to 10). For each line of the first file, it checks whether that line is in the buffer. On a miss, it records the line and enlarges the buffer. Misses are written to an output file for later review, and processing stops if there are too many misses (configurable).

That is it – easy

### Program to compare two large files
### fails fast - order of rows is important
### smart enough to look ahead for matching rows
fs1 = "C:/Temp/File1.txt"
fs2 = "C:/Temp/File2.txt"
ofs = "C:/Temp/diffb.txt"
maxerrors = 500  # stop once this many rows are missing
n = 10  # look-ahead buffer size, and how much it grows after a miss
Lq2 = []  # look-ahead buffer of rows from the second file

f1 = open(fs1)
f2 = open(fs2)
of = open(ofs, "w")

# setup - read first n rows of the second file
for x in range(n):
    Lq2.append(f2.readline())

# important stuff
errorsFound = 0
rowcounter = 0
row = f1.readline()
# readline() returns "" at end of file, which ends the loop
while errorsFound < maxerrors and row:
    isFound = False
    if rowcounter % 10000 == 0:
        print(".", end="", flush=True)  # progress marker
    rowcounter = rowcounter + 1
    for y in range(len(Lq2)):
        if row == Lq2[y]:
            # match: drop the row from the buffer and read one more
            isFound = True
            Lq2.pop(y)
            Lq2.append(f2.readline())
            break
    if not isFound:
        # miss: record the row and widen the look-ahead buffer
        errorsFound = errorsFound + 1
        of.write(row)
        for x in range(n):
            Lq2.append(f2.readline())
    row = f1.readline()

print()
print("done - rows processed:", rowcounter)
f1.close()
f2.close()
of.close()
# entries left in the buffer (includes "" placeholders read past end of file)
print(len(Lq2))
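
To try the same look-ahead idea on small inputs without touching real files, here is a function-based sketch. It is a reworking under assumptions rather than the script above: the name `compare_streams` and the `refill` helper are made up, it works on any objects with a `readline()` method, and unlike the script it stops refilling the buffer at end of file so the leftover buffer holds only real rows.

```python
import io


def compare_streams(f1, f2, n=10, maxerrors=500):
    """Return (misses, leftover): rows of f1 not found in the look-ahead
    buffer of f2, and rows of f2 still unmatched in the buffer."""

    def refill(buffer, count):
        # Read up to `count` more rows from f2, stopping at end of file.
        for _ in range(count):
            line = f2.readline()
            if not line:
                break
            buffer.append(line)

    buffer = []
    refill(buffer, n)
    misses = []
    row = f1.readline()
    while row and len(misses) < maxerrors:
        if row in buffer:
            buffer.remove(row)  # drop the first matching row
            refill(buffer, 1)
        else:
            misses.append(row)
            refill(buffer, n)  # widen the look-ahead after a miss
        row = f1.readline()
    return misses, buffer


# In-memory stand-ins for the two files.
first = io.StringIO("a\nb\nc\nd\n")
second = io.StringIO("a\nc\nd\n")
print(compare_streams(first, second, n=2))
```

Here row `b` exists only in the first input, so it is reported as a miss while the remaining rows still line up.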