Hysteria with PYROmania (special URIs)

I needed to modify code written using Pyro so that objects on localhost could be exposed remotely.  I had never worked with Pyro before, so I was in some hysteria.  I was ready to try though.

There is no definitive guide anywhere on using the special names in the URI (actually I found something here and here later).  Love bites.

If you look at Pyro servers, the URI from the daemon is a very long string like the following:

PYRO://127.0.0.1:62100/c0a8006516bc7752e7526becdb059ce9

That is a rather long URI, and the object ID at the end changes on every service start-up (it looks like a GUID or a time-based ID).  So on the client side, is this my URI?  Is it too late for love?

Well, no. Here is a quick guide for how to use the special URI strings and you can be a Pyro Animal.

For name servers, you can use:

PYRONAME://<hostname>[:<port>]/<objectname>

For straight remote access (otherwise called the regular method):

PYROLOC://<hostname>[:<port>]/<objectname>
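
To make the two formats concrete, here is a minimal sketch that only builds the URI strings. The helper name pyro_uri, the host name nshost and the object name my_object are made up for illustration; the port 62100 is the one from the daemon URI above. The Pyro call mentioned in the trailing comment is the Pyro 3 API as I remember it, so check it against your Pyro version.

```python
def pyro_uri(scheme, hostname, objectname, port=None):
    """Build a special Pyro URI such as PYROLOC://host:port/name.

    pyro_uri is a hypothetical helper for illustration, not part of Pyro.
    """
    if scheme not in ("PYRONAME", "PYROLOC"):
        raise ValueError("unsupported scheme: %s" % scheme)
    # The port is optional in both formats.
    hostport = hostname if port is None else "%s:%d" % (hostname, port)
    return "%s://%s/%s" % (scheme, hostport, objectname)


# Look the object up through a name server (host and object names are made up).
nameserver_uri = pyro_uri("PYRONAME", "nshost", "my_object")

# Talk to the daemon directly, using the port from the example above.
direct_uri = pyro_uri("PYROLOC", "127.0.0.1", "my_object", port=62100)

print(nameserver_uri)
print(direct_uri)

# Assumption: with Pyro 3 installed, a client would pass one of these strings
# to Pyro.core.getProxyForURI(...) to get a proxy for the remote object.
```

Either string is stable across restarts, unlike the long generated ID in the daemon's own URI.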

So, it isn’t as bad as it initially seems. No Foolin’.  Finally, Armageddon It.

Comparing huge files using Python

I had the need to compare two huge files.

None of the file compare tools that I tried could handle large files, so I decided to write my own compare utility.

Comparing large files is not a common case. I could have solved my issue by loading each file into a database, then using the excellent RedGate DataCompare against the two tables. Heck, for that matter, I could have loaded both into one table and done a GROUP BY. But I had several files, and loading the database would have been tedious.

Use cases could include replacing an ETL process with a new process or upgrading a web service that generates large datasets. For me, it was a system migration of an ETL process.

The script was done ‘quick and dirty’, but really came in handy.

The script opens the two files and keeps a look-ahead buffer of lines from the second file (configurable, currently set to 10). For each line read from the first file, it checks whether that line is in the buffer. A match is removed from the buffer and replaced with the next line of the second file. A miss is written to an output file for later review, and the buffer grows by another 10 lines. Processing stops if there are too many misses (configurable).

That is it – easy

### Program to compare two large files
### fails fast: order of rows is important
### smart enough to look ahead for matching rows
fs1 = "C:/Temp/File1.txt"
fs2 = "C:/Temp/File2.txt"
ofs = "C:/Temp/diffb.txt"
maxerrors = 500   # stop after this many misses
n = 10            # look-ahead buffer size, in lines
Lq2 = []          # look-ahead buffer of lines from file 2
f1 = open(fs1)
f2 = open(fs2)
of = open(ofs, "w")
# setup: read the first n rows of file 2 into the buffer
for x in range(n):
    Lq2.append(f2.readline())
# important stuff
errorsFound = 0
rowcounter = 0
row = f1.readline()
# stops at the end of file 1, at the first row of 10 characters or fewer
# (newline included, treated as end of data), or after maxerrors misses
while errorsFound < maxerrors and len(row) > 10:
    isFound = False
    if rowcounter % 10000 == 0:
        print(".", end="")   # progress indicator
    rowcounter = rowcounter + 1
    for y in range(len(Lq2)):
        if row == Lq2[y]:
            # match: drop it from the buffer and refill from file 2
            isFound = True
            Lq2.pop(y)
            Lq2.append(f2.readline())
            break
    if not isFound:
        # miss: record the row and widen the buffer by n more lines
        errorsFound = errorsFound + 1
        of.write(row)
        for x in range(n):
            Lq2.append(f2.readline())
    row = f1.readline()
print("", "done - rows processed:", rowcounter)
f1.close()
f2.close()
of.close()
print(len(Lq2))   # entries left in the look-ahead buffer
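
If you want to try the look-ahead idea without creating files in C:/Temp, here is a rough sketch of the same logic as a function working on in-memory files. The function name find_misses and the sample data are made up for illustration. Unlike the script above, it runs to the end of the first file instead of stopping at short rows.

```python
import io


def find_misses(file1, file2, lookahead=10, max_errors=500):
    """Return the lines of file1 not found in a sliding buffer of file2 lines."""
    # Prime the buffer with the first `lookahead` lines of file2.
    buffer2 = [file2.readline() for _ in range(lookahead)]
    misses = []
    for row in file1:
        if len(misses) >= max_errors:
            break  # fail fast once there are too many misses
        if row in buffer2:
            # Match: drop it from the buffer and refill from file2.
            buffer2.remove(row)
            buffer2.append(file2.readline())
        else:
            # Miss: record the row and widen the buffer.
            misses.append(row)
            buffer2.extend(file2.readline() for _ in range(lookahead))
    return misses


# Rows swapped within the look-ahead window are not reported as misses.
reordered = find_misses(io.StringIO("a\nb\nc\nd\n"),
                        io.StringIO("a\nc\nb\nd\n"), lookahead=2)

# A row that was changed in the second file is reported.
changed = find_misses(io.StringIO("a\nb\nc\n"),
                      io.StringIO("a\nX\nc\n"), lookahead=2)

print(reordered)
print(changed)
```

The first call shows why the buffer exists: small reorderings between the two files do not count as differences, only rows that never turn up in the window.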

Recursive SQL query

Have you ever had to do a query on a table that joins to itself to form a hierarchy? For example the employee / manager relationships where the manager has a manager up to the top level of management.

My recent case involved categories of things where the categories could be nested to any level. I needed to find the oldest parent and the youngest parent for each leaf on the tree.

The lazy way (and often the quickest) is to write a set of statements with different levels of joins, then use your knowledge of the data to pull out what you need.

But what if you want to do it in a way that impresses the boss?

A recursive query will do the trick.

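To get started, write a fairly simple query which we will call the Anchor query. This returns the starting rows: the top level (root) if you walk down the tree, or, as in the example below, the leaf categories if you walk up towards the root. Then you UNION ALL this with a second query that references the common table expression itself; that second query does the recursion.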

It is a very good idea to set a limit on the recursion so you don't bring your SQL Server down. SQL Server stops with an error after 100 levels of recursion by default, and setting MAXRECURSION to 0 removes the limit entirely. Still, it is better to set a reasonable limit explicitly.

Here is an example:

 with CatHeirarchy (ProductID, CategoryID, ParentCategoryID, Name1, Name2, Level, TopParentName)
 as (
 -- Anchor definition
 select PCM.ProductID, PCM.CategoryID, C.ParentCategoryID,
 cast(C.Name as nvarchar(90)) AS Name1, Cast(C.Name as nvarchar(90)) as Name2, 0 as Level,
 CAST ('' as nvarchar(90)) as TopParentName
 from Product_Category_Map PCM
 inner join Category C on C.Id = PCM.CategoryId
 inner join Product P on P.Id = PCM.ProductId
 Union ALL
 -- recursive
 select A.ProductID, A.CategoryID, R.ParentCategoryID, cast(R.Name as nvarchar(90)) as Name1,
 CAST( A.Name2 as nvarchar(90)) as Name2, Level + 1,
 A.Name1
 from CatHeirarchy as A
 Inner Join Category as R
 on A.ParentCategoryID = R.ID
 )
select distinct CategoryID, Name2, TopParentName
 from CatHeirarchy ch
 , (Select MAX(Level) AS Level, ProductID
 from CatHeirarchy
 GROUP by ProductID) maxresults
 where ch.ProductID = maxresults.ProductID
 and ch.Level = maxresults.Level
 order by TopParentName, Name2
 OPTION (Maxrecursion 30)
 GO
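
If you do not have a SQL Server handy, the same anchor-plus-recursion pattern can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the tiny Category table and its rows are made up, the CTE is called CatWalk, and SQLite needs the RECURSIVE keyword and has no MAXRECURSION option, so the depth limit is written as a WHERE clause instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two small category trees.
conn.executescript("""
    CREATE TABLE Category (Id INTEGER PRIMARY KEY, ParentCategoryID INTEGER, Name TEXT);
    INSERT INTO Category VALUES
        (1, NULL, 'Hardware'),
        (2, 1,    'Tools'),
        (3, 2,    'Hammers'),
        (4, NULL, 'Garden'),
        (5, 4,    'Hoses');
""")

query = """
WITH RECURSIVE CatWalk (CategoryID, ParentCategoryID, StartName, CurrentName, Level) AS (
    -- Anchor definition: every category starts a walk at itself
    SELECT Id, ParentCategoryID, Name, Name, 0 FROM Category
    UNION ALL
    -- Recursive part: move one level up towards the root
    SELECT A.CategoryID, R.ParentCategoryID, A.StartName, R.Name, A.Level + 1
    FROM CatWalk AS A
    JOIN Category AS R ON A.ParentCategoryID = R.Id
    WHERE A.Level < 30  -- depth limit, standing in for MAXRECURSION
)
SELECT StartName, CurrentName AS TopParentName, Level
FROM CatWalk
WHERE ParentCategoryID IS NULL  -- keep only the rows that reached a root
ORDER BY StartName
"""

top_parents = conn.execute(query).fetchall()
for start_name, top_parent_name, level in top_parents:
    print(start_name, "->", top_parent_name, "(levels up: %d)" % level)
```

Each category ends up paired with its top-most parent, which is the same idea as the TopParentName column in the SQL Server query above.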

C# WebRequests without Proxy or Delay

WebRequest objects in C# are useful creations when you want to script interaction with a web server in your application.  However, there are some common gotchas that I always have to look up to rectify.  So here are three of the most common in one handy location.

Nagle Algorithm

First, remove the use of the Nagle algorithm.  The Nagle algorithm holds back small writes and combines them into fewer, larger packets.  That is great for protocols like telnet, where an application may send a stream of tiny writes.  It is not so good for a request/response protocol where messages are intentionally small and each one should go out immediately.  It will kill performance while it queues up bytes of data to send.  Use the following:

System.Net.ServicePointManager.UseNagleAlgorithm = false;
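
For comparison, this is what the same switch looks like at the socket level, sketched in Python with the standard socket module. My understanding is that turning off Nagle comes down to setting the TCP_NODELAY option on the connection; treat the mapping to the .NET setting as an assumption rather than documented behaviour.

```python
import socket

# Create a TCP socket; no connection is needed to set or read the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY = 1 disables the Nagle algorithm on this socket,
# so small writes are sent right away instead of being queued.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print("Nagle disabled:", nodelay_enabled)

sock.close()
```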

Expect 100 Continue

Expect 100 Continue is an HTTP 1.1 addition where a request can detect the readiness of a server before sending a large body in a POST.  By default, the WebRequest object sends an Expect: 100-continue header with such requests.  Not all web servers support handling this (e.g. lighttpd).  I suppose there is value to the 100 status code when posting large bodies, but for most data transfers (e.g. SOAP, REST, XMLRPC, etc.), it doesn't seem to be very useful.  Use the following to disable this.

System.Net.ServicePointManager.Expect100Continue = false;

WebRequest Proxy

By default, .NET will use the system proxy settings configured in Windows.   If you know your network is local, allowing the .NET framework to evaluate the default proxy settings can take unnecessary time.  You can tell .NET not to use or look for any proxy with the following code:

WebRequest request = WebRequest.Create (resource);
    request.Proxy = null;

or

WebRequest.DefaultWebProxy = null;
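
The same idea exists outside .NET.  As a rough sketch in Python's standard urllib, passing an empty dictionary to ProxyHandler disables proxy auto-detection for the opener that is built from it.  The name no_proxy_opener is made up for illustration, and no request is actually sent here.

```python
import urllib.request

# An empty dict means "no proxies", overriding any auto-detected settings.
no_proxy_handler = urllib.request.ProxyHandler({})
no_proxy_opener = urllib.request.build_opener(no_proxy_handler)

# Collect the proxy maps of any proxy handlers the opener ended up with.
proxy_maps = [h.proxies for h in no_proxy_opener.handlers
              if isinstance(h, urllib.request.ProxyHandler)]
print("proxies in use:", proxy_maps)

# Usage (not executed here): no_proxy_opener.open(url) fetches url directly.
```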

Please remember to flush.