All posts by Maksym Shyte

Grepping any type of file encoding in Python

Let’s take handling any encoding of files one step further.

We need to look for specific text in files in a directory regardless of encoding.  Here is one way in Python.

#!/usr/bin/python
import os
import re
import fnmatch

def DecodeBytes(byteArray, codecs=('utf-8', 'utf-16')):
  # try each codec in order; return None if none of them can decode the bytes
  for codec in codecs:
    try:
      return byteArray.decode(codec)
    except UnicodeError:
      pass
  return None

def ReadLinesFromFile(filename):
  with open(filename, "rb") as file:
    rawbytes = file.read()
  content = DecodeBytes(rawbytes)
  if content is not None:
    # splitlines splits on \n, \r\n and \r alike, whatever the host platform
    return content.splitlines()
  return None

# this came from http://stackoverflow.com/questions/1863236/grep-r-in-python
# with a substitution of ReadLinesFromFile and a file name match filter
def RecursiveGrep(pattern, dir, match):
  r = re.compile(pattern)
  for parent, dnames, fnames in os.walk(dir):
    for fname in fnmatch.filter(fnames, match):
      filename = os.path.join(parent, fname)
      if os.path.isfile(filename):
        lines = ReadLinesFromFile(filename)
        if lines is not None:
          # number every line (1-based), not just the matching ones
          for idx, line in enumerate(lines, 1):
            if r.search(line):
              yield filename + "|" + str(idx) + "|" + line.strip()

# RecursiveGrep is a generator, so iterate over it to get the results
for result in RecursiveGrep("needle", r"\yourpath", "*.cs"):
  print(result)

This will recurse through all subdirectories, looking in all .cs files for "needle" and returning the data in this format (pipe separated):

full file path|line number|line content

Very useful on Windows with multilingual files.
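
Each result is a single pipe-separated string, so a caller will usually want to split it back apart. Here is a minimal sketch, assuming the format above. The helper name parse_grep_result and the sample path are hypothetical and not part of the original script. Splitting at most twice from the left keeps any "|" characters inside the matched line intact (Windows paths cannot contain "|").

```python
def parse_grep_result(result):
    """Split 'path|line number|content' into (path, int, content)."""
    # maxsplit=2: only the first two pipes are separators; any later
    # pipes belong to the matched line itself.
    path, number, content = result.split("|", 2)
    return path, int(number), content


# Hypothetical sample in the format RecursiveGrep yields.
sample = r"C:\src\Program.cs|12|if (a | b) needle();"
print(parse_grep_result(sample))
```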

Getting lines of a file of any encoding type in Python

I really don’t want to know the encoding.  I only want the data.  In other words, I don’t want to think.  I don’t want to open notepad++ and convert between types of encoding.

My old standby doesn’t work on various file encodings that aren’t ansi (ascii, cp1252, whatever):

f = open("poo.txt", "r")
lines = f.readlines()
f.close()
for line in lines:
  dosomething(line)

I have had enough.  (I am also venturing into Python 3 as I have been on Python 2 forever but that is a different story.)

The following code will read a file in any of the listed encodings and split it into lines:

def DecodeBytes(byteArray, codecs=('utf-8', 'utf-16')):
  # try each codec in order; return None if none of them can decode the bytes
  for codec in codecs:
    try:
      return byteArray.decode(codec)
    except UnicodeError:
      pass
  return None

def ReadLinesFromFile(filename):
  with open(filename, "rb") as file:
    rawbytes = file.read()
  content = DecodeBytes(rawbytes)
  if content is not None:
    # splitlines splits on \n, \r\n and \r alike, whatever the host platform
    return content.splitlines()
  return None

lines = ReadLinesFromFile("poo.txt")
if lines is not None:
  for line in lines:
    dosomething(line)

If you need to add encodings, simply add them to the codecs default assignment (or make it more elegant as you deem).
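
One caveat when adding encodings: order matters. Single-byte codecs such as cp1252 accept almost any input, and BOM-less bytes of even length can decode as UTF-16 without error and produce garbage. The sketch below is one possible variation, not the code above. It checks for a byte order mark first and only then walks a fallback list. The name decode_bytes and the choice of cp1252 as the last resort are assumptions made for illustration.

```python
import codecs


def decode_bytes(raw, fallbacks=("utf-8", "cp1252")):
    """Decode raw bytes, trusting a BOM when present. Returns None on failure."""
    # A BOM identifies the encoding unambiguously (UTF-32 is not handled here).
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    # No BOM: try strict codecs first and permissive ones last.
    for codec in fallbacks:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            pass
    return None


print(decode_bytes(u"caf\u00e9".encode("utf-16")))
print(decode_bytes(u"caf\u00e9".encode("cp1252")))
```

Both calls should give back the same four-character string even though the byte layouts differ completely.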

Useful DLL utilities on Windows

A current project involves moving from a 32 bit to 64 bit system.  Some self contained exes are required to remain 32 bit while the rest of the system will move to 64 bit.  Some .NET assemblies are also “any cpu”.

As in all cases where projects get complicated and you inherit code, it is easy to lose track of what gets installed where, especially with different types (32 bit, 64 bit, any-cpu).

DLL Information using Dumpbin.exe

To get the dump of all the DLL headers recursing all subdirectories, the following is useful in a command prompt.

for /f "tokens=*" %i in ('dir /s /b /a-d *.dll') do call dumpbin.exe /headers "%i"

If you use dumpbin.exe and need to move it to a target machine, you will also need to copy link.exe and mspdb100.dll.  (This version is Visual Studio 2010 and located in C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64.  I was targeting a 64 bit machine.)

Notice that a .NET assembly set to “any cpu” will look like a 32 bit assembly.

Part of the output looks like this:

Dump of file C:\turd.dll

PE signature found

File Type: DLL

FILE HEADER VALUES
14C machine (x86)
3 number of sections
4F6A184B time date stamp Wed Mar 21 14:04:59 2012
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
2102 characteristics
Executable
32 bit word machine
DLL

OPTIONAL HEADER VALUES
10B magic # (PE32)
8.00 linker version
2A000 size of code
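
If dumpbin.exe is not available on the target machine, the two values of interest here (the machine type and the optional header magic) can be read straight from the file. The following is a rough sketch and not a replacement for dumpbin. It assumes the standard PE layout: the offset of the "PE" signature is stored at 0x3C, followed by the 20 byte COFF header and then the optional header. The function name pe_summary and the synthetic test header are made up for illustration.

```python
import struct

MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}
MAGIC_NAMES = {0x010B: "PE32", 0x020B: "PE32+"}


def pe_summary(data):
    """Return (machine, magic) names for a PE image, or None if it is not PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    if len(data) < pe_offset + 26:
        return None
    # Machine is the first field of the COFF header (right after the signature).
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    # The optional header starts after the 4 byte signature + 20 byte COFF header.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    return (MACHINE_NAMES.get(machine, hex(machine)),
            MAGIC_NAMES.get(magic, hex(magic)))


# Synthetic header for illustration only (not a loadable DLL).
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", fake, 0x84, 0x14C)
struct.pack_into("<H", fake, 0x98, 0x10B)
print(pe_summary(bytes(fake)))
```

For a real file, pass the bytes read with open(path, "rb"). As with dumpbin, an "any cpu" .NET assembly will still report x86 and PE32 here.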

.NET Assembly Type using Corflags.exe

An equally useful feature is to find out if a .NET assembly is 32 bit, 64 bit, or any cpu (or ILASM only) using corflags.exe.  (This version is located here on my machine: C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\NETFX 4.0 Tools.)

for /f "tokens=*" %i in ('dir /s /b /a-d *.dll') do call echo "%i" >> out.txt & corflags.exe /nologo "%i" >> out.txt

The output looks like this:

"C:\stinky.dll"  
Version   : v4.0.30319
CLR Header: 2.5
PE        : PE32
CorFlags  : 3
ILONLY    : 1
32BIT     : 1
Signed    : 0
"C:\smelly.dll"  
Version   : v2.0.50727
CLR Header: 2.5
PE        : PE32
CorFlags  : 1
ILONLY    : 1
32BIT     : 0
Signed    : 0
"C:\uhoh.dll"  
corflags : error CF008 : The specified file does not have a valid managed header

To interpret the results…

PE     32BIT  Type of Assembly
-----  -----  ----------------
PE32   0      Any CPU
PE32   1      x86
PE32+  0      x64
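
With many assemblies in out.txt, reading the table by eye gets old fast. Below is a small sketch that applies the table to one block of corflags output. It assumes the field names shown above (PE and 32BIT); other corflags versions may label the fields differently, in which case it reports "unknown". The function name classify_corflags is hypothetical.

```python
def classify_corflags(block):
    """Map one block of corflags output to Any CPU / x86 / x64."""
    fields = {}
    for line in block.splitlines():
        # "PE        : PE32" becomes {"PE": "PE32"}; other lines are harmless.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    table = {
        ("PE32", "0"): "Any CPU",
        ("PE32", "1"): "x86",
        ("PE32+", "0"): "x64",
    }
    return table.get((fields.get("PE"), fields.get("32BIT")), "unknown")


stinky = r'''"C:\stinky.dll"
Version   : v4.0.30319
CLR Header: 2.5
PE        : PE32
CorFlags  : 3
ILONLY    : 1
32BIT     : 1
Signed    : 0'''
print(classify_corflags(stinky))
```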

Assembly Versions

To get the version of assemblies recursing all subdirectories, the following is useful in PowerShell. This will not truncate the line.

ls -fi *.dll -r | % { $_.versioninfo } | Ft -autosize | out-string -width 4096

The output looks like this:

ProductVersion FileVersion  FileName
-------------- -----------  --------
1.0.0.0        1.0.0.0      C:\poo.dll
4.3.0.4        4.3.0.4      C:\caca.dll

So there you go. Some useful utilities.

Cross compiling with MinGW and crosstool-ng

I am sadistic.

I have given up.

I am leaving the instructions below but this is plain too difficult and I am moving on to Cygwin.  These instructions will definitely get cross tools built but it is a serious uphill battle to get it to compile a compiler.

My goal is to cross compile on MinGW (on Winderz) for a Linux target: either 64 bit Ubuntu 13.10 or an ARM board (such as a Beagle Bone).  Why?  Because I am sadistic.  We covered that.

The real purpose is to take a Windows GUI that generates C code and compile it for a different platform (hence cross compiling).  Should be easy, right?  Well… let’s find out.  These are the steps I followed (in my own little hacky way).

The first step will be to use crosstool-ng and get it to run under MinGW.  This post will deal with that.  A subsequent post will deal with creating the cross compiler.

Note that as information on the web becomes quickly out of date, realize that this is the middle of March in 2014.

  1. Download and install MinGW from here.  We will assume that you installed it on C:\MinGW.
  2. In the installation manager, mark the following packages to add and apply them.  The installer is really easy to use and should be self-explanatory.  (While applying them, it may be good to refill your coffee mug.)
      • mingw-developer-toolkit
      • mingw32-base
      • msys-wget
      • msys-gcc
      • msys-libtool
      • mingw32-pdcurses  This doesn’t work.  We will do this manually later.
      • msys-libregex (the dev package)
      • mingw32-gcc-v3-java
  3. This is not obvious now but later we will need Subversion for eglibc and gcj.exe (Java) for crosstool-ng.  First copy gcj.exe from /MinGW/bin to /MinGW/msys/1.0.  It is included with MinGW but not with msys.  Second, install subversion onto your Windows machine.  In C:\Program Files (x86)\Subversion\bin\, copy svn.exe to  /MinGW/msys/1.0.
  4. Open up your msys bash shell using the batch file: C:\MinGW\msys\1.0\msys.bat.
  5. Download and build crosstool-ng following the steps here.  Substitute your version.  They are listed below with my version and some other things I did.

    wget http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.19.0.tar.bz2
    tar xjf crosstool-ng-1.19.0.tar.bz2
    cd crosstool-ng-1.19.0
    ./configure --prefix=/home/maks/crosstool

  6. A couple of problems now.  First there was a strange tar error where it couldn’t copy a symbolic link.  The second time it ran fine.  The second issue is that gperf doesn’t exist.  We will build that from source as MinGW doesn’t include it.  Get that from GNU source here.  Grab the bz2 file, uncompress, and copy the source to your home directory.  Inside that directory, do the old standby.

    cd gperf-3.0.4
    ./configure --prefix=/usr
    make
    make install
    cd ../crosstool-ng-1.19.0
    ./configure --prefix=/home/maks/crosstool

  7. Now it can’t find the curses header.   Originally, I thought I had the pdcurses package installed and was good to go.  The package is missing some important things like a DLL and header.  So onto source we go.  This page has all the information.  Even though it appears to be for mingw-w64, it works for 32 bit as well.  In the source directory, do the following.

    ./configure --enable-term-driver --enable-sp-funcs --prefix=/usr
    make
    make install 

  8. Surprise! More issues.  Here is a list of corrections then redo the previous step.
      • In ./ncurses/win32con/gettimeofday.c, add “#include <sys/time.h>”.  Also, change the gettimeofday definition to the following:  “int gettimeofday(struct timeval *tv, struct timezone *tz)”.  This eliminates a duplicate definition error.
      • Side note: GetSystemTimeAsFileTime in gettimeofday seems to have a bug.  Whenever the fractional seconds becomes higher than 0.5, the integer seconds increment.  After all the testing I did but didn’t describe here, I can’t believe it is anything with long longs, or gcc, etc.  It has to be the underlying win32 api.
      • In ./ncurses/win32con/win_driver.c, add “#include <windows.h>” near the top.  Also add “#define ATTACH_PARENT_PROCESS (DWORD)-1” somewhere near the top.
      • In ./test/tclock.c, the double fraction declaration is inside an #if HAVE_GETTIMEOFDAY statement and it shouldn’t be.  Move it below the #endif.
  9. Now back in crosstool-ng, run ./configure --prefix=/home/maks/crosstool.  Yea!!!  It creates a make file.  Now on to making it.  Run “make” now.
  10. Next issue?  The gnu extension strcasestr doesn’t exist in MinGW.  The file ./kconfig/nconf.c uses it.  I am a bit surprised that the configure script didn’t check for that.  After much research, I decided to simply implement it inside the file that needed it.  Add the prototype to the top of the file:
    // Added for support in mingw. This ought to be checked and enabled with autotools.
    const char *strcasestr(const char *s1, const char *s2);

    Add the following to the bottom of the file.

    // Added for support in mingw. This ought to be checked and enabled with autotools.
    const char *strcasestr(const char *s1, const char *s2)
    {
      // if either pointer is null
      if (s1 == 0 || s2 == 0)
        return 0;
      // the length of the needle
      size_t n = strlen(s2);
      // iterate through the string
      while (*s1)
      {
        // if the compare which is case insensitive is a match, return the pointer
        if (!strncmpi(s1, s2, n))
          return s1;
        s1++;
      }
      // no match was found
      return 0;
    }

    One more thing. If you really wanted to get rid of all warnings, cast the first argument to every bzero call to a void pointer.

  11. Finally, a make and a make install will work.  You will see a bin and lib and other directories in /home/maks/crosstool (or wherever you placed your crosstool).

Next we are finally ready to build our cross compiler.  We will discuss this in our next post.  Why? Because we are sadistic.

C# and nullable value types

In C# you can declare nullable value types.  (This isn’t really what happens but the syntax looks like it.)  This is an unusual construct for those who come from other “lower-level” languages such as C++.

For example, the following declaration of “i” can not be set to null.

int i = 42;

However, both of the following declarations are allowed.

int? j = 42;
int? k = null;

This blows my C++ brain.  In reality, “j” and “k” are not ints.  They are objects of type System.Nullable<T> as Microsoft describes here.

So, to safely coerce to int, you will need to check the value to be sure it is not null.  One approach is to use the HasValue property to make decisions, or the GetValueOrDefault method to simply fall back to the default value of the base type.  For that matter, you can read the Value property in a try-catch block but that seems so evil.

Finally, the cleanest approach is the null-coalescing operator which is another foreign concept to C++ type brains.

Check out each example below.

int? i = null;
int j;

if (i.HasValue)
   j = (int) i;
else
   j = 42;

// j = 42 as i is null
Console.WriteLine("j = " + j.ToString());

// GetValueOrDefault returns the default for int (zero) when i is null
j = i.GetValueOrDefault();
// j = 0 as i is null
Console.WriteLine("j = " + j.ToString());

// an exception? really?  
try
{
    j = i.Value;
}
catch (InvalidOperationException e)
{
    Console.WriteLine("Why would anyone do this?");
}

// my favorite
j = i ?? 42;
Console.WriteLine("j = " + j.ToString());

The output you ask?  (That’s right, I heard you.)

j = 42
j = 0
Why would anyone do this?
j = 42

Using an exception is like having only one square of single-ply toilet paper.  If it is all you have, you use it.  However, we both know there are better methods.

Hysteria with PYROmania (special URIs)

I had a need to modify code written using Pyro so that objects on localhost could be exposed remotely.  I have never worked with Pyro before so I was in some hysteria.  I was ready to try though.

There is no definitive guide anywhere using the special names in the URI (actually I found something here and here later).  Love bites.

If you look at Pyro servers, the URI from the daemon is a very long string like the following:

PYRO://127.0.0.1:62100/c0a8006516bc7752e7526becdb059ce9

That is a rather long URI and the number changes on every service start up (obviously like a GUID or time based id).  So on the client side, is this my URI?  Is it too late for love?

Well, no. Here is a quick guide for how to use the special URI strings and you can be a Pyro Animal.

For name servers, you can use:

PYRONAME://<hostname>[:<port>]/<objectname>

For straight remote access (otherwise called the regular method):

PYROLOC://<hostname>[:<port>]/<objectname>

So, it isn’t as bad as it initially seems. No Foolin’.  Finally, Armageddon It.
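
To make the two patterns concrete, here is a tiny sketch that assembles such a URI string on the client side. The helper make_pyro_uri, the host name and the object name are all hypothetical, and the port is just an example value. The sketch assumes the resulting string is then handed to whatever call the client already uses to create its proxy (in Pyro 3 that would be Pyro.core.getProxyForURI).

```python
def make_pyro_uri(scheme, host, object_name, port=None):
    """Build PYRONAME://host[:port]/name or PYROLOC://host[:port]/name."""
    if scheme not in ("PYRONAME", "PYROLOC"):
        raise ValueError("expected PYRONAME or PYROLOC, got %r" % (scheme,))
    # The port is optional in both forms.
    location = host if port is None else "%s:%d" % (host, port)
    return "%s://%s/%s" % (scheme, location, object_name)


print(make_pyro_uri("PYROLOC", "myhost", "myobject", 7766))
print(make_pyro_uri("PYRONAME", "myhost", "myobject"))
```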

C# WebRequests without Proxy or Delay

WebRequest objects in C# are useful creations when you want to script interaction with a web server in your application.  However, there are some common gotchas that I always have to look up to rectify.  So here are three of the most common in one handy location.

Nagle Algorithm

First, remove the use of the Nagle algorithm.  The Nagle algorithm is great for protocols like telnet where there is a chance of sending small amounts of data.  It is not so good for a protocol where packet sizes are intentionally small.  It will kill performance while queuing up bytes of data to send.  Use the following:

System.Net.ServicePointManager.UseNagleAlgorithm = false;
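
For anyone wondering what this switch amounts to underneath, disabling Nagle corresponds to the TCP_NODELAY option on the socket itself. The Python sketch below sets that option on a plain, unconnected socket and reads it back. It is only an illustration of the socket-level option, on the assumption that this is what the .NET setting maps to.

```python
import socket

# Create a TCP socket; no connection is needed to set the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# TCP_NODELAY = 1 turns the Nagle algorithm off for this socket.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()

print("Nagle disabled:", bool(nodelay))
```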

Expect 100 Continue

Expect 100 Continue is an HTTP 1.1 addition where a request can detect the readiness of a server before sending a large body in a post.  By default, the WebRequest object sends an Expect: 100-continue in the header.  Not all web servers support handling this (e.g. lighttpd).  I suppose there is value to the 100 status code when posting large bodies but for most data transfers (e.g. SOAP, REST, XMLRPC, etc.), it doesn’t seem to be very useful.  Use the following to disable this.

System.Net.ServicePointManager.Expect100Continue = false;

WebRequest Proxy

By default, the proxy settings configured in Windows are used.   If you know your network is local, allowing the .NET framework to evaluate the default proxy settings can take unnecessary time.  You can tell .NET not to use or look for any proxy with the following code:

WebRequest request = WebRequest.Create(resource);
request.Proxy = null;

or

WebRequest.DefaultWebProxy = null;

Please remember to flush.