All posts by Maksym Shyte

Simple Python JSON server based on jsonrpclib

I needed a simple Python JSON-RPC server that executes in its own thread but is easily extensible.  Let’s get right to the base class code (or superclass for those who build down).

#! /usr/bin/python

import threading

import jsonrpclib
import jsonrpclib.SimpleJSONRPCServer

class JsonServerThread (threading.Thread):
  def __init__(self, host, port):
    threading.Thread.__init__(self)
    self.daemon = True
    self.stopServer = False
    self.host = host
    self.port = port
    self.server = None

  def _ping(self):
    pass

  def stop(self):
    self.stopServer = True
    jsonrpclib.Server("http://" + str(self.host) + ":" + str(self.port))._ping()
    self.join()

  def run(self):
    self.server = jsonrpclib.SimpleJSONRPCServer.SimpleJSONRPCServer((self.host, self.port))
    self.server.logRequests = False
    self.server.register_function(self._ping)

    self.addMethods()

    while not self.stopServer:
      self.server.handle_request()
    self.server = None

  # methods for the derived class to override

  def addMethods(self):
    pass

So the idea is simple: derive a new class from this, implement the addMethods method, and add the methods themselves.

#! /usr/bin/python

import jsonServer

class JsonInterface(jsonServer.JsonServerThread):
  def __init__(self, host, port):
    jsonServer.JsonServerThread.__init__(self, host, port)

  def addMethods(self):
    self.server.register_function(self.doOneThing)
    self.server.register_function(self.doAnother)

  def doOneThing(self, obj):
    return obj

  def doAnother(self):
    return "why am I doing something else?"

In the derived class, implement the methods and register them in addMethods.  That is all.  Now we can worry only about the implementation.  Be aware of any threading synchronization or exception handling you need.  jsonrpclib takes care of exception handling as well, converting a raised exception into a JSON-RPC error.

One last item of note.  In the base class, the stop method is interesting.  Since handle_request() is a blocking call in the thread, we need to set the “stop” flag and then make a simple request to unblock it.  The _ping method does this for us.  Then we join on the thread, waiting for it to end gracefully.
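To make the lifecycle concrete, here is a minimal usage sketch.  The module name jsonInterface, the port, and the payload are illustrative assumptions, not part of the original code.

#! /usr/bin/python

# Minimal usage sketch -- assumes the derived class above lives in a module
# named jsonInterface (hypothetical); host, port, and payload are illustrative.

import time

import jsonrpclib

import jsonInterface

server = jsonInterface.JsonInterface("localhost", 8080)
server.start()          # runs run() in its own thread
time.sleep(0.5)         # crude: give the thread a moment to bind the socket

proxy = jsonrpclib.Server("http://localhost:8080")
print proxy.doOneThing({"key": "value"})   # echoes the object back
print proxy.doAnother()

server.stop()           # sets the flag, pings the server, joins the thread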

jsonrpclib is a very useful library and well done.  By the way, this example is for Python 2.7.  On Ubuntu 14.04, you can install the library using “apt-get install python-jsonrpclib”.

Pulling Documents for Searching

In a prior post, I noted how to set up elasticsearch with apache2.  In this post, we will look at how to cache a set of files on your web server from a Windows share and index them.

To do this, we need to do the following steps:

  1. Initialize the index the first time.
  2. Mount a share.
  3. Rsync the data between the machines.
  4. Get the files that exist on the SMB share.
  5. Read what has been indexed.
  6. Diff the lists from steps 4 and 5.
  7. Index the new files on the share.
  8. Delete (the index and file) the files that no longer exist on the share.

By the way, all of this was done in Python 2.7 (as opposed to the Python 3.x in some other posts I have).
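Before diving into the individual steps, here is a sketch of how the pieces chain together once everything below exists.  Every script name is a hypothetical stand-in for the corresponding script developed in the sections that follow; the arguments mirror what each script expects.

#! /usr/bin/python

# Pipeline driver sketch.  All script names are hypothetical stand-ins for
# the scripts developed below; the step numbers refer to the list above.

import subprocess

subprocess.check_call(['./initIndex.py'])                     # step 1 (first run only)
subprocess.check_call(['./buildFileList.sh'])                 # steps 2-4: mount, rsync, write rsynclist.txt
subprocess.check_call(['./readIndex.py', 'indexlist.txt'])    # step 5: read what has been indexed
subprocess.check_call(['./diffLists.py', 'rsynclist.txt', 'indexlist.txt', 'add.txt', 'delete.txt'])  # step 6
subprocess.check_call(['./addToIndex.py', 'add.txt', '/var/www/search/data'])          # step 7
subprocess.check_call(['./deleteFromIndex.py', 'delete.txt', '/var/www/search/data'])  # step 8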

Initialize the Index

The following script will “reset” the index and create it new.

#! /usr/bin/python

import httplib 
import binascii
import os
import glob
import socket

import hostinfo

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

def connInitialize(conn):
    print connRequest(conn, 'DELETE', hostinfo.INDEX)
    print connRequest(conn, 'PUT', hostinfo.INDEX, '{  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}}') 
    print connRequest(conn, 'GET', '/_cluster/health?wait_for_status=green&pretty=1&timeout=5s' )
    print connRequest(conn, 'PUT', hostinfo.INDEX + '/attachment/_mapping', '{  "attachment" : {   "properties" : {      "file" : {        "type" : "attachment",        "fields" : {          "title" : { "store" : "yes" },          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }        }      }    }  }}' )

def connRefresh(conn):
    print connRequest(conn, 'POST', '/_refresh')

socket.setdefaulttimeout(15)
conn = httplib.HTTPConnection(hostinfo.HOST)
connInitialize(conn)
connRefresh(conn)

Mount an SMB Share

On Ubuntu, you will need to install cifs-utils:  “sudo apt-get install cifs-utils”.

Once done, you can mount it by using the following command.  Choose your own mount point obviously and be prepared with your domain password.

sudo mount -t cifs //10.0.4.240/General /mnt/cifs -ousername=maksym.shyte,ro

Rsync Between Server and SMB Share

The easiest way to do this is to create a list of the files that you want to search.  Then use that list with rsync.  This leaves you with an efficient copy of the files and a text-file list of the files on the SMB share.

#! /bin/bash

function addToList {
  find "$1" -name \*.pdf -o -name \*.doc -o -name \*.docx -o -name \*.xls -o -name \*.xlsx -o -name \*.ppt -o -name \*.pptx -o -name \*.txt | grep -v ".AppleDouble" | grep -v "~$" >> "$2"
}

# Capture an absolute path for the list before changing directories.
currentPath=$(pwd)

cd /mnt/cifs

addToList . $currentPath/rsynclist.txt
#addToList ./Some\ Directory $currentPath/rsynclist.txt

rsync -av --files-from=$currentPath/rsynclist.txt /mnt/cifs /var/www/search/data

Read the Index

To read the index, the following script will pull the index entries out and write them to a file.  This will include the name of the document and its key.  You will need to take the step of resolving the paths from the previous file list against this index, as they are related by the source and destination directories passed to rsync.  (A small sketch of that resolution follows the script.)

#! /usr/bin/python

import httplib 
import json
import sys
import os
import codecs

import hostinfo

argc = len(sys.argv)
if argc != 2:
    print os.path.basename(sys.argv[0]), "<indexFileName>"
    sys.exit(-1)

indexFileName = sys.argv[1]

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

conn = httplib.HTTPConnection(hostinfo.HOST)
data = json.loads(connRequest(conn, 'GET', hostinfo.INDEX + '/_search?search_type=scan&scroll=10m&size=10', '{"query":{"match_all" :{}}, "fields":["location"]}' ))

print data
total = data["hits"]["total"]

#scroll session id, used to request the next batch of data
scrollId = data["_scroll_id"]
counter = 0; 

data = json.loads(connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId))

#print data

f = codecs.open(indexFileName, "w", "utf8")

while len(data["hits"]["hits"]) > 0:
    for item in data["hits"]["hits"]: 
        f.write(item["fields"]["location"][0] + ',' + item["_id"] + '\n')
        f.flush()

    counter = counter + len(data["hits"]["hits"])
    print "Reading Index:", counter, "of", total

    scrollId = data["_scroll_id"]
    resp = connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId)
    #print resp
    data = json.loads(resp)

f.close()
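As promised, here is a minimal sketch of the path-resolution step.  The file names and the stripping convention are assumptions based on the rsync source and destination used earlier.

#! /usr/bin/python

# Path-resolution sketch (an assumption, not a fixed recipe): entries in
# rsynclist.txt are relative to /mnt/cifs (e.g. "./Some Directory/file.pdf")
# while the index locations are relative to /var/www/search/data, so
# stripping the leading "./" makes the two lists comparable.

with open('rsynclist.txt') as fin, open('filelist.txt', 'w') as fout:
    for line in fin:
        path = line.rstrip('\n')
        if path.startswith('./'):
            path = path[2:]
        fout.write(path + '\n')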

Diff the File List and the Index List

Next we need to diff the two.  We want to know which files we need to index and which files we want to delete.  The following script does that (presuming that the lists have been modified to point at the same directory – i.e. /var/www/search/data).  The output is an “add” text file and a “delete” text file.

#! /usr/bin/python

import sys
import os

argc = len(sys.argv)
if argc != 5:
    print os.path.basename(sys.argv[0]), "<fileList> <indexList> <addList> <deleteList>"
    sys.exit(-1)

def createMap(filename):
    ret = {}
    f = open(filename)
    lines = f.readlines()
    f.close()
    for line in lines:
        line = line.replace('\n','')
        split = line.split(',', 1)
        key = split[0]
        ret[key] = line
    return ret

fileMap = createMap(sys.argv[1])
indexMap = createMap(sys.argv[2])

# if the entry is in fileMap but not indexMap, it goes into the add file
# if the entry is in indexMap but not fileMap, it goes into the delete file
add = {}

for key in fileMap:
    if indexMap.has_key(key):
        del indexMap[key]
    else:
        add[key] = fileMap[key]

f = open(sys.argv[3], "w")
for key in add:
    f.write(add[key] + '\n');
f.close()

f = open(sys.argv[4], "w")
for key in indexMap:
    f.write(indexMap[key] + '\n');
f.close()

Add to the Index

Next we iterate through all the files in the “add” list.

#! /usr/bin/python

import httplib 
import binascii
import sys
import os
import socket

import hostinfo

argc = len(sys.argv)
if argc != 3:
    print os.path.basename(sys.argv[0]), "<addList> <rootFsDir>"
    sys.exit(-1)

rootFsDir = sys.argv[2] 

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

def connInitialize(conn):
    print connRequest(conn, 'DELETE', hostinfo.INDEX)
    print connRequest(conn, 'PUT', hostinfo.INDEX, '{  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}}') 
    print connRequest(conn, 'GET', '/_cluster/health?wait_for_status=green&pretty=1&timeout=5s' )
    print connRequest(conn, 'PUT', hostinfo.INDEX + '/attachment/_mapping', '{  "attachment" : {   "properties" : {      "file" : {        "type" : "attachment",        "fields" : {          "title" : { "store" : "yes" },          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }        }      }    }  }}' )

def connRefresh(conn):
    print connRequest(conn, 'POST', '/_refresh')

def connAddFile(conn, filename, rootFsDir):
    title = os.path.basename(filename)
    location = filename[len(rootFsDir):]

    with open(filename, 'rb') as f:
        data = f.read()

    if len(data) > hostinfo.LARGEST_BASE64_ATTACHMENT:
        print 'Not indexing because the file is too large', len(data)
    else:
        print 'Indexing file size', len(data)
        base64Data = binascii.b2a_base64(data)[:-1]
        attachment = '{ "file":"' + base64Data + '", "title" : "' + title + '", "location" : "' + location + '" }'
        print connRequest(conn, 'POST', hostinfo.INDEX + '/attachment/', attachment)

socket.setdefaulttimeout(30)
conn = httplib.HTTPConnection(hostinfo.HOST)
#connInitialize(conn)

f = open(sys.argv[1])
lines = f.readlines()
f.close()

idx = 0

rootFsDir = rootFsDir + '/'

for line in lines:
    line = line.replace('\n', '')
    idx = idx + 1
    filename = rootFsDir + line
    print idx, filename
    try:
        connAddFile(conn, filename, rootFsDir)
    except Exception, e:
        print str(e)
        conn = httplib.HTTPConnection(hostinfo.HOST)  

connRefresh(conn)
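One aside on connAddFile: building the attachment JSON by string concatenation works until a title or path contains a quote or backslash.  A safer sketch using the json module (my suggestion, not part of the original scripts):

#! /usr/bin/python

import json

# json.dumps escapes the quotes and backslashes that the string
# concatenation in connAddFile would pass through unescaped.
def buildAttachment(base64Data, title, location):
    return json.dumps({"file": base64Data, "title": title, "location": location})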

Delete the Files Not Needed

Finally, we delete the index and physical files no longer needed.

#! /usr/bin/python

import httplib 
import binascii
import sys
import os
import socket

import hostinfo

argc = len(sys.argv)
if argc != 3:
    print os.path.basename(sys.argv[0]), "<deleteList> <rootFsDir>"
    sys.exit(-1)

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

def connRefresh(conn):
    print connRequest(conn, 'POST', '/_refresh')

def connDeleteFile(conn, index):
    print connRequest(conn, 'DELETE', hostinfo.INDEX + '/attachment/' + index)

socket.setdefaulttimeout(30)
conn = httplib.HTTPConnection(hostinfo.HOST)

f = open(sys.argv[1])
lines = f.readlines()
f.close()

idx = 0

for line in lines:
    line = line.replace('\n', '')
    idx = idx + 1
    split = line.split(',')
    filename = split[0]
    index = split[1]
    print "Delete:", idx, filename, index
    try:
        connDeleteFile(conn, index)
    except Exception, e:
        print str(e)
        conn = httplib.HTTPConnection(hostinfo.HOST)  

    try:
        os.remove(sys.argv[2] + '/' + filename)
    except:
        pass

connRefresh(conn)

There it is.  I have scripts for all these steps, including resolving the paths between the file list and the index list.  One further thing to note: the hostinfo file referenced by the Python scripts looks like this:

#! /usr/bin/python

HOST = '127.0.0.1:9200'
SITE = ''

INDEX = SITE + '/basic'

LARGEST_BASE64_ATTACHMENT = 50000000


A Search Engine for Office Documents

Have you ever worked at a place where there was a mass of files and documents on a share, and even the old-timers forget where the important documents are?

Search by file name stinks and SharePoint has been another excuse to dump stuff that gets lost.

So I decided to figure out an easy way to get a content search engine up, looking through the files on a share.  I found a solution.  It isn’t pristine, for these reasons:

  1. Browsers can’t link to files on a share for obvious security reasons.
  2. For reason one, the decision was made to copy searchable documents onto the web server.  This is time consuming to transfer and duplicates information but the documents are served successfully.
  3. For reason two, it would be possible to add a server plugin that reads and delivers a file on a share.  I just haven’t done that yet.

So we will start with what we have and consider changing it later.

The basis for this will be Ubuntu 12.04 LTS.  Why?  Because I have such a machine handy and it is 9 years old.  This will be based on all the wonderful work of elasticsearch and Lucene.

So, here are the steps.  Remember, this is a bit hacky.

  1. Install apache2.  (In the case of Ubuntu, it is “sudo apt-get install apache2”.)
  2. Install openjdk-7-jre-headless.  (“sudo apt-get install openjdk-7-jre-headless”).
  3. Download elasticsearch (from elasticsearch.org – the .com site takes you to pay-for products).  Because I am using Ubuntu, I thought I would use the apt repository.
  4. Follow the steps to start elasticsearch – in my case listed on the web site.  Be advised that elasticsearch binds to all interfaces on a free port between 9200 and 9300.  We will assume that the port is 9200 as it is in my case.  However, it probably should only bind to a port on localhost or, at least, the security should be evaluated to make sure it complies with what you need.
  5. We will need two plugins.  You can install them from your elasticsearch/bin location.  In my case it was /usr/share/elasticsearch/bin/plugin.
    bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.0.0
    bin/plugin -install de.spinscale/elasticsearch-plugin-suggest/1.0.1-2.0.0

    Restart elasticsearch. (“sudo service elasticsearch restart”).  You will also need to verify the versions of these plugins.

  6. For apache2, make sure to enable the proxy, proxy_http, and ssl modules.  On Ubuntu, “a2enmod” is an easy utility for doing this.
  7. In my Apache setup, I added a new file called “elasticsearch” inside /etc/apache2/conf.d.  (Note that 13.10 doesn’t use a conf.d directory.  It could be added to the bottom of apache2.conf, although I am sure there is a more “pristine” location.)  The contents are below.
    <IfModule proxy_module>
    <IfModule proxy_http_module>
    
    <Proxy *>
    <Limit GET>
        allow from all 
    </Limit>
    
    <Limit POST PUT DELETE>
        order deny,allow 
        deny from all 
    </Limit>
    </Proxy>
    
    ProxyPreserveHost On
    ProxyRequests Off
    LogLevel debug
    ProxyPass /es http://localhost:9200/
    ProxyPassReverse /es http://localhost:9200/
    
    </IfModule>
    </IfModule>

    The application depends on the /es directory under web root. This can be changed along with the web pages that use it.

  8. Restart apache2.  (“sudo service apache2 restart”)
  9. Download the HTML and Javascript for the search pages from here:  Search HTML and Javascript.  It uses jQuery, jQueryUI, and AJAX to perform the searching and suggestions.  Unzip and place in the web directory where you want it.  For me, I wanted a search subdirectory so I placed mine in /var/www/search.
  10. So, the last thing is to show how to index the files.  I am a fan of python so this is python code making http requests to elasticsearch adding the information.  The script below deletes the index, recreates it, and starts adding content to it – from files in a directory.
    #! /usr/bin/python
    
    import httplib 
    import binascii
    import os
    
    HOST = 'localhost:9200'
    INDEX = '/basic'
    
    def connRequest(conn, verb, url, body = None):
        if body == None:
            conn.request(verb, url)
        else:
            conn.request(verb, url, body)
        return conn.getresponse().read()
    
    def connAddFile(conn, filename, rootFsDir, httpPrefix):
        with open(filename, 'rb') as f:
            base64Data = binascii.b2a_base64(f.read())[:-1]
    
        title = os.path.basename(filename)
        location = httpPrefix + filename[len(rootFsDir):]
    
        attachment = '{ "file":"' + base64Data + '", "title" : "' + title + '", "location" : "' + location + '" }'
        print connRequest(conn, 'POST', INDEX + '/attachment/', attachment)
    
    conn = httplib.HTTPConnection(HOST)
    
    print connRequest(conn, 'DELETE', INDEX)
    
    print connRequest(conn, 'PUT', INDEX, '{  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}}') 
    
    print connRequest(conn, 'GET', '/_cluster/health?wait_for_status=green&pretty=1&timeout=5s' )
    
    print connRequest(conn, 'PUT', INDEX + '/attachment/_mapping', '{  "attachment" : {   "properties" : {      "file" : {        "type" : "attachment",        "fields" : {          "title" : { "store" : "yes" },          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }        }      }    }  }}' )
    
    # Add files here repeatedly
    rootFsDir = '/var/www/search/data/'
    searchDir = ''          # This is for recursion through the directories
    httpPrefix = 'data/'
    # Make this recursive some day
    for file in os.listdir(rootFsDir + searchDir):
        connAddFile(conn, rootFsDir + searchDir + file, rootFsDir, httpPrefix)
    
    print connRequest(conn, 'POST', '/_refresh')
  11. If you decide to get more creative and add only new files and delete the old ones, we need to understand how to get the list of existing files that are indexed.  Then you just have to correlate the current state of the files on disk with the index list.  This script gets the indexes and the files associated with them.
    #! /usr/bin/python
    
    import httplib 
    import json
    import sys
    import os
    
    import hostinfo
    
    argc = len(sys.argv)
    if argc != 2:
        print os.path.basename(sys.argv[0]), "<indexFileName>"
        sys.exit(-1)
    
    indexFileName = sys.argv[1]
    
    def connRequest(conn, verb, url, body = None):
        if body == None:
            conn.request(verb, url)
        else:
            conn.request(verb, url, body)
        return conn.getresponse().read()
    
    conn = httplib.HTTPConnection(hostinfo.HOST)
    data = json.loads(connRequest(conn, 'GET', hostinfo.INDEX + '/_search?search_type=scan&scroll=10m&size=10', '{"query":{"match_all" :{}}, "fields":["location"]}' ))
    
    total = data["hits"]["total"]
    
    #scroll session id, used to request the next batch of data
    scrollId = data["_scroll_id"]
    counter = 0; 
    
    data = json.loads(connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId))
    
    f = open(indexFileName, 'w')
    
    while len(data["hits"]["hits"]) > 0:
        for item in data["hits"]["hits"]:
            f.write(item["fields"]["location"][0] + ',' + item["_id"] + '\n')
            f.flush()
    
        counter = counter + len(data["hits"]["hits"])
        print "Reading Index:", counter, "of", total
    
        scrollId = data["_scroll_id"]
        resp = connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId)
        #print resp
        data = json.loads(resp)
    
    f.close()
  12. To delete files, the python snippet looks like this where index is the id for the file we want indexing deleted for.
    def connDeleteFile(conn, index):
        print connRequest(conn, 'DELETE', hostinfo.INDEX + '/attachment/' + index)

So there we have it.  All we have to do is figure out where we are getting our data from and copy it to the “data” directory.  One particular way I have done this is with rsync across an SMB share.
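If you want to poke at the index without the web page, you can also query it from Python in the same httplib style as the indexing script.  This is only a sketch: the query text is illustrative, and the field names follow the attachment mapping above.

#! /usr/bin/python

# Search sketch.  The query text is illustrative; "file", "title", and
# "location" follow the attachment mapping and the documents indexed above.

import httplib
import json

HOST = 'localhost:9200'
INDEX = '/basic'

conn = httplib.HTTPConnection(HOST)
conn.request('POST', INDEX + '/attachment/_search', '{ "query" : { "match" : { "file" : "quarterly report" } }, "fields" : ["title", "location"] }')
results = json.loads(conn.getresponse().read())

for hit in results["hits"]["hits"]:
    print hit["fields"]["title"][0], hit["fields"]["location"][0]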

This is by no means meant to be a lesson on elasticsearch.  There is room for improvement here.

However, this is a quick way to set up searching documents for information you never knew existed.  (Side note:  I have had 10 ms search times across 2500 documents.)


Recycling a Third Party Application with System Tray Icon

I had a need to recycle a third party application that had a system tray icon.  The application controlled hardware and would get into a funky state.

The application was titled the “user mode driver” but I’m not totally sure if it used the user mode driver framework that Microsoft touted with Vista.  The user mode driver (UMD) was really a bridge process between the Ethernet port and a COM (a.k.a. the old-timer Component Object Model) in-process DLL that resided in your program memory space.

The UMD also had a system tray component that needed a little cleanup when the application was killed.  The system tray icon was left behind.

This post recycles others’ work that we will reference.  This post is about bringing it all together in C#.

There are three parts to this solution.

  1. Stop the process.
  2. Restart the process.
  3. Clean up the system tray.

For this example though, we will assume that we know the full path to the process and that the process name is the base file name without extension.

Stop the Process

C# has a handy way to stop processes.

private void StopUserModeDriver(string userModeDriverPath)
{
  Process[] procs = null;

  try
  {
    procs = Process.GetProcessesByName(Path.GetFileNameWithoutExtension(userModeDriverPath));

    foreach (Process proc in procs)
    {
      proc.Kill();
      proc.WaitForExit(5000);
    }
  }
  finally
  {
    if (procs != null)
      foreach (Process proc in procs)
        proc.Dispose();
  }
}

Restart the Process

This one is simple.

private void StartUserModeDriver(string userModeDriverPath)
{
  Process.Start(userModeDriverPath);
}

Clean Up the System Tray

This code is present here, and we will show it again in this post.

[StructLayout(LayoutKind.Sequential)]
public struct RECT
{
  public int left;
  public int top;
  public int right;
  public int bottom;
}
[DllImport("user32.dll")]
public static extern IntPtr FindWindow(string lpClassName, string lpWindowName);
[DllImport("user32.dll")]
public static extern IntPtr FindWindowEx(IntPtr hwndParent, IntPtr hwndChildAfter, string lpszClass, string lpszWindow);
[DllImport("user32.dll")]
public static extern bool GetClientRect(IntPtr hWnd, out RECT lpRect);
[DllImport("user32.dll")]
public static extern IntPtr SendMessage(IntPtr hWnd, uint msg, int wParam, int lParam);

private void RemoveOrphanedIconsFromSystemTray()
{
  IntPtr systemTrayContainerHandle = FindWindow("Shell_TrayWnd", null);
  IntPtr systemTrayHandle = FindWindowEx(systemTrayContainerHandle, IntPtr.Zero, "TrayNotifyWnd", null);
  IntPtr sysPagerHandle = FindWindowEx(systemTrayHandle, IntPtr.Zero, "SysPager", null);
  IntPtr notificationAreaHandle = FindWindowEx(sysPagerHandle, IntPtr.Zero, "ToolbarWindow32", "Notification Area");
  if (notificationAreaHandle == IntPtr.Zero)
  {
    notificationAreaHandle = FindWindowEx(sysPagerHandle, IntPtr.Zero, "ToolbarWindow32", "User Promoted Notification Area");
    IntPtr notifyIconOverflowWindowHandle = FindWindow("NotifyIconOverflowWindow", null);
    IntPtr overflowNotificationAreaHandle = FindWindowEx(notifyIconOverflowWindowHandle, IntPtr.Zero, "ToolbarWindow32", "Overflow Notification Area");
    RefreshSystemTrayArea(overflowNotificationAreaHandle);
  }
  RefreshSystemTrayArea(notificationAreaHandle);
}

private static void RefreshSystemTrayArea(IntPtr windowHandle)
{
  const uint wmMousemove = 0x0200;
  RECT rect;
  GetClientRect(windowHandle, out rect);
  for (var x = 0; x < rect.right; x += 5)
    for (var y = 0; y < rect.bottom; y += 5)
      SendMessage(windowHandle, wmMousemove, 0, (y << 16) + x);
}

Essentially, we are getting window handles to the notification area that is on your system tray and also the overflow area introduced in Windows 7 (don’t know about Vista – does anyone remember Vista?).  That is the little arrow icon in the system tray that opens a little popup where all pestering but insignificant applications’ system tray icons live.

Do you remember how, when you have an orphaned system tray icon, you move your mouse over it and find it magically disappears?  That is exactly what this code does.  With the window handles to the system tray and overflow areas, we simply move our mouse repeatedly up and down and left to right.  We don’t actually move the cursor, just send the Windows messages.

There was another solution presented somewhere (on CodeProject, but I can’t find it now) that read information in the private bytes of the window allocations to determine if a process was still operating.  That approach was more pristine but did some memory allocation tricks in C# that made me nervous.  Sending mouse messages was certainly safer, although not elegant.

C++ Speed Test with FPU and ints

I wanted to test the difference on modern hardware between floating point math and integer math. Here is my code (which is similar to the C# code written previously).

// CSpeedTest.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <windows.h>
#include <iostream>
#include <iomanip>

using namespace std;

#define FP_MULT
//#define FP_NEG
//#define INT_MULT
//#define INT_NEG

class StopWatch
{
    LARGE_INTEGER m_freq;
    LARGE_INTEGER m_startTime;
    LONGLONG m_totalTime;
public: 
    StopWatch() : m_totalTime(0L)
    {
        QueryPerformanceFrequency(&m_freq);
    }

    void Start()
    {
        QueryPerformanceCounter(&m_startTime);
    }

    void Stop()
    {
        LARGE_INTEGER stopTime;
        QueryPerformanceCounter(&stopTime);
        m_totalTime += (stopTime.QuadPart - m_startTime.QuadPart);
    }

    void Reset()
    {
        m_totalTime = 0L;
    }

    double ElapsedTime()
    {
        return (double)(m_totalTime) / (double)(m_freq.QuadPart);
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    #if defined(FP_MULT) || defined(FP_NEG)
    volatile double poo = 0.0;
    #endif
    #if defined(INT_MULT) || defined(INT_NEG)
    volatile int poo = 0;
    #endif

    StopWatch stopWatch;
    for (int idx = 0; idx < 1000000000; idx++)
    {
      stopWatch.Start();
      #if defined(FP_MULT)
        poo = -1.0 * poo;
      #endif
      #if defined(FP_NEG) || defined(INT_NEG)
        poo = -poo;
      #endif
      #if defined(INT_MULT)
        poo = -1 * poo;
      #endif
      stopWatch.Stop();
    }

    double elapsedTime = stopWatch.ElapsedTime();

    int minutes = (int) (elapsedTime / 60);
    int seconds = (int) (elapsedTime) % 60;
    int ms10 = (int) ((elapsedTime - (int) elapsedTime) * 100);

    cout << setfill('0') << setw(2) << minutes << ':' << seconds << ':' << ms10 << endl;

    return 0;
}

The code was compiled as a console application for Win32 Debug so the variables would not get “registered”.

The test machine is a Dell Precision M4800. The processor is an Intel Core i7-4800MQ CPU at 2.70 GHz with 16GB RAM. The OS is Windows 7 Professional 64 bit with SP1.

Here are the results. I have also included the assembler for the operation under test.

define    time   assembly
FP_MULT   7.32s  fld qword ptr [__real@bff0000000000000 (0BE7938h)]; fmul qword ptr [poo]; fstp qword ptr [poo]
FP_NEG    7.56s  fld qword ptr [poo]; fchs; fstp qword ptr [poo]
INT_MULT  7.58s  mov eax,dword ptr [poo]; imul eax,eax,0FFFFFFFFh; mov dword ptr [poo],eax
INT_NEG   7.59s  mov eax,dword ptr [poo]; neg eax; mov dword ptr [poo],eax

I actually don’t believe I have accomplished too much, as the setup to call the timing functions actually takes many, many more opcodes.  However, this was an interesting experiment, and I do now have a cool C++ stopwatch on Windows for more extensive testing on much larger blocks of test code.

C# Speed Tests with FPU and ints

So I saw this sort of code inside a loop (doing inversion of data) in C# the other day; assume dblValue is a double:

dblValue = dblValue * -1.0d;

I wondered what the speed comparison was compared to this:

dblValue = -dblValue;

I expected the second to be faster.  I decided to find out. After testing floating point numbers, I also decided to try the same thing with integers just to see.  Here is my code set.

#define FP_MULT
//#define FP_NEG
//#define INT_MULT
//#define INT_NEG

using System;
using System.Diagnostics;

class Script
{
  [STAThread]
  static public void Main(string[] args)
  {
    #if FP_MULT || FP_NEG
    double poo = 0d;
    #endif
    #if INT_MULT || INT_NEG
    int poo = 0;
    #endif

    Stopwatch stopWatch = new Stopwatch();
    for (int idx = 0; idx < 1000000000; idx++)
    {
      stopWatch.Start();
      #if FP_MULT
        poo = -1.0d * poo;
      #endif
      #if FP_NEG || INT_NEG
        poo = -poo;
      #endif
      #if INT_MULT
        poo = -1 * poo;
      #endif
      stopWatch.Stop();
    }

    TimeSpan ts = stopWatch.Elapsed;

    string elapsedTime = String.Format("{0:00}:{1:00}.{2:00}", ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    Console.WriteLine(elapsedTime);
  }
}

The program was compiled using the .NET 4.0 64 bit framework. The exe was compiled to ILOnly, verified using CorFlags.exe. This means the exe was running in 64 bit mode.

The test machine is a Dell Precision M4800. The processor is an Intel Core i7-4800MQ CPU at 2.70 GHz with 16GB RAM. The OS is Windows 7 Professional 64 bit with SP1.

Here are the results. I didn’t really average anything, but every time I ran each of these tests, the values were always similar.  I have also included the IL for the operation under test.  (I used ILSpy.)

define    time    IL
FP_MULT   15.49s  ldc.r8 -1; ldloc.0; mul; stloc.0
FP_NEG    15.35s  ldloc.0; neg; stloc.0
INT_MULT  15.35s  ldc.i4.m1; ldloc.0; mul; stloc.0
INT_NEG   15.43s  ldloc.0; neg; stloc.0

It has been a while since I have evaluated floating point and integer math, but I am impressed that the timing is very similar.

I think I may try this on the same machine using a simple C++ program and performance counters to see the results and dive deeper into this.

Follow-up note:  I now don’t believe I accurately measured anything, as the stopwatch opcodes were likely more plentiful than the code under test.  However, it was an interesting experiment and we learned about the stopwatch in .NET.

Using crosstool-ng and Cygwin

My goal is to cross compile on Cygwin (on Winderz) for a Linux target – either 64 bit Ubuntu 13.10 or an ARM (such as a Beagle Bone). I sadistically thought that this could be done in MinGW.  Two words: Um, oops.

The real purpose is to take a Windows GUI that generates C code and compile it for a different platform (hence cross compiling).  These are my steps which are based on this guy’s post.

Note that as information on the web becomes quickly out of date, realize that this is the end of March in 2014.

  1. Before you begin, it is imperative to set your file system to be case sensitive in Windows.  Both the kernel headers and the C library use file names that are identical when compared case insensitively but distinct when compared case sensitively.  Open regedit.exe and set the following to 0.
    HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive

    Then reboot.

  2. Download and install Cygwin from here.  We will assume that you installed it on C:\cygwin.  I am using the 32 bit version of Cygwin 2.844 so that the compiler being built will run on 32 bit or 64 bit Windows.
  3. When you run setup, you will get a nice GUI to choose your packages.  If you ever want to add or remove a package, you run setup again (seems counter-intuitive on Windows).  Take the defaults and add the following packages (not all may be required but it didn’t hurt).
      • Devel/gperf
      • Devel/bison
      • Devel/flex
      • Devel/patch
      • Devel/make: The GNU version…
      • Devel/automake
      • Devel/libtool
      • Devel/subversion
      • Devel/gcc-core
      • Devel/gcc-g++
      • Devel/catgets
      • Web/wget
      • Libs/libncursesw-devel
      • Libs/libncurses-devel
      • Libs/gettext
      • Libs/libexpat-devel
  4. Open up your cygwin terminal.  If you have a shortcut on your desktop or in your start menu, use that.  If not, the shortcut contains the following target:
    c:\cygwin\bin\mintty.exe -i /Cygwin-Terminal.ico -
  5. In the terminal, download and build crosstool-ng following the steps here.  Substitute your version.  They are listed below with my version and some other things I did.
    wget http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.19.0.tar.bz2
    tar xjf crosstool-ng-1.19.0.tar.bz2
    cd crosstool-ng-1.19.0
    ./bootstrap
    ./configure --prefix=/home/maks/crosstool
  6. There is one issue with curses.  In crosstool-ng-1.19.0/kconfig/nconf.c, there is the line “ESCDELAY = 1;”.  Swap this line with “set_escdelay(1);”  (A patch for this is listed here.  I did not apply the other two patches and had success building.)
  7. After making the previous correction,  we can make and install.
    make
    make install
  8. To make life easier, export your path to include /home/maks/crosstool/bin substituting your home directory.  I added this to my .bashrc so I wouldn’t have to think about it again.
    export PATH="${PATH}:/home/maks/crosstool/bin"
  9. This is where the patching begins.  Do the following.
    mkdir /usr/include/linux
    cp /usr/include/asm/types.h /usr/include/linux

    Then edit /usr/include/linux/types.h and include the following:

    typedef __signed__ long long __s64;
    typedef unsigned long long __u64;
  10. Make a new directory.  Since I wanted a 64 bit compiler for Linux, I did the following.  Adding the src directory seemingly allows the tarballs to be saved.  (This is not used and seems like a bug in the scripts.)
    mkdir ~/src
    mkdir ~/linux64
    cd ~/linux64
    ct-ng i686-nptl-linux-gnu
  11. Bring up the configuration menu.
    ct-ng menuconfig
  12. In menuconfig, I updated everything to the latest compiler, libraries, 64 bit, eglibc (which Ubuntu uses), etc.  If you want to cheat with menuconfig, use my config.  Simply copy the text into a .config file in the linux64 directory.
  13. You can start building with the following.  I recommend that you read the remainder of the post first.  These are tips that may help out.
    ct-ng build
  14. While building the kernel headers, I came to realize that Cygwin doesn’t have enough elf headers to be successful.  I applied the patch found here.  Make sure the patch applies correctly.  I had some issues.  Also, I had to edit /usr/include/sys/elf_common.h so that R_X86_64_JMP_SLOT was spelled R_X86_64_JUMP_SLOT (only that define).
  15. It turns out that for the latest kernel, a newer version of make is needed than the one that comes with Cygwin.  In menuconfig, add make to the list of companion tools.
  16. Make sure you are not trying to build anything statically.  The final build of the compiler will not succeed.
  17. When you get past installation of the first pass of gcc, you are probably well on your way.  It will take around 2 to 3 hours to fully compile.  You may want to turn off anti-virus protection during this time.
  18. If you come across errors, you can restart ct-ng building where it left off by selecting “Paths and misc options / Debug crosstool-NG / Save intermediate steps”.  Then to restart, run
    ct-ng list-steps
    ct-ng <last successful step name>+
  19. I never got D.U.M.A to actually build.  I lost patience trying to figure it out.  However, I never really needed a memory overrun checker in my case.

So that is that.  I compiled a C and C++ program on Windows and ran the binaries on Linux 64 bit Ubuntu 13.10.

CS-Script – C# scripting

A quick shout out is in order to Oleg Shilo.  CS-Script is fantastic for basic scripting needs or testing behaviors of C# and .NET.  Oleg has also included a plugin for Notepad++.

It also can be used to generate executable “scripts” to make useful utilities without bringing up an entire development environment and the overhead of projects and solutions.

For me personally, it will never really replace Python but it is good to know that there are alternatives for Windows based development when Python is not an option.  (Yes, those times do actually exist for some of us.)

More poo in the toolbox.

Notes on XML Serialization in C#

The poo crew was having some trouble understanding the behavior of XML serialization on .NET.  So we will add some clarity.

We wanted a serializer where default values could be set in code when reading older serialized XML, and where all tags were written regardless of any “default value”.  This way, a human could inspect the XML and know that the properties they see are all of them, at the time of serialization.

XML Serializer

XmlSerializer has been there since the beginning of .NET time (or at least 1.1).  Here are some characteristics of XmlSerializer.

  1. All public properties from a public object are written using XmlSerializer as long as the [System.ComponentModel.DefaultValueAttribute(x)] is not attributed to the property.  This also can look like [DefaultValue(x)] and it is the same attribute.
  2. For strings, include [XmlElement(IsNullable = true)] as a decorator if you want the tag specifically in the XML and the string is not assigned.
  3. The default construction is performed when deserializing.  This is true for setting values in the constructor or assigning at declaration.

It is important to note that [DefaultValue()] has a second purpose.  Property grids use this to know whether or not to bold a value in the UI.  If the value = default value, the text is not bold.  If value != default value, the text is bold. That is all it does.  It absolutely does not change the class member no matter what your friends say.

Data Contract Serializer

This was added to the framework around .NET 3.0 (if we can believe Microsoft).  Here are the high points:

  1. All properties with [DataMember] appear to be written regardless of whether default values are set or not.  Why it didn’t work with SpiralToPolarPersistedData is something to look into.
  2. Constructors are not called when deserializing.  This is true for setting values in constructors or assigning at declaration.
  3. The only way to guarantee a default value is to assign the values in a method decorated with [OnDeserializing].  A common pattern is to call the default method assigning from the constructor and from the OnDeserializing method (assuming the default method isn’t overrideable).
  4. If you do not include OnDeserializing, the values in the class are the type’s default values regardless of construction or declaration.

XML Serializer Example Code

using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Script
{ 

  public class Record
  {
    private double n1;
    private double n2 = 100;
    private string operation;
    private double result;

    internal Record() 
    { 
      //n2 = 100;
    }

    internal Record(double n1, double n2, string operation, double result)
    {
      this.n1 = n1;
      this.n2 = n2;
      this.operation = operation;
      this.result = result;
    }

    public double OperandNumberOne
    {
      get { return n1; }
      set { n1 = value; }
    }

    public double OperandNumberTwo
    {
      get { return n2; }
      set { n2 = value; }
    }

    [XmlElement(IsNullable = true)]
    public string Operation
    {
      get { return operation; }
      set { operation = value; }
    }

    public double Result
    {
      get { return result; }
      set { result = value; }
    }

    public override string ToString()
    {
      return string.Format("Record: {0} {1} {2} = {3}", n1, operation, n2, result);
    }
  }

  static public void Main(string[] args)
  {
    Record record0 = new Record();
    Console.WriteLine(record0.ToString());

    Record record1 = new Record(1, 2, "+", 3);

    XmlSerializer serializer = new XmlSerializer(typeof(Record));

    using (FileStream stream = File.Open("test.xml", FileMode.Create))
    {
      serializer.Serialize(stream, record1);
    }

    Console.WriteLine("Press any key...");
    Console.ReadKey(false);

    using (FileStream stream = File.Open("test.xml", FileMode.Open))
    {
      Record record2 = (Record) serializer.Deserialize(stream);
      Console.WriteLine(record2.ToString());
    }
  }   
}

Data Contract Serializer Example Code

using System;
using System.Runtime.Serialization;
using System.IO;
using System.Xml;

public class Script
{ 

  [DataContract]
  internal class Record
  {
    private double n1;
    private double n2; // = 100;
    private string operation;
    private double result;

    internal Record() 
    { 
      // n2 = 100;
      SetDefaults();
    }

    [OnDeserializing]
    private void OnDeserializing(StreamingContext context)
    {
      SetDefaults();
    }

    private void SetDefaults()
    {
      n2 = 100;
    }

    internal Record(double n1, double n2, string operation, double result)
    {
      this.n1 = n1;
      this.n2 = n2;
      this.operation = operation;
      this.result = result;
    }

    [DataMember]
    internal double OperandNumberOne
    {
      get { return n1; }
      set { n1 = value; }
    }

    [DataMember]
    internal double OperandNumberTwo
    {
      get { return n2; }
      set { n2 = value; }
    }

    [DataMember]
    internal string Operation
    {
      get { return operation; }
      set { operation = value; }
    }

    [DataMember]
    internal double Result
    {
      get { return result; }
      set { result = value; }
    }

    public override string ToString()
    {
      return string.Format("Record: {0} {1} {2} = {3}", n1, operation, n2, result);
    }
  }

  static public void Main(string[] args)
  {
    Record record0 = new Record();
    Console.WriteLine(record0.ToString());

    Record record1 = new Record(1, 2, "+", 3);

    DataContractSerializer serializer = new DataContractSerializer(typeof(Record));

    using (FileStream stream = File.Open("test.xml", FileMode.Create))
    {
      serializer.WriteObject(stream, record1);
    }

    Console.WriteLine("Press any key...");
    Console.ReadKey(false);

    using (FileStream stream = File.Open("test.xml", FileMode.Open))
    {
      XmlDictionaryReader reader = XmlDictionaryReader.CreateTextReader(stream, new XmlDictionaryReaderQuotas());
      Record record2 = (Record) serializer.ReadObject(reader, true);
      Console.WriteLine(record2.ToString());
    }
  }
}

Using crosstool-ng on MinGW

Hello sadistic friends.

I have given up.

I am leaving the instructions below, but this is plain too difficult and I am moving on to Cygwin.  This will remain here simply for…  I don’t know…  warning others that hair loss is not a good trade-off for getting this to work under MinGW.

We are going to try to build a compiler on Winderz now.  This is a follow up to installing crosstool-ng on MinGW on Winderz.  You should have MinGW already installed.

  1. Open up your msys bash shell via the batch file – C:\MinGW\msys\1.0\msys.bat – as administrator.  We will assume that you installed MinGW on C:\MinGW.  We will be working from the instructions here.  I installed mine at /home/maks/crosstool.  Always, I repeat, always run the terminal as administrator or some of the decompression of packages will not occur due to permission errors.
  2. Export your tool location by typing
    export PATH="${PATH}:/home/maks/crosstool/bin"

    Substitute your own path because I don’t think your name is Maks.  Now you should be able to run ct-ng.  In fact, if you create a .profile file in your home directory that includes this line, you will never need to type it again.

  3. Make a directory for your cross compiler.  I called mine linux64 and I placed it in my home directory.  Remember, your name is not Maks.  I decided that I am going to build an x86_64-unknown-linux-gnu compiler because I want it to run on an Ubuntu 13.10 server. So in linux64, type
    ct-ng x86_64-unknown-linux-gnu

    You can read about using ct-ng from the link above.  Then through

    ct-ng menuconfig

    I moved to all the latest versions of everything, eglibc, the 3.10 kernel, binutils, etc.  Also, make sure that your compiler is not linking statically.  This doesn’t work on Windows (and I haven’t tracked down why).

  4. Ok, now run
    ct-ng build

    You will get errors.  We will work through them.

  5. The first error will be that the filesystem is not case sensitive.  We will remedy this by adding the key to the registry.
    HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive

    Set this value to zero.

  6. The second will be that the OS MINGW32_NT-6.2 is not supported.  We will add that.  Edit ~/crosstool/lib/ct-ng.1.19.0/scripts/functions and look for a function called CT_DoForceRmdir.  Find the case statement and add MINGW32* to the line.  It should look like this.
    Linux|CYGWIN*|MINGW32*)
  7. You may need to create a source directory where all of the tarballs are cached.  Simply
    mkdir ~/src
  8. The next error relates to downloading the tarballs and the use of certificates.  I determined this by looking at build.log.  The utility wget is attempting to validate certificates.  For now, we will simply remove the check.  Edit ~/crosstool/lib/ct-ng.1.19.0/scripts/functions and look for CT_DoGetFile.  On the line that starts with “if CT_DoExecLog ALL wget”, add --no-check-certificate to the command line.  Be mindful of lines in the script that are split using \.
  9. Oh geez, there is way too much.  Headers aren’t there… ugh.  Too much.  I have given up.  I am going to now try my hand at Cygwin.