Pulling Documents for Searching

In a prior post, I noted how to set up elasticsearch with apache2.  In this post, we will look at how to cache a set of files from a Windows share onto your web server and index them.

To do this, we need to do the following steps:

  1. Initialize the index the first time.
  2. Mount a share.
  3. Rsync the data between the machines.
  4. Get the files that exist on the SMB share.
  5. Read what has been indexed.
  6. Diff the lists from steps 4 and 5.
  7. Index the new files on the share.
  8. Delete (the index and file) the files that no longer exist on the share.

By the way, all of this was done in Python 2.7 (as opposed to the Python 3.x used in some of my other posts).

Initialize the Index

The following script will “reset” the index by deleting it and creating it anew.

#! /usr/bin/python

import httplib
import socket

import hostinfo

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

def connInitialize(conn):
    print connRequest(conn, 'DELETE', hostinfo.INDEX)
    print connRequest(conn, 'PUT', hostinfo.INDEX, '{  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}}') 
    print connRequest(conn, 'GET', '/_cluster/health?wait_for_status=green&pretty=1&timeout=5s' )
    print connRequest(conn, 'PUT', hostinfo.INDEX + '/attachment/_mapping', '{  "attachment" : {   "properties" : {      "file" : {        "type" : "attachment",        "fields" : {          "title" : { "store" : "yes" },          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }        }      }    }  }}' )

def connRefresh(conn):
    print connRequest(conn, 'POST', '/_refresh')

socket.setdefaulttimeout(15)
conn = httplib.HTTPConnection(hostinfo.HOST)
connInitialize(conn)
connRefresh(conn)
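
If you want to sanity-check the result, a quick settings query can be appended to the script above (a minimal sketch; /_settings is a standard elasticsearch endpoint and should echo back the shard and replica counts):

print connRequest(conn, 'GET', hostinfo.INDEX + '/_settings?pretty=1')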

Mount a SMB share

On Ubuntu, you will need to install cifs-utils:  “sudo apt-get install cifs-utils”.

Once done, you can mount the share using the following command.  Choose your own mount point, obviously, and be prepared with your domain password.

sudo mount -t cifs //10.0.4.240/General /mnt/cifs -ousername=maksym.shyte,ro

Rsync Between Server and SMB Share

The easiest way to do this is to create a list of the files that you want to be searchable.  Then use that list to rsync with.  This efficiently copies the files and leaves you with a text file listing the files on the SMB share.

#!/bin/bash

# Directory where the file list is written.
currentPath=$(pwd)

function addToList {
  find "$1" -name \*.pdf -o -name \*.doc -o -name \*.docx -o -name \*.xls -o -name \*.xlsx -o -name \*.ppt -o -name \*.pptx -o -name \*.txt | grep -v ".AppleDouble" | grep -v "~$" >> "$2"
}

# Start with an empty list since addToList appends.
> "$currentPath/rsynclist.txt"

cd /mnt/cifs

addToList . "$currentPath/rsynclist.txt"
#addToList ./Some\ Directory "$currentPath/rsynclist.txt"

rsync -av --files-from="$currentPath/rsynclist.txt" /mnt/cifs /var/www/search/data

Read the Index

To read the index, the following script will pull the index entries out and write them to a file.  This will include the name of the document and its key.  You will need to take the step of resolving the path from the previous file list against this index, as they are related by the source and destination directories passed to rsync.

#! /usr/bin/python

import httplib 
import json
import sys
import os
import codecs

import hostinfo

argc = len(sys.argv)
if argc != 2:
    print os.path.basename(sys.argv[0]), "<index-file>"
    sys.exit(-1)

indexFileName = sys.argv[1]

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

conn = httplib.HTTPConnection(hostinfo.HOST)
data = json.loads(connRequest(conn, 'GET', hostinfo.INDEX + '/_search?search_type=scan&scroll=10m&size=10', '{"query":{"match_all" :{}}, "fields":["location"]}' ))

print data
total = data["hits"]["total"]

#scroll session id, used to request the next batch of data
scrollId = data["_scroll_id"]
counter = 0

data = json.loads(connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId))

#print data

f = codecs.open(indexFileName, "w", "utf8")

while len(data["hits"]["hits"]) > 0:
    for item in data["hits"]["hits"]: 
        f.write(item["fields"]["location"][0] + ',' + item["_id"] + '\n')
        f.flush()

    counter = counter + len(data["hits"]["hits"])
    print "Reading Index:", counter, "of", total

    scrollId = data["_scroll_id"]
    resp = connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId)
    #print resp
    data = json.loads(resp)

f.close()
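
Each line of the resulting file pairs the stored location with the elasticsearch document id, separated by a comma.  A couple of illustrative lines (the ids here are made up):

Some Folder/Important Document.pdf,AUq3x2QxYFa7jZkP9wQ1
Specs/widget-design.docx,AUq3x2RfYFa7jZkP9wQ2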

Diff the File List and the Index List

Next we need to diff the two.  We want to know which files we need to index and which files we want to delete.  The following script does that (presuming that the lists have been modified to point at the same directory – i.e. /var/www/search/data).  Out come an “add” text file and a “delete” text file.

#! /usr/bin/python

import sys
import os

argc = len(sys.argv)
if argc != 5:
    print os.path.basename(sys.argv[0]), "<file-list> <index-list> <add-file> <delete-file>"
    sys.exit(-1)

def createMap(filename):
    ret = {}
    f = open(filename)
    lines = f.readlines()
    f.close()
    for line in lines:
        line = line.replace('\n','')
        split = line.split(',', 1)
        key = split[0]
        ret[key] = line
    return ret

fileMap = createMap(sys.argv[1])
indexMap = createMap(sys.argv[2])

# if the entry is in fileMap but not indexMap, it goes into the add file
# if the entry is in indexMap but not fileMap, it goes into the delete file
add = {}

for key in fileMap:
    if key in indexMap:
        del indexMap[key]
    else:
        add[key] = fileMap[key]

f = open(sys.argv[3], "w")
for key in add:
    f.write(add[key] + '\n')
f.close()

f = open(sys.argv[4], "w")
for key in indexMap:
    f.write(indexMap[key] + '\n')
f.close()
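
Invocation looks like this (the script name is just an example):

./difflists.py filelist.txt indexlist.txt add.txt delete.txt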

Add to the Index

Next we iterate through all the files in the “add” list and index each one.

#! /usr/bin/python

import httplib 
import binascii
import sys
import os
import socket

import hostinfo

argc = len(sys.argv)
if argc != 3:
    print os.path.basename(sys.argv[0]), "<add-list> <root-fs-dir>"
    sys.exit(-1)

rootFsDir = sys.argv[2] 

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

def connInitialize(conn):
    print connRequest(conn, 'DELETE', hostinfo.INDEX)
    print connRequest(conn, 'PUT', hostinfo.INDEX, '{  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}}') 
    print connRequest(conn, 'GET', '/_cluster/health?wait_for_status=green&pretty=1&timeout=5s' )
    print connRequest(conn, 'PUT', hostinfo.INDEX + '/attachment/_mapping', '{  "attachment" : {   "properties" : {      "file" : {        "type" : "attachment",        "fields" : {          "title" : { "store" : "yes" },          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }        }      }    }  }}' )

def connRefresh(conn):
    print connRequest(conn, 'POST', '/_refresh')

def connAddFile(conn, filename, rootFsDir):
    title = os.path.basename(filename)
    location = filename[len(rootFsDir):]

    with open(filename, 'rb') as f:
        data = f.read()

    if len(data) > hostinfo.LARGEST_BASE64_ATTACHMENT:
        print 'Not indexing because the file is too large', len(data)
    else:
        print 'Indexing file size', len(data)
        base64Data = binascii.b2a_base64(data)[:-1]
        # NOTE: naive string concatenation; a title or location containing
        # quotes or backslashes will produce invalid JSON (see the json.dumps
        # sketch below).
        attachment = '{ "file":"' + base64Data + '", "title" : "' + title + '", "location" : "' + location + '" }'
        print connRequest(conn, 'POST', hostinfo.INDEX + '/attachment/', attachment)

socket.setdefaulttimeout(30)
conn = httplib.HTTPConnection(hostinfo.HOST)
#connInitialize(conn)

f = open(sys.argv[1])
lines = f.readlines()
f.close()

idx = 0

rootFsDir = rootFsDir + '/'

for line in lines:
    line = line.replace('\n', '')
    idx = idx + 1
    filename = rootFsDir + line
    print idx, filename
    try:
        connAddFile(conn, filename, rootFsDir)
    except Exception, e:
        print str(e)
        conn = httplib.HTTPConnection(hostinfo.HOST)  

connRefresh(conn)
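
As noted in the comment above, building the attachment JSON by hand breaks on titles or paths containing quotes or backslashes.  A safer way to build the body, sketched with the standard json module (same fields as above):

import json

def buildAttachment(base64Data, title, location):
    # json.dumps escapes quotes, backslashes, and control characters for us.
    return json.dumps({"file": base64Data, "title": title, "location": location})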

Delete the Files Not Needed

Finally, we delete the index and physical files no longer needed.

#! /usr/bin/python

import httplib
import sys
import os
import socket

import hostinfo

argc = len(sys.argv)
if argc != 3:
    print os.path.basename(sys.argv[0]), "<delete-list> <root-fs-dir>"
    sys.exit(-1)

def connRequest(conn, verb, url, body = None):
    if body == None:
        conn.request(verb, url)
    else:
        conn.request(verb, url, body)
    return conn.getresponse().read()

def connRefresh(conn):
    print connRequest(conn, 'POST', '/_refresh')

def connDeleteFile(conn, index):
    print connRequest(conn, 'DELETE', hostinfo.INDEX + '/attachment/' + index)

socket.setdefaulttimeout(30)
conn = httplib.HTTPConnection(hostinfo.HOST)

f = open(sys.argv[1])
lines = f.readlines()
f.close()

idx = 0

for line in lines:
    line = line.replace('\n', '')
    idx = idx + 1
    split = line.rsplit(',', 1)  # split on the last comma; the location may itself contain commas
    filename = split[0]
    index = split[1]
    print "Delete:", idx, filename, index
    try:
        connDeleteFile(conn, index)
    except Exception, e:
        print str(e)
        conn = httplib.HTTPConnection(hostinfo.HOST)  

    try:
        os.remove(sys.argv[2] + '/' + filename)
    except OSError:
        pass

connRefresh(conn)
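
Invocation mirrors the add script (again, the script name is just an example); the first argument is the delete list from the diff, the second is the data directory:

./deletefiles.py delete.txt /var/www/search/data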

There it is.  I have covered all these steps, including resolving the path between the file list and the index list.  One further thing to note is that the hostinfo file referenced by the python scripts looks like this:

#! /usr/bin/python

HOST = '127.0.0.1:9200'
SITE = ''

INDEX = SITE + '/basic'

LARGEST_BASE64_ATTACHMENT = 50000000
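
To tie the steps together, the whole pipeline can be driven from a single script.  A minimal sketch (the script and list file names are hypothetical placeholders for the scripts above, and the path resolution between the lists is glossed over):

#! /usr/bin/python
# Hypothetical driver for the whole pipeline; adjust names and paths to taste.

import subprocess

DATA_DIR = '/var/www/search/data'

subprocess.check_call(['rsync', '-av', '--files-from=rsynclist.txt',
                       '/mnt/cifs', DATA_DIR])
subprocess.check_call(['./readindex.py', 'indexlist.txt'])
subprocess.check_call(['./difflists.py', 'rsynclist.txt', 'indexlist.txt',
                       'add.txt', 'delete.txt'])
subprocess.check_call(['./addtoindex.py', 'add.txt', DATA_DIR])
subprocess.check_call(['./deletefiles.py', 'delete.txt', DATA_DIR])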


A Search Engine for Office Documents

Have you ever worked at a place where there was a mass of files and documents on a share, and even the old-timers forget where important documents are?

Search by file name stinks and SharePoint has been another excuse to dump stuff that gets lost.

So I decided to figure out an easy way to get a content search engine up and looking through the files on a share.  I found a solution.  It isn’t pristine, for these reasons:

  1. Browsers can’t link to files on a share for obvious security reasons.
  2. For reason one, the decision was made to copy the searchable documents onto the web server.  This is time consuming to transfer and duplicates information, but the documents are served successfully.
  3. For reason two, it would be possible to add a server plugin that reads and delivers a file on a share.  I just haven’t done that yet.

So we will start with what we have and consider changing it later.

The basis for this will be Ubuntu 12.04 LTS.  Why?  Because I have such a machine handy and it is 9 years old.  This will be based on all the wonderful work of elasticsearch and Lucene.

So, here are the steps.  Remember, this is a bit hacky.

  1. Install apache2.  (In the case of Ubuntu, it is “sudo apt-get install apache2”.)
  2. Install openjdk-7-jre-headless.  (“sudo apt-get install openjdk-7-jre-headless”).
  3. Download elasticsearch (from elasticsearch.org – the .com site takes you to pay-for products).  Because I am using Ubuntu, I thought I would use the apt repository.
  4. Follow the steps to start elasticsearch – in my case listed on the web site.  Be advised that elasticsearch binds to all interfaces, to a free port between 9200 and 9300.  We will assume that the port is 9200, as it is in my case.  However, it should probably only bind to a port on localhost (e.g. via the network.host setting in elasticsearch.yml), or at least the security should be evaluated to make sure it complies with what you need.
  5. We will need two plugins.  You can install them from your elasticsearch/bin location.  In my case it was /usr/share/elasticsearch/bin/plugin.
    bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.0.0
    bin/plugin -install de.spinscale/elasticsearch-plugin-suggest/1.0.1-2.0.0

    Restart elasticsearch (“sudo service elasticsearch restart”).  You will also need to verify that the versions of these plugins match your elasticsearch version.

  6. For apache2, make sure to enable the proxy, proxy_http, and ssl modules.  On Ubuntu, the “a2enmod” is an easy utility to do this.
  7. In my Apache setup, I added a new file called “elasticsearch” inside /etc/apache2/conf.d.  (Note that Ubuntu 13.10 doesn’t use a conf.d directory.  The contents could be added to the bottom of apache2.conf, although I am sure there is a more “pristine” location.)  The contents are below.
    <IfModule proxy_module>
    <IfModule proxy_http_module>
    
    <Proxy *>
    <Limit GET > 
        allow from all 
    </Limit>
    
    <Limit POST PUT DELETE>
        order deny,allow 
        deny from all 
    </Limit>
    </Proxy>
    
    ProxyPreserveHost On
    ProxyRequests Off
    LogLevel debug
    ProxyPass /es http://localhost:9200/
    ProxyPassReverse /es http://localhost:9200/
    
    </IfModule>
    </IfModule>

    The application depends on the /es directory under web root. This can be changed along with the web pages that use it.

  8. Restart apache2.  (“sudo service apache2 restart”)
  9. Download the HTML and Javascript for the search pages from here:  Search HTML and Javascript.  It uses jQuery, jQueryUI, and AJAX to perform the searching and suggestions.  Unzip and place it in the web directory where you want it.  For me, I wanted a search subdirectory, so I placed mine in /var/www/search.
  10. So, the last thing is to show how to index the files.  I am a fan of python, so this is python code making http requests to elasticsearch adding the information.  The script below deletes the index, recreates it, and starts adding content to it – from files in a directory.  (A recursive variant of the indexing loop is sketched just after this list.)
    #! /usr/bin/python
    
    import httplib 
    import binascii
    import os
    
    HOST = 'localhost:9200'
    INDEX = '/basic'
    
    def connRequest(conn, verb, url, body = None):
        if body == None:
            conn.request(verb, url)
        else:
            conn.request(verb, url, body)
        return conn.getresponse().read()
    
    def connAddFile(conn, filename, rootFsDir, httpPrefix):
        with open(filename, 'rb') as f:
            base64Data = binascii.b2a_base64(f.read())[:-1]
    
        title = os.path.basename(filename)
        location = httpPrefix + filename[len(rootFsDir):]
    
        attachment = '{ "file":"' + base64Data + '", "title" : "' + title + '", "location" : "' + location + '" }'
        print connRequest(conn, 'POST', INDEX + '/attachment/', attachment)
    
    conn = httplib.HTTPConnection(HOST)
    
    print connRequest(conn, 'DELETE', INDEX)
    
    print connRequest(conn, 'PUT', INDEX, '{  "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}}') 
    
    print connRequest(conn, 'GET', '/_cluster/health?wait_for_status=green&pretty=1&timeout=5s' )
    
    print connRequest(conn, 'PUT', INDEX + '/attachment/_mapping', '{  "attachment" : {   "properties" : {      "file" : {        "type" : "attachment",        "fields" : {          "title" : { "store" : "yes" },          "file" : { "term_vector":"with_positions_offsets", "store":"yes" }        }      }    }  }}' )
    
    # Add files here repeatedly
    rootFsDir = '/var/www/search/data/'
    searchDir = ''          # This is for recursion through the directories
    httpPrefix = 'data/'
    # Make this recursive some day
    for file in os.listdir(rootFsDir + searchDir):
        connAddFile(conn, rootFsDir + searchDir + file, rootFsDir, httpPrefix)
    
    print connRequest(conn, 'POST', '/_refresh')
  11. If you decide to get more creative and add only new files and delete the old ones, you need to understand how to get the list of files that have already been indexed.  Then you just have to correlate the current state of the files on disk with the index list.  This script gets the indexes and the files associated with them.
    #! /usr/bin/python
    
    import httplib 
    import json
    import sys
    import os
    
    import hostinfo
    
    argc = len(sys.argv)
    if argc != 2:
        print os.path.basename(sys.argv[0]), "<index-file>"
        sys.exit(-1)
    
    indexFileName = sys.argv[1]
    
    def connRequest(conn, verb, url, body = None):
        if body == None:
            conn.request(verb, url)
        else:
            conn.request(verb, url, body)
        return conn.getresponse().read()
    
    conn = httplib.HTTPConnection(hostinfo.HOST)
    data = json.loads(connRequest(conn, 'GET', hostinfo.INDEX + '/_search?search_type=scan&scroll=10m&size=10', '{"query":{"match_all" :{}}, "fields":["location"]}' ))
    
    total = data["hits"]["total"]
    
    #scroll session id, used to request the next batch of data
    scrollId = data["_scroll_id"]
    counter = 0
    
    data = json.loads(connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId))
    
    f = open(indexFileName, 'w')
    
    while len(data["hits"]["hits"]) > 0:
        for item in data["hits"]["hits"]:
            f.write(item["fields"]["location"][0] + ',' + item["_id"] + '\n')
            f.flush()
    
        counter = counter + len(data["hits"]["hits"])
        print "Reading Index:", counter, "of", total
    
        scrollId = data["_scroll_id"]
        resp = connRequest(conn, 'GET', hostinfo.SITE + '/_search/scroll?scroll=10m', scrollId)
        #print resp
        data = json.loads(resp)
    
    f.close()
  12. To delete files, the python snippet looks like this, where index is the id of the attachment whose index entry we want deleted.
    def connDeleteFile(conn, index):
        print connRequest(conn, 'DELETE', hostinfo.INDEX + '/attachment/' + index)
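
As promised in step 10, here is a sketch of a recursive variant of its indexing loop using os.walk (same connAddFile, conn, rootFsDir, and httpPrefix assumptions as the script in step 10):

# Walk the whole tree under rootFsDir and index every file found.
for dirpath, dirnames, filenames in os.walk(rootFsDir):
    for name in filenames:
        connAddFile(conn, os.path.join(dirpath, name), rootFsDir, httpPrefix)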

So there we have it.  All we have to do is figure out where we are getting our data from and copy it to the “data” directory.  One particular way I have done this is with rsync across an SMB share.

This is by no means meant to be a lesson on elasticsearch.  There is room for improvement here.

However, this is a quick way to set up searching documents for information you never knew existed.  (Side note:  I have had 10 ms search times across 2500 documents.)


Recycling a Third Party Application with System Tray Icon

I had a need to recycle a third party application that had a system tray icon.  The application controlled hardware and would get into a funky state.

The application was titled the “user mode driver” but I’m not totally sure if it was the user mode driver framework that Microsoft touted with Vista.  The user mode driver (UMD) was really a bridge process between the Ethernet port and a COM (a.k.a. the old-time Component Object Model) in-process DLL that resided in your program memory space.

The UMD also had a system tray component that needed a little cleanup when the application was killed.  The system tray icon was left behind.

This post recycles others’ work, which we will reference.  This post is about bringing it all together in C#.

There are three parts to this:

  1. Stop the process.
  2. Restart the process.
  3. Clean up the system tray.

For this example though, we will assume that we know the full path to the process and that the process name is the base file name without extension.

Stop the Process

C# has a handy way to stop processes.

private void StopUserModeDriver(string userModeDriverPath)
{
  Process[] procs = null;

  try
  {
    procs = Process.GetProcessesByName(Path.GetFileNameWithoutExtension(userModeDriverPath));

    foreach (Process proc in procs)
    {
      proc.Kill();
      proc.WaitForExit(5000);
    }
  }
  finally
  {
    if (procs != null)
      foreach (Process proc in procs)
        proc.Dispose();
  }
}

Restart the Process

This one is simple.

private void StartUserModeDriver(string userModeDriverPath)
{
  Process.Start(userModeDriverPath);
}

Clean Up the System Tray

This code comes from here, and we will show it again in this post.

[StructLayout(LayoutKind.Sequential)]
public struct RECT
{
  public int left;
  public int top;
  public int right;
  public int bottom;
}
[DllImport("user32.dll")]
public static extern IntPtr FindWindow(string lpClassName, string lpWindowName);
[DllImport("user32.dll")]
public static extern IntPtr FindWindowEx(IntPtr hwndParent, IntPtr hwndChildAfter, string lpszClass, string lpszWindow);
[DllImport("user32.dll")]
public static extern bool GetClientRect(IntPtr hWnd, out RECT lpRect);
[DllImport("user32.dll")]
public static extern IntPtr SendMessage(IntPtr hWnd, uint msg, int wParam, int lParam);

private void RemoveOrphanedIconsFromSystemTray()
{
  IntPtr systemTrayContainerHandle = FindWindow("Shell_TrayWnd", null);
  IntPtr systemTrayHandle = FindWindowEx(systemTrayContainerHandle, IntPtr.Zero, "TrayNotifyWnd", null);
  IntPtr sysPagerHandle = FindWindowEx(systemTrayHandle, IntPtr.Zero, "SysPager", null);
  IntPtr notificationAreaHandle = FindWindowEx(sysPagerHandle, IntPtr.Zero, "ToolbarWindow32", "Notification Area");
  if (notificationAreaHandle == IntPtr.Zero)
  {
    notificationAreaHandle = FindWindowEx(sysPagerHandle, IntPtr.Zero, "ToolbarWindow32", "User Promoted Notification Area");
    IntPtr notifyIconOverflowWindowHandle = FindWindow("NotifyIconOverflowWindow", null);
    IntPtr overflowNotificationAreaHandle = FindWindowEx(notifyIconOverflowWindowHandle, IntPtr.Zero, "ToolbarWindow32", "Overflow Notification Area");
    RefreshSystemTrayArea(overflowNotificationAreaHandle);
  }
  RefreshSystemTrayArea(notificationAreaHandle);
}

private static void RefreshSystemTrayArea(IntPtr windowHandle)
{
  const uint wmMousemove = 0x0200;
  RECT rect;
  GetClientRect(windowHandle, out rect);
  for (var x = 0; x < rect.right; x += 5)
    for (var y = 0; y < rect.bottom; y += 5)
      SendMessage(windowHandle, wmMousemove, 0, (y << 16) + x);
}

Essentially, we are getting window handles to the notification area that is on your system tray, and also to the overflow area introduced in Windows 7 (don’t know about Vista – does anyone remember Vista?).  That is the little arrow icon in the system tray that opens a little popup where all the pestering but insignificant applications’ system tray icons live.

Do you remember how, when you have an orphaned system tray icon, you move your mouse over it only to find it magically disappears?  That is exactly what this code does.  With the window handles to the system tray and overflow areas, we simply move the mouse repeatedly up and down and left to right.  We don’t actually move the cursor; we just send the windows messages.

There was another solution presented somewhere (on Code Project, but I can’t find it now) that inspected the private bytes of the window allocations to determine whether a process was still operating.  That approach was more pristine but did some memory allocation tricks in C# that made me nervous.  Sending mouse messages was certainly safer, although not elegant.

C++ Speed Test with FPU and ints

I wanted to test the difference on modern hardware between floating point math and integer math.  Here is my code (which is similar to C# code previously written).

// CSpeedTest.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include 
#include 
#include 

using namespace std;

#define FP_MULT
//#define FP_NEG
//#define INT_MULT
//#define INT_NEG

class StopWatch
{
    LARGE_INTEGER m_freq;
    LARGE_INTEGER m_startTime;
    LONGLONG m_totalTime;
public: 
    StopWatch() : m_totalTime(0L)
    {
        QueryPerformanceFrequency(&m_freq);
    }

    void Start()
    {
        QueryPerformanceCounter(&m_startTime);
    }

    void Stop()
    {
        LARGE_INTEGER stopTime;
        QueryPerformanceCounter(&stopTime);
        m_totalTime += (stopTime.QuadPart - m_startTime.QuadPart);
    }

    void Reset()
    {
        m_totalTime = 0L;
    }

    double ElapsedTime()
    {
        return (double)(m_totalTime) / (double)(m_freq.QuadPart);
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    #if defined(FP_MULT) || defined(FP_NEG)
    volatile double poo = 0.0;
    #endif
    #if defined(INT_MULT) || defined(INT_NEG)
    volatile int poo = 0;
    #endif

    StopWatch stopWatch;
    for (int idx = 0; idx < 1000000000; idx++)
    {
      stopWatch.Start();
      #if defined(FP_MULT)
        poo = -1.0 * poo;
      #endif
      #if defined(FP_NEG) || defined(INT_NEG)
        poo = -poo;
      #endif
      #if defined(INT_MULT)
        poo = -1 * poo;
      #endif
      stopWatch.Stop();
    }

    double elapsedTime = stopWatch.ElapsedTime();

    int minutes = elapsedTime / 60;
    int seconds = (int) (elapsedTime) % 60;
    int ms10 = (elapsedTime - int(elapsedTime)) * 100;

    cout << setfill('0') << setw(2) << minutes << ':' << seconds << ':' << ms10 << endl;

    return 0;
}

The code was compiled as a console application for Win32 Debug so the variables would not get “registered” (optimized into CPU registers).

The test machine is a Dell Precision M4800.  The processor is an Intel Core i7-4800MQ CPU at 2.70 GHz with 16GB RAM.  The OS is Windows 7 Professional 64 bit with SP1.

Here are the results.  I have also included the assembler for the operation under test.

Define    Time   Assembly
FP_MULT   7.32s  fld qword ptr [__real@bff0000000000000 (0BE7938h)]; fmul qword ptr [poo]; fstp qword ptr [poo]
FP_NEG    7.56s  fld qword ptr [poo]; fchs; fstp qword ptr [poo]
INT_MULT  7.58s  mov eax,dword ptr [poo]; imul eax,eax,0FFFFFFFFh; mov dword ptr [poo],eax
INT_NEG   7.59s  mov eax,dword ptr [poo]; neg eax; mov dword ptr [poo],eax

I actually don’t believe I have accomplished too much, as the setup to call the timing functions takes many, many more opcodes than the operation under test.  However, this was an interesting experiment, and I now have a cool C++ stopwatch on Windows for more extensive testing of much larger blocks of test code.

Intelligent Agent #2
(ninja turtles)

As promised, here is a super simple example of an Intelligent Agent program using NetLogo.

The goal is to find the boundaries within an image.  There are powerful algorithms for locating boundaries, but I wanted to do it another way: I wanted to see if Intelligent Agents could be used.  As I said in my earlier post, NetLogo provides a rapid prototyping means for creating Agents and offers a visual development and execution environment.  Below is the image I started with.  Note the low resolution.  NetLogo could handle a much higher resolution image, but the image would be too large to fit in this post, so I had to lower the resolution so I could display it here.  (Now, someone who is an expert in NetLogo might point me to a setting to make the ‘patches’ smaller; that would help.)

Screen shot before the Agents run.  [image: Contour1]

Screen shot after the Agents have run a bit.  [image: Contour2]

As you can see, the Agents did a fair job of locating the color contours, and thus the edges.  My approach is bone-headed simple: Agents wander around randomly, keeping track of the color of recently visited patches.  If the new patch is different enough from the prior patch, then the Agent marks the patch white to indicate a boundary.

The approach has a serious flaw and a weakness.  The flaw is that the marked boundary depends on the direction the Agent is traveling when it encounters the boundary.  For example, an Agent moving from Blue to Red will mark the Red patch, while a neighboring Agent moving from Red to Blue will mark the Blue patch.  This causes the edge to appear much more ragged than it really is.  It could be solved by adding a rule such as “always mark the lighter colored patch.”

The weakness is that the algorithm could be much more efficient.  Rather than wander aimlessly, the Agent could try to determine the direction of the edge and follow it.  If two Agents collide, one could jump to a new location to let the other complete the boundary.

I am sure you are eager to see some code. OK, here goes:
extensions [bitmap]

turtles-own [last-color second-last-color hunt-color]

to setup
  let img bitmap:import "C:\\Temp\\Desert.jpg"
  ;; set img bitmap:to-grayscale img
  bitmap:copy-to-pcolors img false

  create-turtles 10 [fd 10]
  ask turtles [
    set last-color pcolor
    set second-last-color pcolor
    set hunt-color pcolor
  ]
end

to hunt
  ask turtles [
    rt random 50
    lt random 50
    fd 1
    if pcolor = 0
      [jump-elsewhere]
    set second-last-color last-color
    set last-color pcolor
    if isContour second-last-color last-color
      [set pcolor [255 255 255]]
  ]
  hunt
end

to-report isContour [color1 color2]
  let retVal false
  foreach [0 1 2]
  [
    ;; show item ?1 color1
    if abs (item ?1 color1 - item ?1 color2) > 60 ;; or item ?1 color2 - item ?1 color1 > 60
      [set retVal true]
  ]
  if approximate-rgb item 0 color1 item 1 color1 item 2 color1 = white or approximate-rgb item 0 color2 item 1 color2 item 2 color2 = white
    [set retVal false]
  report retVal
end

to jump-elsewhere
  set xcor random 40
  set ycor random 40
  set last-color pcolor
  set second-last-color pcolor
  set hunt-color pcolor
end

Intelligent Agents
(ninja Turtles)

Most of us in IT are interested in Artificial Intelligence and leveraging AI techniques to solve current problems.
Intelligent Agents is one branch of AI and although not as sexy as Neural Nets and other branches, it is perhaps the most widely used.
Agents are everywhere, from viruses to web crawlers; there are many bits of code with a degree of autonomy.
Definitions of Agents can be found on Wikipedia.
In a nutshell, an Intelligent Agent receives data from its environment and makes decisions based on that data.
Agents work for someone, so there is usually a communication and reporting aspect.
If this sounds like the proxy pattern, then I have done a poor job describing it; please see the Wikipedia article, which goes into much more detail.

A classic simplified example of Intelligent Agents is called Sheep and Wolves.  In its simplest form, you have Sheep agents, Wolf agents, and an environment made up of grass.  Sheep eat grass; Wolves eat Sheep.  Sheep wander around aimlessly, as do Wolves.  When a Wolf bumps into a Sheep, it eats it.  If all the Sheep are eaten, the Wolves all die.  If the Sheep overpopulate, they die of starvation.
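
For the flavor of it, the core loop of such a model fits in a few lines of Python (a toy sketch, not NetLogo; the grid size and population numbers are arbitrary):

import random

GRID = 20
sheep = [(random.randrange(GRID), random.randrange(GRID)) for _ in range(30)]
wolves = [(random.randrange(GRID), random.randrange(GRID)) for _ in range(5)]

def wander(pos):
    # Move one step in a random direction, wrapping around the grid edges.
    x, y = pos
    return ((x + random.choice([-1, 0, 1])) % GRID,
            (y + random.choice([-1, 0, 1])) % GRID)

for tick in range(100):
    sheep = [wander(s) for s in sheep]
    wolves = [wander(w) for w in wolves]
    # A wolf that lands on a sheep eats it.
    sheep = [s for s in sheep if s not in wolves]
    if not sheep:
        print "All the sheep have been eaten; the wolves will starve."
        break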

Even this simple model can be expanded to make the problem more ‘real’ and interesting.
For example – the sheep could learn when wolves hunt. The wolves could learn when sheep graze. There could be different species of grass that grow at different rates and have differing nutritional value. Food is not the only resource that animals need, so you could introduce water and shelter. Sheep don’t clone themselves, they must find a mate etc. Sheep have a notoriously low learning rate, so if one sheep discovers a new resource, the others are slow to learn about the new resource, but some do eventually learn.

You can see how fun this could get even for a contrived example.

Northwestern University has provided NetLogo, a great, easy to use and visual way to develop models such as described above.
Go here: NetLogo for more info and to get started.
MIT has a similar system: StarLogo

Before you go all Donatello on me, download NetLogo and explore some of the 100 or so included models.

Soon I will post a simple model which can locate edges in an image. I call it color contouring.

Screen shot of the Sheep and Wolves model running in NetLogo.  [image: sheepwolves]

C# Speed Tests with FPU and ints

So I saw this sort of code inside a loop (doing inversion of data) in C# the other day; assume dblValue is a double:

dblValue = dblValue * -1.0d;

I wondered what the speed comparison was compared to this:

dblValue = -dblValue;

I expected the second to be faster.  I decided to find out. After testing floating point numbers, I also decided to try the same thing with integers just to see.  Here is my code set.

#define FP_MULT
//#define FP_NEG
//#define INT_MULT
//#define INT_NEG

using System;
using System.Diagnostics;

class Script
{
  [STAThread]
  static public void Main(string[] args)
  {
    #if FP_MULT || FP_NEG
    double poo = 0d;
    #endif
    #if INT_MULT || INT_NEG
    int poo = 0;
    #endif

    Stopwatch stopWatch = new Stopwatch();
    for (int idx = 0; idx < 1000000000; idx++)
    {
      stopWatch.Start();
      #if FP_MULT
        poo = -1.0d * poo;
      #endif
      #if FP_NEG || INT_NEG
        poo = -poo;
      #endif
      #if INT_MULT
        poo = -1 * poo;
      #endif
      stopWatch.Stop();
    }

    TimeSpan ts = stopWatch.Elapsed;

    string elapsedTime = String.Format("{0:00}:{1:00}.{2:00}",
                                       ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    Console.WriteLine(elapsedTime);
  }
}

The program was compiled using the .NET 4.0 64 bit framework.  The exe was compiled to ILOnly, verified using CorFlags.exe.  This means the exe was running in 64 bit mode.

The test machine is a Dell Precision M4800.  The processor is an Intel Core i7-4800MQ CPU at 2.70 GHz with 16GB RAM.  The OS is Windows 7 Professional 64 bit with SP1.

Here are the results.  I didn’t really average anything, but every time I ran each of these tests, the values were always similar.  I have also included the IL for the operation under test.  (I used ILSpy.)

Define    Time    IL
FP_MULT   15.49s  ldc.r8 -1; ldloc.0; mul; stloc.0
FP_NEG    15.35s  ldloc.0; neg; stloc.0
INT_MULT  15.35s  ldc.i4.m1; ldloc.0; mul; stloc.0
INT_NEG   15.43s  ldloc.0; neg; stloc.0

It has been a while since I have evaluated floating point and integer math, but I am impressed that the timings are very similar.

I think I may try this on the same machine using a simple C++ program and performance counters to see the results and dive deeper into this.

Follow up note:  I now don’t believe I accurately measured anything, as the stopwatch opcodes were likely more plentiful than the code under test.  However, it was an interesting experiment, and we learned about the Stopwatch in .NET.

Using crosstool-ng and Cygwin

My goal is to cross compile on Cygwin (on Winderz) for a Linux target – both 64 bit Ubuntu 13.10 and ARM (such as a BeagleBone).  I masochistically thought that this could be done in MinGW.  Two words: Um, oops.

The real purpose is to take a Windows GUI that generates C code and compile that code for a different platform (hence cross compiling).  These are my steps, which are based on this guy’s post.

Since information on the web quickly goes out of date, note that this was written at the end of March 2014.

  1. Before you begin, it is imperative to set your file system to be case sensitive in Windows.  Both the kernel headers and the C library contain file names that are identical when compared case-insensitively but distinct when compared case-sensitively.  Open regedit.exe and set the following to 0.
    HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive

    Then reboot.

  2. Download and install Cygwin from here.  We will assume that you installed it at C:\cygwin.  I am using the Cygwin 2.844 32 bit version because the compiler being built will then run on either 32 bit or 64 bit Windows.
  3. When you run setup, you will get a nice GUI to choose your packages.  If you ever want to add or remove a package, you run setup again (which seems counter-intuitive on Windows).  Take the defaults and add the following packages (not all may be required, but it didn’t hurt).
      • Devel/gperf
      • Devel/bison
      • Devel/flex
      • Devel/patch
      • Devel/make: The GNU version…
      • Devel/automake
      • Devel/libtool
      • Devel/subversion
      • Devel/gcc-core
      • Devel/gcc-g++
      • Devel/catgets
      • Web/wget
      • Libs/libncursesw-devel
      • Libs/libncurses-devel
      • Libs/gettext
      • Libs/libexpat-devel
  4. Open up your cygwin terminal.  If you have a shortcut on your desktop or in your start menu, use that.  If not, the shortcut contains the following target:
    c:\cygwin\bin\mintty.exe -i /Cygwin-Terminal.ico -
  5. In the terminal, download and build crosstool-ng following the steps here.  Substitute your version.  The steps are listed below with my version and some other things I did.
    wget http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.19.0.tar.bz2
    tar xjf crosstool-ng-1.19.0.tar.bz2
    cd crosstool-ng-1.19.0
    ./bootstrap
    ./configure --prefix=/home/maks/crosstool
  6. There is one issue with curses.  In crosstool-ng-1.19.0/kconfig/nconf.c, there is the line “ESCDELAY = 1;”.  Replace this line with “set_escdelay(1);”.  (A patch for this is listed here.  I did not apply the other two patches and had success building.)
  7. After making the previous correction, we can make and install.
    make
    make install
  8. To make life easier, export your path to include /home/maks/crosstool/bin, substituting your home directory.  I added this to my .bashrc so I wouldn’t have to think about it again.
    export PATH="${PATH}:/home/maks/crosstool/bin"
  9. This is where the patching begins.  Do the following.
    mkdir /usr/include/linux
    cp /usr/include/asm/types.h /usr/include/linux

    Then edit /usr/include/linux/types.h and include the following:

    typedef __signed__ long long __s64;
    typedef unsigned long long __u64;
  10. Make a new directory.  Since I wanted a 64 bit compiler for Linux, I did the following.  Adding the src directory seemingly allows the tarballs to be saved.  (This is not otherwise used and seems like a bug in the scripts.)
    mkdir ~/src
    mkdir ~/linux64
    cd ~/linux64
    ct-ng i686-nptl-linux-gnu
    ct-ng menuconfig
  11. In menuconfig, I updated everything to the latest compiler, libraries, 64 bit, eglibc (which Ubuntu uses), etc.  If you want to cheat with menuconfig, use my config.  Simply copy the text into a .config file in the linux64 directory.
  12. You can start building with the following.  I recommend that you read the remainder of the post first.  There are tips below that may help out.
    ct-ng build
  13. While building the kernel headers, I came to realize that Cygwin doesn’t have enough elf headers to be successful.  I applied the patch found here.  Make sure the patch applies correctly; I had some issues.  Also, I had to edit /usr/include/sys/elf_common.h so that R_X86_64_JMP_SLOT was spelled R_X86_64_JUMP_SLOT (only that define).
  14. It turns out that for the latest kernel, a newer version of make is needed than the one that comes with Cygwin.  In menuconfig, add make to the list of companion tools.
  15. Make sure you are not trying to build anything statically.  The final build of the compiler will not succeed.
  16. When you get past installation of the first pass of gcc, you are probably well on your way.  It will take around 2 to 3 hours to fully compile.  You may want to turn off anti-virus protection during this time.
  17. If you come across errors, you can restart ct-ng building where it left off by selecting “Paths and misc options / Debug crosstool-NG / Save intermediate steps”.  Then to restart, run
    ct-ng list-steps
    ct-ng <last successful step name>+
  18. I never got D.U.M.A. to actually build.  I lost patience trying to figure it out.  However, I never really needed a memory overrun checker in my case.

So that is that.  I compiled a C and C++ program on Windows and ran the binaries on Linux 64 bit Ubuntu 13.10.

CS-Script – C# scripting

A quick shout out is in order to Oleg Shilo.  CS-Script is fantastic for basic scripting needs or for testing behaviors of C# and .NET.  Oleg has also included a plugin for Notepad++.

It also can be used to generate executable “scripts” to make useful utilities without bringing up an entire development environment and the overhead of projects and solutions.

For me personally, it will never really replace Python but it is good to know that there are alternatives for Windows based development when Python is not an option.  (Yes, those times do actually exist for some of us.)

More poo in the toolbox.

Notes on XML Serialization in C#

The poo crew was having some trouble understanding the behavior of XML serialization on .NET.  So we will add some clarity.

We wanted a serializer where default values could be set in code when reading older serialized XML and all tags were written regardless of “default value”.  This way, a human could inspect the XML and know that the properties they see are all of them, at the time of serialization.

XML Serializer

XmlSerializer has been there since the beginning of .NET time (or at least 1.1).  Here are some characteristics of XmlSerializer.

  1. All public properties from a public object are written using XmlSerializer as long as the [System.ComponentModel.DefaultValueAttribute(x)] is not attributed to the property.  This also can look like [DefaultValue(x)] and it is the same attribute.
  2. For strings, include [XmlElement(IsNullable = true)] as a decorator if you want the tag specifically in the XML and the string is not assigned.
  3. Default construction is performed when deserializing, so default values take effect whether they are set in the constructor or assigned at declaration.

It is important to note that [DefaultValue()] has a second purpose.  Property grids use this to know whether or not to bold a value in the UI.  If the value = default value, the text is not bold.  If value != default value, the text is bold. That is all it does.  It absolutely does not change the class member no matter what your friends say.

Data Contract Serializer

This was added to the framework around .NET 3.0 (if we can believe Microsoft).  Here are the high points:

  1. All properties with [DataMember] appear to be written regardless of whether default values are set or not.  Why it didn’t work with SpiralToPolarPersistedData is something to look into.
  2. Constructors are not called when deserializing.  This is true whether values are set in constructors or assigned at declaration.
  3. The only way to guarantee a default value is to assign the values in a method decorated with [OnDeserializing].  A common pattern is to call a default-assigning method from both the constructor and the OnDeserializing method (assuming the default method isn’t overridable).
  4. If you do not include OnDeserializing, the values in the class are the type’s default values, regardless of construction or declaration.

XML Serializer Example Code

using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Script
{ 

  public class Record
  {
    private double n1;
    private double n2 = 100;
    private string operation;
    private double result;

    internal Record() 
    { 
      //n2 = 100;
    }

    internal Record(double n1, double n2, string operation, double result)
    {
      this.n1 = n1;
      this.n2 = n2;
      this.operation = operation;
      this.result = result;
    }

    public double OperandNumberOne
    {
      get { return n1; }
      set { n1 = value; }
    }

    public double OperandNumberTwo
    {
      get { return n2; }
      set { n2 = value; }
    }

    [XmlElement(IsNullable = true)]
    public string Operation
    {
      get { return operation; }
      set { operation = value; }
    }

    public double Result
    {
      get { return result; }
      set { result = value; }
    }

    public override string ToString()
    {
      return string.Format("Record: {0} {1} {2} = {3}", n1, operation, n2, result);
    }
  }

  static public void Main(string[] args)
  {
    Record record0 = new Record();
    Console.WriteLine(record0.ToString());

    Record record1 = new Record(1, 2, "+", 3);

    XmlSerializer serializer = new XmlSerializer(typeof(Record));

    using (FileStream stream = File.Open("test.xml", FileMode.Create))
    {
      serializer.Serialize(stream, record1);
    }

    Console.WriteLine("Press any key...");
    Console.ReadKey(false);

    using (FileStream stream = File.Open("test.xml", FileMode.Open))
    {
      Record record2 = (Record) serializer.Deserialize(stream);
      Console.WriteLine(record2.ToString());
    }
  }   
}
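
For reference, the file written by the XmlSerializer example looks roughly like this (namespace declarations and formatting may vary):

<?xml version="1.0"?>
<Record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <OperandNumberOne>1</OperandNumberOne>
  <OperandNumberTwo>2</OperandNumberTwo>
  <Operation>+</Operation>
  <Result>3</Result>
</Record>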

Data Contract Serializer Example Code

using System;
using System.Runtime.Serialization;
using System.IO;
using System.Xml;

public class Script
{ 

  [DataContract]
  internal class Record
  {
    private double n1;
    private double n2; // = 100;
    private string operation;
    private double result;

    internal Record() 
    { 
      // n2 = 100;
      SetDefaults();
    }

    [OnDeserializing]
    private void OnDeserializing(StreamingContext context)
    {
      SetDefaults();
    }

    private void SetDefaults()
    {
      n2 = 100;
    }

    internal Record(double n1, double n2, string operation, double result)
    {
      this.n1 = n1;
      this.n2 = n2;
      this.operation = operation;
      this.result = result;
    }

    [DataMember]
    internal double OperandNumberOne
    {
      get { return n1; }
      set { n1 = value; }
    }

    [DataMember]
    internal double OperandNumberTwo
    {
      get { return n2; }
      set { n2 = value; }
    }

    [DataMember]
    internal string Operation
    {
      get { return operation; }
      set { operation = value; }
    }

    [DataMember]
    internal double Result
    {
      get { return result; }
      set { result = value; }
    }

    public override string ToString()
    {
      return string.Format("Record: {0} {1} {2} = {3}", n1, operation, n2, result);
    }
  }

  static public void Main(string[] args)
  {
    Record record0 = new Record();
    Console.WriteLine(record0.ToString());

    Record record1 = new Record(1, 2, "+", 3);

    DataContractSerializer serializer = new DataContractSerializer(typeof(Record));

    using (FileStream stream = File.Open("test.xml", FileMode.Create))
    {
      serializer.WriteObject(stream, record1);
    }

    Console.WriteLine("Press any key...");
    Console.ReadKey(false);

    using (FileStream stream = File.Open("test.xml", FileMode.Open))
    {
      XmlDictionaryReader reader = XmlDictionaryReader.CreateTextReader(stream, new XmlDictionaryReaderQuotas());
      Record record2 = (Record) serializer.ReadObject(reader, true);
      Console.WriteLine(record2.ToString());
    }
  }
}