The Storage Management Blog

Getting the best out of your unstructured data

Where did that file go?

Searching with Windows XP

It used to be easy.  Back in the days of Windows XP you didn’t have many files.  And you kept them all on one, quite small, hard drive.  When you wanted to find a file you just clicked search.  After a short while (and if you remembered the name right) it would be listed for you to use.

Don’t ask me what that dog was all about!

The rest, as they say, is history.  Now we have a Windows 7 search tool that depends on a heavyweight indexing system.  Test your knowledge and prove that it really is easy to use (not my experience) … tell me, how do you start an advanced search?

Complexity isn’t the only issue – Microsoft have increased indexing sophistication because we now create so many more files.  No one worries about having enough disk space – they create backups and copies of copies and never clean up.

And if you use a network file server, who knows where that file went that you were working on last month?  Perhaps it’s in that email I sent?

The future looks worse: apparently we’ll all be using cloud file storage more and more.  Cloud services are typically priced on data transfer and file access frequency, rather than just storage space consumed – so consider the challenge (translates as “cost”) of searching your cloud storage – ignoring any performance issues!

Microsoft wimped out of building a “proper” database-based file system into Windows… one day perhaps.  In the meantime you can use software like SPACEWatch Storage Suite to give you instant network-wide file search.

The screen shot below shows the main SPACEWatch File Finder window.  You can see the results of a search on the right, and a “visualisation” pane on the left.  The visualisation pane shows all the systems (servers etc) and users for whom results have been found.  Results can be filtered by clicking on one of these, or entering text into the filter box on the top right.

File Finder - search across the network

Note there are other panes available as tabs on the left and right borders of the window.  These give access to the custom search options, for example.  Custom options include obvious things like file name, size, type, owner etc. but also more unusual options like whether or not the file has ever been used.

SPACEWatch lets you save even the most complex searches and re-use them with one click, or generate reports from them.  Other SPACEWatch users can re-use these searches as well.  Interestingly, how you sort the results is also saved – so, for example, you can set a limit on how many results to show and sort the results by size – and get a dynamic “top x” list.
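To make the “top x” idea concrete, here is a minimal Python sketch of what a saved search with a result limit and a sort by size boils down to.  It is not SPACEWatch’s own code or API; the FileRecord type and the top_x_by_size function are hypothetical names made up for illustration, and the file records are assumed to have been collected already.

```python
import heapq
from typing import Iterable, List, NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical record for one file found by a search."""
    path: str
    size: int  # size in bytes


def top_x_by_size(records: Iterable[FileRecord], limit: int) -> List[FileRecord]:
    """Return the `limit` largest files, biggest first.

    heapq.nlargest avoids sorting the whole result set when only a
    handful of entries are wanted.
    """
    return heapq.nlargest(limit, records, key=lambda r: r.size)


if __name__ == "__main__":
    found = [
        FileRecord(r"\\server1\home\anna\archive.pst", 900_000_000),
        FileRecord(r"\\server1\home\ben\notes.txt", 2_000),
        FileRecord(r"\\server2\media\holiday.avi", 1_500_000_000),
    ]
    for record in top_x_by_size(found, limit=2):
        print(record.size, record.path)
```

Because the limit and the sort order are part of the search itself, re-running it always gives the current top x rather than a stale snapshot.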

SPACEWatch does use a proper database to store the file data it collects – so you can be assured that even the most complex searches, on the largest networks, produce results in seconds.  Don’t believe me?  Try downloading one of the free trial versions and see for yourself.

The curse of PST files

Microsoft have never recommended that users store their Outlook data files on the network – performance can be bad and there’s always a danger of corruption.  On the other hand, Exchange mailbox databases have traditionally been quite limited in size, so admins tend to set quite low limits on their users (interestingly, not an issue we see in Lotus Domino shops – where mail databases tend to grow and grow)… so users are forced to carry out local archiving to PST files.

This can sometimes be addressed with Exchange archiving solutions – but keeping local PST archives still gives users the easiest way to search and find what they want in old mailboxes.

So what to do?

Finding PST files

A good start is to find out the scale of the problem.  Take a look at the screen shot on the left.  This is from SPACEWatch’s File Types analysis tool.

This shows how PST files have been found right across the network – I can expand the tree to see how many files, and how much storage they’re consuming.

If I want to investigate further I right click and choose one of the context menu options such as listing the largest ones in a particular area of the network – or the least used.

But what about cleaning up all this potentially wasted space?  There are a number of approaches you can take, all of which SPACEWatch will help with, and many of which can be automated to run to a scheduled routine:

  • Send users a list of the PSTs they own and ask them to check if they’re still needed – and remove them if they’re not.
  • Archive PST files that haven’t been used for a long time – typically leaving a “stub” that looks like the original file to any application, but redirects the application to the file’s new location (which could be cheap secondary storage).
  • Delete unused PST files.
  • Leave the PST files where they are, but archive file attachments within them (this is typically what takes up most of the file space): SPACEWatch with its Exchange add-on lets you do this on PST files, Exchange mailboxes and public folders.
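If you want a feel for the second and third options without any special tooling, the first step (finding PST files that nobody has touched for a long time) can be roughed out in a few lines of Python.  This is only a sketch under stated assumptions: find_stale_psts is a hypothetical helper, not part of SPACEWatch, and it relies on file system timestamps, which may not be kept up to date on every system.  It only lists candidates; it does not archive or delete anything.

```python
import os
import time
from typing import List, Optional, Tuple

SECONDS_PER_DAY = 86_400


def find_stale_psts(root: str, max_idle_days: int,
                    now: Optional[float] = None) -> List[Tuple[str, int, float]]:
    """List PST files under `root` not used for more than `max_idle_days`.

    Returns (path, size_in_bytes, last_used_timestamp) tuples, largest first.
    "Last used" is taken as the later of the access and modification times,
    because last-access updates may be switched off on some systems.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pst"):
                continue
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable: skip it
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                stale.append((path, info.st_size, last_used))
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

The resulting list could feed any of the approaches above: mail it to the owners, hand it to an archiving job, or review it before deleting anything.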

Top twenty file types

Perhaps if you follow some of these approaches, you won’t end up with a top twenty file types chart like this!  By the way, someone’s already been cleaning up on this network – you can tell, because “mp3” only just scrapes into the top twenty.  Try this with SPACEWatch on your network and I’ll bet it’s a lot higher!
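For a single folder tree you can get a rough version of such a chart yourself.  The sketch below is an assumption-laden illustration, not how SPACEWatch builds its chart: top_file_types is a hypothetical function that walks one directory, groups files by extension and ranks the types by total bytes.  It scans the disk every time it runs, so it will not scale to a whole network the way a database of collected file data would.

```python
import os
from collections import Counter
from typing import List, Tuple


def top_file_types(root: str, limit: int = 20) -> List[Tuple[str, int]]:
    """Return up to `limit` (extension, total_bytes) pairs, largest first.

    Extensions are lower-cased and stored without the dot; files with no
    extension are grouped under "(none)".
    """
    totals: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".") or "(none)"
            try:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that cannot be read
    return totals.most_common(limit)


if __name__ == "__main__":
    for ext, total in top_file_types("."):
        print(f"{ext:10} {total / 1_048_576:10.1f} MB")
```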
