Floppy Disk Imaging

From Tech Tangents

I want you to forget everything you know about how floppy disks hold data. That's what I've had to do over the last few months as various projects I've been working on have challenged my understanding of them. I've completely changed my own workflows for working with floppy disk images as a result. This change in perspective has also made it much easier to adapt to new and completely unknown systems. Today I want to talk about how floppy disks store data at a lower level, how you can make more robust disk images, and how doing all that can give you some powerful abilities that make some things much easier and some wild things possible. You can use this knowledge for creating better copies of disks for preservation or for using disk images made by others with your own machines. So no matter what you are doing with them, this should help you better understand floppies.

Disk Formats are Variable

Two 3.5 inch floppy disks, one with "1.44 MB" printed on it and the other with "2.0 MB". Both are the same kind of disk for the same kind of drive.

Why do you need a deeper understanding of imaging floppies? We've been doing this for decades, so how hard can it be? Well, let's start with something more familiar, the 1.44MB 3.5in floppy disk for PCs. Although, that's not quite true: some of my disks say they hold 2MB instead, and this disk from Microsoft actually holds 1.68MB of data. This is a difference in the formatting of the disks, which defines how much data they can hold in practice. The 2MB quoted on the one disk is the raw unformatted capacity that you can't truly achieve due to many factors. The common 1.44MB format is the product of 80 tracks, 2 sides, and 18 sectors per track that hold 512 bytes each.
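The capacity arithmetic can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the 21 sector figure is the layout of the Microsoft disk examined later in this article.

```python
def formatted_capacity(cyls, heads, secs, bps):
    """Bytes of user data for a fixed-geometry (CHS) format."""
    return cyls * heads * secs * bps

# Standard PC high density 3.5in format: 80 tracks, 2 sides, 18 sectors, 512 bytes.
pc_hd = formatted_capacity(80, 2, 18, 512)
print(pc_hd, "bytes =", pc_hd // 1024, "KiB")

# Same drive and disk, but 21 sectors per track instead of 18.
dmf = formatted_capacity(80, 2, 21, 512)
print(dmf, "bytes =", dmf // 1024, "KiB")
```

The "1.44MB" on the label is 1440 KiB divided by 1000, which is why none of the usual definitions of a megabyte quite match it.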

Layout of a CHS addressed disk.

This is known as CHS addressing, standing for Cylinders, Heads, and Sectors. The radial surface of the disk is broken up into individual segments that each hold a portion of the data. On floppy disks we usually call cylinders tracks but this is how all floppy disks work at a fundamental level. The number of tracks, sides, sectors, and bytes per sector are the format. Only the number of tracks and sides are really linked directly to the hardware, sectors and how many bytes they hold are determined by the machine controlling the floppy drive. Different machines can define different formats that are incompatible.

Different Drive Types

Shugart's original floppy drive interface. Supported 0-4 or 0-16 drives depending on addressing method. IBM later modified this interface to control drive motors independently by giving up the ability to address more than two drives.

For the vast majority of disks, no matter the format, you can get away with three drive types: a 40 track double sided 5.25in drive, an 80 track double sided 5.25in drive, and an 80 track double sided 3.5in drive. Most disks are physically compatible with those because it made manufacturing cheaper. 8in disks also work the same way as the smaller drives if you are working with those. You can usually get away with only 80 track 5.25in and 3.5in drives most of the time for reading, but you shouldn't use an 80 track 5.25in drive for writing 40 track disks because the head is narrower. The main advantage to reading with just those drive types is the ability to connect both with one cable. I have a whole video on this topic if you want to know more about it, but most floppy drives use IBM's modified version of the Shugart interface that is limited to two drives. Unless you use something like the floppy switch adapter I designed, it's a pain to change drives out. You also can't read non-Shugart interface drives like Zip disks with Shugart style interfaces; those are completely different. In the past it was easiest to connect your two drives to an older PC with a native floppy interface, but this isn't necessary anymore thanks to several different projects.

Reading Disks From Modern Computers

The two major devices I will be covering here are the KryoFlux and the Greaseweazle. There are many others I won't be covering, like the Applesauce, SuperCard Pro, and FluxEngine. These all work based on the same premise, but I'll tell you before we start that I recommend the Greaseweazle above all other options because it is fully featured at a very low cost. For the moment I'm going to show the KryoFlux because it has a GUI to control the device. FluxEngine's software has added support for the Greaseweazle, but we'll look at another GUI option in a moment. These devices allow you to connect floppy drives to a modern system over USB. They don't make it "easy" to use the drives, so don't expect to put in any disk and simply copy files off of it. The different potential formats make that mostly infeasible, though FluxEngine is putting some huge effort into trying. Instead they all provide a low level hardware interface to the drives, and the software that controls them is designed primarily for doing full disk reads and writes. The only thing you need to know beforehand is how many tracks you are working with, and once you have that set you can start copying a disk. Most of the programs will have an option for you to specify the type of disk you are reading before you start, but I would advise against using specific formats like that initially and instead prefer the raw stream format created by KryoFlux. These devices are not tied to supporting any one specific format of disk; they can read whatever is on the disk as flux transitions. This is as close to raw analog data as you are going to get. The KryoFlux stream file type can even store multiple reads of each track to give you a consensus on data validity. The flux data can later be converted into anything else you need as well. You can also directly write flux data to a disk to make a direct copy without knowing how the disk was formatted. But that is an imperfect solution because you combine the minor calibration differences of each drive used to read and write the disks, so each generation of copy drifts slightly further out of spec. I want to stress that flux data is the backbone of a better workflow for preserving floppy disks and is the first step in how I read every disk I back up now. What you do with the flux data after is where my view of floppies got turned upside down. But don't worry, because it's really not that hard thanks to one more amazing tool.

Floppy disk flux stream
PC 1.44MB DSHD Floppy Disk Flux Map

Seeing Floppy Disk Data

Once you have stream files, the best way to work with and understand them is the HxC Floppy Emulator software. If you've ever seen a green data map of a floppy disk, this is how that was made. But this lets you see a lot more than a cool layout of the data on the disk, and it can be a very helpful step in understanding the format of the disk. Here we can directly see the layout of the 18 sectors on both sides of the disk. You can also hover over each individual sector to see details about it like the sector ID, side, and track. These are stored in a header portion of the sector and are actually part of the metadata on the disk. A note about this disk layout view: there are different visualization options for representing different disk types, but you can always use "dummy disk" instead, which works with all disks unlike the specific ones. There is more you can do with this software like switching into track view, zooming in, and seeing it decode the data, but there are a few more things we should go over now that you have a familiarity with the tools we will be using.

Disk Encoding

HxC Sector information dialog

So far we have been working with flux data, but that isn't compatible with most software outside of what is specifically designed for preservation. So you will need to decode the flux data into binary data. This requires knowing not only the format of the disk, but also the encoding. The encoding is how the flux transitions on the disk translate into binary 0s and 1s, and it also sets the timing of how the disk is read. You don't need to understand how the encoding works at a low level unless you are writing a flux converter yourself, but you do need to understand that there are three main encoding methods used by the majority of disks: FM, GCR, and MFM. These are all incompatible with each other due to how they encode the data, but all existed for various reasons. You can tell which of these you are working with in HxC by looking at the sector information, but you can also make some rough guesses. 70s disks are more likely to be FM, which was commonly called "Single Density". The Commodore 64, Apple II, and early Macs used GCR to fit a little more data than FM. But MFM, commonly called "Double Density" and "High Density", came out shortly after that and all later disks used it instead. These aren't guarantees though: this Macintosh program (Specter, published by Memorex) was actually shipped on a disk with FAT12 MFM formatting like a PC. And some systems also used "density" to refer to different "tracks per inch" on drives. So a good step in your decoding process is to open your flux data in HxC to see how it is encoded before attempting to decode it. In the lower right there are different toggle options for choosing what encoding method to view the flux data with. That's enough info to cover most disks you'll encounter, but there are some edge cases we'll get back to after covering a few more things.
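If you are curious why FM and MFM are incompatible and why MFM holds more, the textbook rules are small enough to sketch in Python. This is an illustration only, with made-up function names: real decoders such as HxC and gw also have to deal with sync marks, address marks that deliberately break these rules, and timing recovery from the flux stream.

```python
def fm_encode(bits):
    """FM: every data bit is preceded by a clock bit that is always 1."""
    return "".join("1" + str(b) for b in bits)

def mfm_encode(bits, prev=0):
    """MFM: a clock 1 is only written between two consecutive 0 data bits."""
    cells = []
    for b in bits:
        clock = 1 if (prev == 0 and b == 0) else 0
        cells.append(f"{clock}{b}")
        prev = b
    return "".join(cells)

data = [1, 0, 0, 1]
fm = fm_encode(data)
mfm = mfm_encode(data)
# Each "1" cell is a flux transition on the disk surface.
print("FM :", fm, "transitions:", fm.count("1"))
print("MFM:", mfm, "transitions:", mfm.count("1"))
```

Because MFM never writes two transitions in adjacent cells, the cells can be made half as long on the same media, which is where the "double" in Double Density comes from.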

Format Structure and Converting

Custom 1.44MB Floppy Diskdef
disk hd35
    cyls = 80
    heads = 2
    tracks * ibm.mfm
        secs = 18
        bps = 512
    end
end

Once you know the correct encoding for a disk you can start to work on converting the flux data into binary data. For common systems and disk formats this will be easy. HxC even has the ability to browse the files stored in some common disk formats. But when you start getting off the beaten path, it can be very helpful to understand how to define your own formats. This is where I'm going to switch over to the Greaseweazle. You can do this with the KryoFlux, but the workflow for the gw software on the command line makes this easier for a few reasons. Before we continue, note that reading disks with a Greaseweazle is just a simple command with a few options.

Greaseweazle Command Examples
Mode | Command | Description
Reading | gw read --drive A --tracks="c=0-39" folder/track00.0.raw | Reads a 40 track disk in the drive connected after the floppy cable twist to a folder of KryoFlux raw files
Converting | gw convert --format ibm.1440 folder/track00.0.raw binary.img | Converts flux data to a binary image using the standard 1.44MB PC format
Writing | gw write --drive B --format ibm.360 binary.img | Writes binary data to a disk in the drive connected before the floppy cable twist using the standard 360KB PC format

We're going to be using the convert mode for the rest of this. Formats can be loaded from text files that you can easily share and are known as disk definitions. gw ships with a large number of diskdefs you can look at as examples. As a demonstration I want to start by making a custom diskdef for the standard 1.44MB disk by looking at the flux data in HxC. The first two easy things are the tracks and sides. We can see the disk goes from track 0 to track 79, so we can set cylinders to 80, and there are obviously two sides of data. Next we are going to tell it to decode all tracks as MFM. All PC disks are MFM so this would be a safe guess, but we can also verify that in HxC. Next is another easy one: we can count how many sectors there are to set that. The last significant thing is the bytes per sector. That is listed as the size in HxC, and it's always a power of two (with some exceptions for very exotic formats) so it can be pretty easy to guess as well. And that is all we need to convert this format. If we use this as the format and convert the flux to binary data we will get a perfectly standard IMG file! I can even mount this file in a virtual machine to open it and see the contents! I'll mention that if you used an 80 track drive to read a 40 track disk you will need to specify "step=2" as a track option to tell gw to skip every other track. This is the complete basic process of preserving floppy disks using much better methods than just an old DOS PC. I've now shown you all of the major steps and tools that can be used for working with any disk format. But this example was easy, and there are some true nightmares lurking that we have yet to cover. But let's not get too far ahead of ourselves, and instead take a look at that 1.68MB floppy I showed at the start.
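Mounting the image in a virtual machine is one way to check the result. For FAT formatted PC disks, another quick sanity check is to compare the geometry recorded in the boot sector with the diskdef you converted with. The sketch below is not part of gw or HxC: the function names are made up, and it assumes a DOS style boot sector with a BIOS Parameter Block at the usual offsets. It builds a fake boot sector so it runs on its own.

```python
import struct

def read_bpb(boot_sector: bytes) -> dict:
    """Pull the geometry fields out of a FAT12 BIOS Parameter Block."""
    bytes_per_sector, = struct.unpack_from("<H", boot_sector, 0x0B)
    total_sectors, = struct.unpack_from("<H", boot_sector, 0x13)
    sectors_per_track, heads = struct.unpack_from("<HH", boot_sector, 0x18)
    return {
        "bytes_per_sector": bytes_per_sector,
        "total_sectors": total_sectors,
        "sectors_per_track": sectors_per_track,
        "heads": heads,
    }

def geometry_matches(bpb, cyls, heads, secs, bps):
    """True if the boot sector agrees with the diskdef values."""
    return (bpb["bytes_per_sector"] == bps
            and bpb["sectors_per_track"] == secs
            and bpb["heads"] == heads
            and bpb["total_sectors"] == cyls * heads * secs)

# Stand-in for the first 512 bytes of a converted 1.44MB image.
sector = bytearray(512)
struct.pack_into("<H", sector, 0x0B, 512)      # bytes per sector
struct.pack_into("<H", sector, 0x13, 2880)     # total sectors
struct.pack_into("<HH", sector, 0x18, 18, 2)   # sectors per track, heads

bpb = read_bpb(bytes(sector))
print(bpb, geometry_matches(bpb, cyls=80, heads=2, secs=18, bps=512))
```

With a real image you would read the first 512 bytes of the file, for example `open("binary.img", "rb").read(512)`, in place of the fake sector. This only applies to FAT disks; other systems need their own checks.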

Understanding Non-Standard Disks

Custom DMF 1.68MB Floppy Diskdef
disk dmf35
    cyls = 80
    heads = 2
    tracks * ibm.mfm
        secs = 21
        bps = 512
        interleave = 2
    end
end

This is a "DMF" formatted disk, a nonstandard larger capacity format used by Microsoft to ship fewer disks for their larger products. It's mostly the same as the 1.44MB disk, but instead of 18 sectors it has 21 fit into the same space, a difference of about 17%. That may not sound like much, but if you go watch my over-an-hour-long video on installing Office 97 from 46 disks and imagine needing 8 more disks on top of that without DMF, I think you can see the benefits. There is more to this though than just having three more sectors. If we take a look at the flux we can see that the sector IDs aren't all in order. These disks use a concept called interleaving. This spaces out the sectors holding sequential data, giving the CPU time to work on what it has already received. If the system hasn't finished processing the data from one sector by the time it reaches the next, it will miss it and then wait for an entire disk rotation before attempting to read it again. Interleaving adds a built in delay, which for something like a compressed software install such as this probably helps quite a bit to speed up the total transfer on slower systems without delaying faster machines much. When it comes to imaging though, this presents a bit of a problem: if we just changed the 18 to a 21 in our diskdef, the sectors would be out of order in our binary image. So we have to specify the interleave in the diskdef so the software knows what order the data should be in.
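To see what an interleave value does to the physical order of sectors, here is a small Python sketch. The placement rule (step forward by the interleave, and slide to the next free slot on a collision) is a common convention that I am assuming here rather than quoting from gw's source, and the function name is made up.

```python
def interleaved_layout(secs, interleave, first_id=1):
    """Return sector IDs in the physical order they pass under the head."""
    layout = [None] * secs
    pos = 0
    for i in range(secs):
        # If the slot is already taken, slide forward to the next free one.
        while layout[pos] is not None:
            pos = (pos + 1) % secs
        layout[pos] = first_id + i
        pos = (pos + interleave) % secs
    return layout

print(interleaved_layout(18, 1))  # no interleave: IDs are simply in order
print(interleaved_layout(21, 2))  # 21 sectors with interleave = 2
```

With an interleave of 2, logically consecutive sectors end up one physical slot apart, so a decoder that knows the interleave can lay the same sequence back down when writing a disk. The real layout on any given disk is whatever the flux map in HxC shows.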

This is a disk from the HP Museum that was only partially re-formatted and cannot be converted to binary image with simple means.

I should note at this point that even with more complicated disks like the DMF example with interleave, it isn't completely necessary to make a disk format to convert flux data. Binary data like IMGs do need it because they don't contain any information on the structure of the format. But because the flux data still has the sector headers in it, some programs like HxC can export an IMG file without any format specified by just putting the sectors in the correct order based on the sides and IDs. This only works under ideal circumstances though. As an example of when it won't work, let's say you have a disk that has been reformatted for a different system on only one side. HxC cannot know this and will erroneously include the unrelated data in the image. This is why, while not technically required, it is a very good idea to always understand the format of a disk and specifically convert it as that. gw won't even attempt a flux conversion without a format, and I don't really blame them for that. Because of this, I've also taken to using a degauss tool before writing to disks on new systems to make sure I start with a known clean slate.

More Examples of Determining Disk Formats

I want to give you several other examples of understanding unusual disk formats. There are some things out there that are considered grey beard level knowledge, and others that seem impossible, that you can do with a low level understanding of these concepts. So let's continue with another example using interleave: my HP Series 200 setup. I have three different kinds of floppy drives for this computer. Each of them uses a different format and a different interleave. The HP software even lets you define custom interleaves because some of the disks are interchangeable with other drives that might be even slower. However, some of these drives hold the exact same amount of data and they all use the same filesystem type, known as LIF. Using what we have learned so far, it's actually possible to image one disk as flux, convert it to binary data, and then rewrite it to a completely different kind of disk with a different format to use in a different drive! Looking at the flux maps of the two disks we can determine some basic diskdefs (HP was actually cool enough to put this info in the manuals too). The biggest difference is the 3.5in disk only using one side. But a closer look will also reveal our first unusual sector IDs, which begin at 0 instead of 1. 1 is the standard, but some devices used 0 instead. You can specify that in the track area to handle it correctly. With known formats though, you can freely convert between disk types to use in the physical drives. I will say this is a slightly unique situation for the HP hardware because it uses an abstracted interface to the drives.

HP LIF 5.25in Double Density/Double Sided
disk hp.lif.35dd
    cyls = 35
    heads = 2
    tracks * ibm.mfm
        id = 0
        interleave = 1
        secs = 16
        bps = 256
    end
end

HP LIF 3.5in Double Density/Single Sided
disk hp.lif.70ss
    cyls = 70
    heads = 1
    tracks * ibm.mfm
        interleave = 2
        secs = 16
        bps = 256
    end
end

Kaypro IV'83 Sequential Side Diskdef
disk kaypro.iv
    cyls = 40
    heads = 2
    tracks 0-39.0 ibm.mfm
        id=0
        secs = 10
        bps = 512
        interleave = 5
    end
    tracks 0-39.1 ibm.mfm
        id=10
        h = 0
        secs = 10
        bps = 512
        interleave = 5
    end
end

Next up, in a recent video I worked on getting a Kaypro IV '83 (that is Roman numeral IV) working, which was an evolution of the earlier Kaypro II. These are CP/M computers, and there are great tools for working with CP/M binary disk images that make it much easier to extract files from and put files onto disks for these machines. But you need to know what the formats are and how to convert them. Due to how CP/M works, cpmtools also needs to know the disk format to skip past the BIOS and other system software that is stored in the boot tracks of the disk before the filesystem, so you may need to make two diskdefs for different tools. Back to the Kaypro IV though, it is an example of why coming at disk imaging from flux and inspection is needed. The disks for this machine are not standards compliant: the second side of the disk uses the side ID of the first side, and the sector numbers continue sequentially from the first side as well. It is still possible to create a diskdef for this very unusual format. You can split your track decoding between sides, then make the second side start at sector 10, like we set it to 0 for the HP disks. Then we have to set h=0, which tells the software to expect headers carrying the first side's ID (head 0) on the second side. Without this the software will think the data isn't correct and ignore everything in the flux of the second side. Then with everything else set correctly you can convert the flux data using the new format and successfully validate the binary data using cpmtools to extract files.
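To make the sequential sector numbering concrete, here is a sketch of where each Kaypro IV sector would land in a binary image. It assumes the image is stored cylinder by cylinder with side 0 before side 1, which is the usual layout for IMG files but worth confirming for your tools. The dictionary layout and function name are made up for this example.

```python
KAYPRO_IV = {
    "cyls": 40,
    "heads": 2,
    "secs": 10,
    "bps": 512,
    # First sector ID on each physical side, matching id=0 and id=10 in the diskdef.
    "first_id": {0: 0, 1: 10},
}

def image_offset(fmt, cyl, head, sector_id):
    """Byte offset of a sector in a cylinder-major binary image."""
    index = sector_id - fmt["first_id"][head]
    if not 0 <= index < fmt["secs"]:
        raise ValueError(f"sector ID {sector_id} is not valid on side {head}")
    track_number = cyl * fmt["heads"] + head
    return (track_number * fmt["secs"] + index) * fmt["bps"]

print(image_offset(KAYPRO_IV, 0, 0, 0))     # first sector of the image
print(image_offset(KAYPRO_IV, 0, 1, 10))    # first sector on the second side
print(image_offset(KAYPRO_IV, 39, 1, 19))   # last sector of the image
```

Note that `head` here is the physical side of the disk. The head value written in the sector headers is 0 on both sides, which is exactly the quirk that h = 0 in the diskdef accounts for.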

TRS-80 Model 1 SD+DD Boot Diskdef
disk trs80.sd-dd
    cyls = 40
    heads = 2
    tracks 0 ibm.fm
        id = 0
        secs = 10
        interleave = 2
        bps = 256
    end

    tracks 1-39 ibm.mfm
        id = 0
        secs = 18
        bps = 256
        interleave = 4
    end
end

Here's another example of a weird disk due to backwards compatibility, a TRS-80 Model 1 boot disk. This computer shipped with a floppy controller only capable of using FM Single Density encoding. But it was fairly common and easy to upgrade this to MFM Double Density using something like this Percom Disk Doubler. However, the machine could still only initiate booting from an FM disk. This was worked around by having only the first track of the disk be FM, and then having the rest of the tracks be MFM to fit more data (DOS-PLUS as an example). This is not the only system to do this either: here is an 8in CP/M-86 boot disk that uses mixed density track encoding in a different way. You can work with disk formats like this by specifying different track regions of the disk to be interpreted differently, like we did with the sides for the Kaypro. For the TRS-80 we can create a diskdef with track 0 as FM with 10 sectors of data and an interleave of 2. Then we specify tracks 1-39 as MFM with 18 sectors and an interleave of 4. Both sections have 256 bps; the tracks just hold different numbers of sectors because of the encoding. And with that, we can convert flux data into a binary image yet again for another strange disk type!

Completely Custom Disk Types

GCR and Zone Bit Recording visible on a disk for Commodore 64

Now let's take this different track types concept in another direction by looking at some Mac and Commodore 64 disks. These use GCR encoding because they came out before MFM was common and were trying to fit more data than FM. However, the really unique thing about them is that they both use Zone Bit Recording, which is not related to GCR. The bit rate of the drive relative to the disk material changes based on the distance from the center of the disk. Each system did this differently: Apple changed the speed of the motor spinning the disk, and Commodore changed the clock rate the head syncs to. This allowed more efficient usage of space on a disk compared to fixed speed recording. If the inner part of the disk is capable of holding the same amount of data as a CHS disk in half the space, why not double it on the outer tracks? The actual reason to not use ZBR is the complexity it adds to the drives, which massively increases cost relative to "dumb" CHS fixed bitrate drives, and both companies switched to or at least supported MFM later. Now when it comes to decoding the format of these disks, it isn't all that different from the FM and MFM mixed density disks we just looked at. The main difference is that you have to change the clock rate as you move from the outside in. However, there isn't as much of a need to make a custom definition here. The Macs and Commodore systems that used these formats are so popular that they have already been added to the main Greaseweazle diskdef file as well as most other projects, and you should use those. In the unlikely event you encounter Zone Bit Recording on something else, you now know it is possible to handle and that there are examples to work from.
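As a concrete picture of zones, here is a Python sketch using the widely documented sector counts for a standard Commodore 1541 disk (35 tracks numbered from 1, single sided, 256 byte sectors). These figures come from general Commodore documentation rather than from the flux map shown here, so treat them as an outside assumption and check them against your own disks.

```python
# (tracks in the zone, sectors per track), outermost zone first.
ZONES_1541 = [
    (range(1, 18), 21),
    (range(18, 25), 19),
    (range(25, 31), 18),
    (range(31, 36), 17),
]
BYTES_PER_SECTOR = 256

def sectors_on_track(track):
    """Look up how many sectors a given track holds."""
    for tracks, secs in ZONES_1541:
        if track in tracks:
            return secs
    raise ValueError(f"track {track} is outside 1-35")

total_sectors = sum(sectors_on_track(t) for t in range(1, 36))
fixed_rate_sectors = 35 * sectors_on_track(35)  # if every track matched the innermost

print(total_sectors, "sectors,", total_sectors * BYTES_PER_SECTOR, "bytes")
print("gain over a fixed layout:", total_sectors - fixed_rate_sectors, "sectors")
```

A diskdef for a disk like this ends up looking like the mixed density examples above: one tracks section per zone, each with its own sector count and clock.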

Inconsistent Disk Formats ( Copy Protection )

I want to end this with another example that I wouldn't exactly recommend trying to decode yourself. This is Microsoft Adventure for the IBM PC. This is the first game for the PC, and it has DRM. More specifically, it has copy protection. PCs originally shipped with disks formatted with 8 MFM sectors on a single side. Microsoft Adventure though....uh...didn't. It was entirely possible to manually control the floppy drive on a PC to make custom and strange formats that only your program was designed to work with. This game is booted on the system directly to run, not loaded from DOS, so it has full hardware access. This wasn't the only game like this: Sublogic's Jet has a normal first track and then a weird 5 sector format after. You could define diskdefs for these disks, but this is a harder subject that also gets into defeating the checks built into the game's executables, which is beyond the scope of this video. For disks like this, I would recommend sticking with just flux for preservation unless you really want a challenge.

Microsoft Adventure's copy protection has unevenly sized sectors, sector IDs in the 100s, and sectors that aren't spread out consistently over the disk.

Fin

And that is finally everything I think you should know to get started with working with floppy disks at a low level for preservation purposes. When you are making archival copies of disks you should always keep and back up flux data because it could be used in different ways later. But it isn't the best for making new disks and working with other tools, so knowing how decoding works is very important as well. The way I work with disks now is to read flux data, use HxC to generate an image of the flux map, check it for the structure, and then use a dedicated script to both decode the flux to binary data and attempt to validate my final image by using other tools to extract files to test it in some way. It's been very easy to build up a robust and powerful set of tools using these amazing projects. I hope I've given you all of the information you need to do that as well.
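If you want to script a workflow like this yourself, a starting point might look like the following Python sketch. It is not my actual script: the function names, folder names, and diskdef file name are placeholders, and it only builds and prints the gw command lines (using the read and convert modes shown earlier, plus gw's option for loading a custom diskdefs file) so you can review them before running anything.

```python
import shlex

def gw_read_cmd(flux_dir, drive="A", cyls=80):
    """Command to read a disk into a folder of KryoFlux stream files."""
    return ["gw", "read", "--drive", drive,
            f"--tracks=c=0-{cyls - 1}",
            f"{flux_dir}/track00.0.raw"]

def gw_convert_cmd(flux_dir, image, fmt, diskdefs=None):
    """Command to decode the flux into a binary image using a named format."""
    cmd = ["gw", "convert"]
    if diskdefs:
        cmd += ["--diskdefs", diskdefs]   # file holding custom disk definitions
    cmd += ["--format", fmt, f"{flux_dir}/track00.0.raw", image]
    return cmd

steps = [
    gw_read_cmd("kaypro_flux", drive="A", cyls=40),
    gw_convert_cmd("kaypro_flux", "kaypro.img", "kaypro.iv", diskdefs="my_diskdefs.cfg"),
]
for step in steps:
    print(shlex.join(step))
```

To actually run a step you could pass it to `subprocess.run(step, check=True)`. The validation stage depends entirely on the target system, for example using cpmtools on a CP/M image, so it is left out here.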

I have also produced a video covering this topic if you would like to watch a presented version of this: https://youtu.be/UxsRpMdmlGo