On Fri, Nov 30, 2012 at 02:28:24PM +0000, Jonathan Maw wrote:
This script reads a built baserock system and finds all the files in the
cache that are needed to replicate the build
---
scripts/find-artifacts | 123 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 123 insertions(+)
create mode 100755 scripts/find-artifacts
diff --git a/scripts/find-artifacts b/scripts/find-artifacts
new file mode 100755
index 0000000..a6f40a2
--- /dev/null
+++ b/scripts/find-artifacts
@@ -0,0 +1,123 @@
[snip]
+ def process_args(self, args):
+ artifacts_dir = os.path.join(self.settings['cachedir'], 'artifacts')
+
+ # args[0] is the path to the built image.
+ # Mount the image
+ mount_point = None
+ with MountableImage(self, args[0]) as mount_point:
+ # For each meta file:
+ metadir = os.path.join(mount_point, 'factory-run', 'baserock')
+ metaglob = os.path.join(metadir, '*.meta')
+ for metafile in glob.glob(metaglob):
glob.iglob is a bit better here: it returns an iterator rather than a
list, so it uses less memory but otherwise behaves the same in this
loop.
Not that it matters much in this script, since there won't be a vast
number of metafiles and memory isn't scarce here, but it's a useful
thing to remember.
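To make the difference concrete, here's a minimal runnable sketch (the
temporary directory and filenames are fabricated for the demo):

```python
import glob
import os
import tempfile

# Make a throwaway directory holding a couple of fake .meta files.
tmpdir = tempfile.mkdtemp()
for name in ('one.meta', 'two.meta'):
    open(os.path.join(tmpdir, name), 'w').close()

# glob.glob builds the full list of matches up front; glob.iglob
# returns an iterator that yields matches one at a time instead.
matches = glob.iglob(os.path.join(tmpdir, '*.meta'))
names = sorted(os.path.basename(m) for m in matches)
print(names)  # ['one.meta', 'two.meta']
```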
+ metafilepath = os.path.join(metadir, metafile)
+ # Read the file as JSON and extract the kind and cache-key
+ metajson = json.load(open(metafilepath))
+ cache_key = metajson['cache-key']
This leaves the file open until it is garbage collected, since the
opened file object isn't referenced anywhere once json.load() returns.
That could in principle crash the program if there are so many metafiles
that you exceed the number of open files your process is allowed,
though I'll admit it's unlikely to process that many files without a
collection happening first.
CPython's garbage collection is reference counted, so the file is
closed as soon as json.load() returns, but other Python implementations
may collect without reference counting, so the close can happen at any
time after the object is no longer referenced.
The point of this rant is that in some rare cases this will crash the
program, but really I just want you to use the following because I
think it looks nicer, it is guaranteed to close the file after the with
block, and it closes the file even when an exception is raised:
with open(metafilepath) as metafile:
metajson = json.load(metafile)
+
+ # Grab every file in the artifact cache which matches the
+ # cache-key
+ found_artifact = False
+ artifact_glob = os.path.join(artifacts_dir, cache_key) + '*'
+ for cached_file in glob.glob(artifact_glob):
+ found_artifact = True
+ self.output.write(cached_file + "\n")
+ if found_artifact == False:
+ raise cliapp.AppException('Could not find cache-key '
+ + cache_key + ' for artifact '
+ + metajson['artifact-name'])
You could use iglob here too, but it may be nicer to eliminate the
found_artifact variable.
found_artifacts = glob.glob(artifact_glob)
if not found_artifacts:
raise cliapp.AppException(...)
for cached_file in found_artifacts:
self.output.write(...)
I don't have a preference for either approach, but the latter is one
less line and one less variable.
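Spelled out as a runnable sketch, with a plain Exception standing in
for cliapp.AppException and a fabricated temporary cache directory and
cache_key so the snippet runs on its own:

```python
import glob
import os
import sys
import tempfile

# Fake artifact cache: two cached files sharing one cache-key prefix.
# (artifacts_dir and cache_key stand in for the script's real values.)
artifacts_dir = tempfile.mkdtemp()
cache_key = 'abc123'
for suffix in ('.chunk', '.meta'):
    open(os.path.join(artifacts_dir, cache_key + suffix), 'w').close()

# Glob once: the resulting list serves both as the emptiness check and
# as the iteration source, so no found_artifact flag is needed.
artifact_glob = os.path.join(artifacts_dir, cache_key) + '*'
found_artifacts = glob.glob(artifact_glob)
if not found_artifacts:
    # The script proper would raise cliapp.AppException here.
    raise Exception('Could not find cache-key ' + cache_key)
for cached_file in found_artifacts:
    sys.stdout.write(cached_file + "\n")
```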